General IO with Shnitsel tools#

Reading input data#

Shnitsel tools is available in the shnitsel package. Within the shnitsel.io module, we offer the handy read() function to read in a multitude of different formats. Currently, we support the following file types:

  • SHARC outputs, both ICOND and TRAJ formats which have been tested for version 2.0, 2.1 and 3.0 outputs.

  • PyrAI2md outputs, reading of NACs and SOCs is currently still limited, but the reading has been tested on version 2.4 outputs and is expected to work well on version 2.5 outputs as well.

  • NewtonX outputs. Testing has been performed up to version 2.2 We plan to support ASE database files soon.

During the call to read(), all input data will be converted into standard units as documented by the shnitsel-tools package, e.g. times are converted to fs, lengths to Bohr, forces to Hartree/Bohr and energies to Hartree. This allows for standardized and comparable processing independent of the input format, where different unit conventions are common.

To use Shnitsel tools, we import the shnitsel package:

[1]:
import shnitsel as st

The st.io.read() function handles all of the details of input of the different formats. Its only essential requirement is a path to the input that is supposed to be read. Here, we have multiple options:

  • path can point to the directory of a single trajectory, file or initial condition. Then only this one trajectory will be read.

[2]:
# This call loads a single initial condition configuration
single_icond_butene = st.io.read(path='./test_data/sharc/iconds_butene/ICOND_00000')
single_icond_butene
[2]:
Trajectory(_raw_dataset=<xarray.Dataset> Size: 2kB
Dimensions:                 (time: 1, state: 3, atom: 12, direction: 3,
                             statecomb: 3)
Coordinates: (12/18)
  * time                    (time) float64 8B 0.0
  * state                   (state) int64 24B 1 2 3
  * atom                    (atom) int64 96B 0 1 2 3 4 5 6 7 8 9 10 11
  * direction               (direction) <U1 12B 'x' 'y' 'z'
  * statecomb               (statecomb) object 24B MultiIndex
    state_names             (state) <U8 96B 'S0' 'S1' 'S2'
    ...                      ...
  * from                    (statecomb) int64 24B 1 1 2
  * to                      (statecomb) int64 24B 2 3 3
    delta_t                 float64 8B 0.0
    max_ts                  int64 8B 1
    t_max                   float64 8B 0.0
    charge                  float64 8B 0.0
Data variables:
    energy                  (time, state) float32 12B -156.1 -155.8 -155.6
    forces                  (time, state, atom, direction) float32 432B 0.010...
    atXYZ                   (time, atom, direction) float32 144B 0.0 ... 1.656
    nacs                    (time, statecomb, atom, direction) float32 432B -...
    dip_perm                (time, state, direction) float32 36B -0.1828 ... ...
    dip_trans               (time, statecomb, direction) float32 36B -2.591 ....
Attributes: (12/13)
    input_format:                 sharc
    _shnitsel_setup_for_cleanup:  True
    completed:                    True
    input_type:                   static
    input_format_version:         unknown
    num_singlets:                 3
    ...                           ...
    num_triplets:                 0
    has_forces:                   True
    misc_input_settings:          {'QM.in': {'num_atoms': 12, 'atXYZ': array(...
    trajectory_input_path:        test_data/sharc/iconds_butene/ICOND_00000
    trajectory_id:                0
    __mol:                        <rdkit.Chem.rdchem.Mol object at 0x7cc44ad6..., _is_multi_trajectory=False)
  • Alternatively, path can point to a directory of multiple directories or files containing trajectories or initial conditions. In that case, st.io.read(path) will iterate over relevant subdirectories and load them in parallel. In the end, the individual trajectories are combined into a single object.

[3]:
# This call loads all initial conditions within the `./test_data/sharc/iconds_butene` directory, because the directory itself is not a trajectory.
iconds_butene = st.io.read(path='./test_data/sharc/iconds_butene')

# This call loads all trajectories within `./test_data/sharc/traj_butene_v2.1` and combines them into a single object
# Here, we also set the compound name
trajectories_butene_sharc = st.io.read(path='./test_data/sharc/traj_butene_v2.1/').set_compound_info('butene')

# We can similarly load a collection of newtonx trajectories
# We again explicitly set the compound name
trajectories_pyrazene_newtonx = st.io.read(path='./test_data/newtonx/test_pyrazene_v2.6').set_compound_info('pyrazene')
trajectories_pyrazene_newtonx
WARNING:root:5 times: We currently do not support reading pre-normalized/scalarized NACs terms (i.e. non-vectors) in our NewtonX input.
WARNING:root:5 times: We made a best-effort guess for the types/multiplicities of the individual states. Please provide a list of state types or a function to assign the state types to have the correct values assigned.
[3]:
<class 'shnitsel.data.tree.tree.ShnitselDBRoot[Trajectory]'> [{'level': 'ShnitselDBRoot', 'children': '1: {\'pyrazene\': \'ABCMeta(_name=\\\'pyrazene\\\', _dtype=<class \\\'shnitsel.data.dataset_containers.trajectory.Trajectory\\\'>, _data=None, _children={\\\'1\\\': ABCMeta(_name=1, _dtype=<class \\\'shnitsel.data.dataset_containers.trajectory.Trajectory\\\'>, _data=Trajectory(_raw_dataset=<xarray.Dataset> Size: 451kB\\nDimensions:                 (time: 201, state: 7, atom: 10, direction: 3,\\n                             statecomb: 21)\\nCoordinates: (12/19)\\n  * time                    (time) float64 2kB 0.0 0.5 1.0 ... 99.0 99.5 100.0\\n  * state                   (state) int64 56B 1 2 3 4 5 6 7\\n  * atom                    (atom) int64 80B 0 1 2 3 4 5 6 7 8 9\\n  * direction               (direction) <U1 12B \\\'x\\\' \\\'y\\\' \\\'z\\\'\\n  * statecomb               (statecomb) object 168B MultiIndex\\n    state_names             (state) <U2 56B \\\'S0\\\' \\\'S1\\\' \\\'S2\\\' \\\'S3\\\' \\\'S4\\\' \\\'S5\\\' \\\'S6\\\'\\n    ...                      ...\\n  * from                    (statecomb) int64 168B 1 1 1 1 1 1 2 ... 4 4 4 5 5 6\\n  * to                      (statecomb) int64 168B 2 3 4 5 6 7 3 ... 5 6 7 6 7 7\\n    delta_t                 float64 8B 0.5\\n    max_ts                  int64 8B 201\\n    t_max                   float64 8B 201.0\\n    charge                  float64 8B 0.0\\nData variables:\\n    energy                  (time, state) float64 11kB -263.9 -263.8 ... -263.6\\n    e_kin                   (time) float64 2kB 0.06349 0.06314 ... 0.06262\\n    velocities              (time, atom, direction) float64 48kB 1.56e-06 ......\\n    forces                  (time, state, atom, direction) float64 338kB 0.0 ...\\n    atXYZ                   (time, atom, direction) float64 48kB 0.01591 ... ...\\nAttributes: (12/13)\\n    input_format:                 newtonx\\n    _shnitsel_setup_for_cleanup:  True\\n    completed:                    True\\n    input_type:                   dynamic\\n    input_format_version:         2.5\\n    num_singlets:                 7\\n    ...                           ...\\n    num_triplets:                 0\\n    has_forces:                   active_only\\n    misc_input_settings:          {\\\'control.dyn\\\': {\\\'nat\\\': 10, \\\'nstat\\\': 7, \\\'ns...\\n    trajectory_input_path:        test_data/newtonx/test_pyrazene_v2.6/TRAJ1\\n    trajectory_id:                1\\n    __mol:                        <rdkit.Chem.rdchem.Mol object at 0x7cc44adc..., _is_multi_trajectory=False), _children={}, _attrs={}, _parent=..., _level_name=\\\'DataLeaf\\\'), \\\'2\\\': ABCMeta(_name=2, _dtype=<class \\\'shnitsel.data.dataset_containers.trajectory.Trajectory\\\'>, _data=Trajectory(_raw_dataset=<xarray.Dataset> Size: 451kB\\nDimensions:                 (time: 201, state: 7, atom: 10, direction: 3,\\n                             statecomb: 21)\\nCoordinates: (12/19)\\n  * time                    (time) float64 2kB 0.0 0.5 1.0 ... 99.0 99.5 100.0\\n  * state                   (state) int64 56B 1 2 3 4 5 6 7\\n  * atom                    (atom) int64 80B 0 1 2 3 4 5 6 7 8 9\\n  * direction               (direction) <U1 12B \\\'x\\\' \\\'y\\\' \\\'z\\\'\\n  * statecomb               (statecomb) object 168B MultiIndex\\n    state_names             (state) <U2 56B \\\'S0\\\' \\\'S1\\\' \\\'S2\\\' \\\'S3\\\' \\\'S4\\\' \\\'S5\\\' \\\'S6\\\'\\n    ...                      ...\\n  * from                    (statecomb) int64 168B 1 1 1 1 1 1 2 ... 4 4 4 5 5 6\\n  * to                      (statecomb) int64 168B 2 3 4 5 6 7 3 ... 5 6 7 6 7 7\\n    delta_t                 float64 8B 0.5\\n    max_ts                  int64 8B 201\\n    t_max                   float64 8B 201.0\\n    charge                  float64 8B 0.0\\nData variables:\\n    energy                  (time, state) float64 11kB -263.9 -263.8 ... -263.6\\n    e_kin                   (time) float64 2kB 0.04829 0.04631 ... 0.06287\\n    velocities              (time, atom, direction) float64 48kB -5.529e-05 ....\\n    forces                  (time, state, atom, direction) float64 338kB 0.0 ...\\n    atXYZ                   (time, atom, direction) float64 48kB -0.05122 ......\\nAttributes: (12/13)\\n    input_format:                 newtonx\\n    _shnitsel_setup_for_cleanup:  True\\n    completed:                    True\\n    input_type:                   dynamic\\n    input_format_version:         2.5\\n    num_singlets:                 7\\n    ...                           ...\\n    num_triplets:                 0\\n    has_forces:                   active_only\\n    misc_input_settings:          {\\\'control.dyn\\\': {\\\'nat\\\': 10, \\\'nstat\\\': 7, \\\'ns...\\n    trajectory_input_path:        test_data/newtonx/test_pyrazene_v2.6/TRAJ2\\n    trajectory_id:                2\\n    __mol:                        <rdkit.Chem.rdchem.Mol object at 0x7cc44ada..., _is_multi_trajectory=False), _children={}, _attrs={}, _parent=..., _level_name=\\\'DataLeaf\\\'), \\\'3\\\': ABCMeta(_name=3, _dtype=<class \\\'shnitsel.data.dataset_containers.trajectory.Trajectory\\\'>, _data=Trajectory(_raw_dataset=<xarray.Dataset> Size: 451kB\\nDimensions:                 (time: 201, state: 7, atom: 10, direction: 3,\\n                             statecomb: 21)\\nCoordinates: (12/19)\\n  * time                    (time) float64 2kB 0.0 0.5 1.0 ... 99.0 99.5 100.0\\n  * state                   (state) int64 56B 1 2 3 4 5 6 7\\n  * atom                    (atom) int64 80B 0 1 2 3 4 5 6 7 8 9\\n  * direction               (direction) <U1 12B \\\'x\\\' \\\'y\\\' \\\'z\\\'\\n  * statecomb               (statecomb) object 168B MultiIndex\\n    state_names             (state) <U2 56B \\\'S0\\\' \\\'S1\\\' \\\'S2\\\' \\\'S3\\\' \\\'S4\\\' \\\'S5\\\' \\\'S6\\\'\\n    ...                      ...\\n  * from                    (statecomb) int64 168B 1 1 1 1 1 1 2 ... 4 4 4 5 5 6\\n  * to                      (statecomb) int64 168B 2 3 4 5 6 7 3 ... 5 6 7 6 7 7\\n    delta_t                 float64 8B 0.5\\n    max_ts                  int64 8B 201\\n    t_max                   float64 8B 201.0\\n    charge                  float64 8B 0.0\\nData variables:\\n    energy                  (time, state) float64 11kB -263.9 -263.8 ... -263.7\\n    e_kin                   (time) float64 2kB 0.03085 0.02938 ... 0.04899\\n    velocities              (time, atom, direction) float64 48kB -2.549e-05 ....\\n    forces                  (time, state, atom, direction) float64 338kB 0.0 ...\\n    atXYZ                   (time, atom, direction) float64 48kB 0.02058 ... ...\\nAttributes: (12/13)\\n    input_format:                 newtonx\\n    _shnitsel_setup_for_cleanup:  True\\n    completed:                    True\\n    input_type:                   dynamic\\n    input_format_version:         2.5\\n    num_singlets:                 7\\n    ...                           ...\\n    num_triplets:                 0\\n    has_forces:                   active_only\\n    misc_input_settings:          {\\\'control.dyn\\\': {\\\'nat\\\': 10, \\\'nstat\\\': 7, \\\'ns...\\n    trajectory_input_path:        test_data/newtonx/test_pyrazene_v2.6/TRAJ3\\n    trajectory_id:                3\\n    __mol:                        <rdkit.Chem.rdchem.Mol object at 0x7cc44adc..., _is_multi_trajectory=False), _children={}, _attrs={}, _parent=..., _level_name=\\\'DataLeaf\\\'), \\\'14\\\': ABCMeta(_name=14, _dtype=<class \\\'shnitsel.data.dataset_containers.trajectory.Trajectory\\\'>, _data=Trajectory(_raw_dataset=<xarray.Dataset> Size: 451kB\\nDimensions:                 (time: 201, state: 7, atom: 10, direction: 3,\\n                             statecomb: 21)\\nCoordinates: (12/19)\\n  * time                    (time) float64 2kB 0.0 0.5 1.0 ... 99.0 99.5 100.0\\n  * state                   (state) int64 56B 1 2 3 4 5 6 7\\n  * atom                    (atom) int64 80B 0 1 2 3 4 5 6 7 8 9\\n  * direction               (direction) <U1 12B \\\'x\\\' \\\'y\\\' \\\'z\\\'\\n  * statecomb               (statecomb) object 168B MultiIndex\\n    state_names             (state) <U2 56B \\\'S0\\\' \\\'S1\\\' \\\'S2\\\' \\\'S3\\\' \\\'S4\\\' \\\'S5\\\' \\\'S6\\\'\\n    ...                      ...\\n  * from                    (statecomb) int64 168B 1 1 1 1 1 1 2 ... 4 4 4 5 5 6\\n  * to                      (statecomb) int64 168B 2 3 4 5 6 7 3 ... 5 6 7 6 7 7\\n    delta_t                 float64 8B 0.5\\n    max_ts                  int64 8B 201\\n    t_max                   float64 8B 201.0\\n    charge                  float64 8B 0.0\\nData variables:\\n    energy                  (time, state) float64 11kB -263.9 -263.8 ... -263.7\\n    e_kin                   (time) float64 2kB 0.03367 0.03294 ... 0.03612\\n    velocities              (time, atom, direction) float64 48kB 5.496e-05 .....\\n    forces                  (time, state, atom, direction) float64 338kB 0.0 ...\\n    atXYZ                   (time, atom, direction) float64 48kB -0.00611 ......\\nAttributes: (12/13)\\n    input_format:                 newtonx\\n    _shnitsel_setup_for_cleanup:  True\\n    completed:                    True\\n    input_type:                   dynamic\\n    input_format_version:         2.5\\n    num_singlets:                 7\\n    ...                           ...\\n    num_triplets:                 0\\n    has_forces:                   active_only\\n    misc_input_settings:          {\\\'control.dyn\\\': {\\\'nat\\\': 10, \\\'nstat\\\': 7, \\\'ns...\\n    trajectory_input_path:        test_data/newtonx/test_pyrazene_v2.6/TRAJ14\\n    trajectory_id:                14\\n    __mol:                        <rdkit.Chem.rdchem.Mol object at 0x7cc44adc..., _is_multi_trajectory=False), _children={}, _attrs={}, _parent=..., _level_name=\\\'DataLeaf\\\'), \\\'99\\\': ABCMeta(_name=99, _dtype=<class \\\'shnitsel.data.dataset_containers.trajectory.Trajectory\\\'>, _data=Trajectory(_raw_dataset=<xarray.Dataset> Size: 451kB\\nDimensions:                 (time: 201, state: 7, atom: 10, direction: 3,\\n                             statecomb: 21)\\nCoordinates: (12/19)\\n  * time                    (time) float64 2kB 0.0 0.5 1.0 ... 99.0 99.5 100.0\\n  * state                   (state) int64 56B 1 2 3 4 5 6 7\\n  * atom                    (atom) int64 80B 0 1 2 3 4 5 6 7 8 9\\n  * direction               (direction) <U1 12B \\\'x\\\' \\\'y\\\' \\\'z\\\'\\n  * statecomb               (statecomb) object 168B MultiIndex\\n    state_names             (state) <U2 56B \\\'S0\\\' \\\'S1\\\' \\\'S2\\\' \\\'S3\\\' \\\'S4\\\' \\\'S5\\\' \\\'S6\\\'\\n    ...                      ...\\n  * from                    (statecomb) int64 168B 1 1 1 1 1 1 2 ... 4 4 4 5 5 6\\n  * to                      (statecomb) int64 168B 2 3 4 5 6 7 3 ... 5 6 7 6 7 7\\n    delta_t                 float64 8B 0.5\\n    max_ts                  int64 8B 201\\n    t_max                   float64 8B 201.0\\n    charge                  float64 8B 0.0\\nData variables:\\n    energy                  (time, state) float64 11kB -264.0 -263.8 ... -263.7\\n    e_kin                   (time) float64 2kB 0.03544 0.03565 ... 0.03024\\n    velocities              (time, atom, direction) float64 48kB -0.0001967 ....\\n    forces                  (time, state, atom, direction) float64 338kB 0.0 ...\\n    atXYZ                   (time, atom, direction) float64 48kB -0.1052 ... ...\\nAttributes: (12/13)\\n    input_format:                 newtonx\\n    _shnitsel_setup_for_cleanup:  True\\n    completed:                    True\\n    input_type:                   dynamic\\n    input_format_version:         2.5\\n    num_singlets:                 7\\n    ...                           ...\\n    num_triplets:                 0\\n    has_forces:                   active_only\\n    misc_input_settings:          {\\\'control.dyn\\\': {\\\'nat\\\': 10, \\\'nstat\\\': 7, \\\'ns...\\n    trajectory_input_path:        test_data/newtonx/test_pyrazene_v2.6/TRAJ99\\n    trajectory_id:                99\\n    __mol:                        <rdkit.Chem.rdchem.Mol object at 0x7cc44ada..., _is_multi_trajectory=False), _children={}, _attrs={}, _parent=..., _level_name=\\\'DataLeaf\\\')}, _attrs={}, _parent=<class \\\'shnitsel.data.tree.tree.ShnitselDBRoot[Trajectory]\\\'> [{\\\'level\\\': \\\'ShnitselDBRoot\\\', \\\'children\\\': "1: {\\\'pyrazene\\\': \\\'...\\\'}"}], _level_name=\\\'CompoundGroup\\\', _group_info=None)\'}'}]

When loading various data structures, from different formats, Shnitsel tools performs some sophisticated type-autodetection to apply the correct import logic. However, if the detection fails, because it detects multiple fitting formats, the read() function offers the parameter kind, which can be set to sharc,newtonx, or pyrai2md to make it pick the specified input format without performing type detection:

[4]:
# Here we tell shnitsel tools explicitly to load a sharc-type input
import logging

trajectories_butene_sharc = st.io.read(
    path='./test_data/sharc/traj_butene_v2.1', kind='sharc'
)
[5]:
try:
    # If we specify the wrong format, it will result in an error and the call will yield no result:
    wrong_type = st.io.read(
        path='./test_data/newtonx/test_pyrazene_v2.6', kind='pyrai2md'
    )
except FileNotFoundError:
    logging.error("Could not load data because the format was specified incorrectly.")
ERROR:root:Did not detect any matching subdirectories or files for input format pyrai2md in `test_data/newtonx/test_pyrazene_v2.6`.
ERROR:root:Could not load trajectory data from `test_data/newtonx/test_pyrazene_v2.6`.
ERROR:root:Could not load data because the format was specified incorrectly.

If our compound is charged, we can set the charge on the loaded data explicitly with .set_charge(), e.g. here, we set the charge of the loaded I01 molecule to +1e. Generally, the read() function will make a best-effort attempt to guess the correct charge for your import. Furthermore, we can tell the read function, what kind of data we would expect from our read call, using the expect_dtype data type indication parameter.

[6]:
from shnitsel.data.dataset_containers.trajectory import Trajectory
from shnitsel.data.tree.tree import ShnitselDB


# The `read()` call will automatically attempt to guess the charge of our molecule
traj_I01:ShnitselDB[Trajectory] = st.io.read(
    path='./test_data/sharc/traj_I01_v2.0', kind='sharc',
    expect_dtype=Trajectory
).set_compound_info('I01')

# If the guess of the charge was not successful or does not meet our expectations,
# you can set the charge with `set_charge()` explicitly.
traj_I01 = traj_I01.set_charge(1.)
traj_I01
WARNING:root:The path `test_data/sharc/traj_I01_v2.0/TRAJ_00009` does contain a `DONT_ANALYZE` file and will therefore be skipped. Please remove that file if you want the directory to be read.
[6]:
<class 'shnitsel.data.tree.tree.ShnitselDBRoot[Trajectory]'> [{'level': 'ShnitselDBRoot', 'children': '1: {\'I01\': \'ABCMeta(_name=\\\'I01\\\', _dtype=<class \\\'shnitsel.data.dataset_containers.trajectory.Trajectory\\\'>, _data=None, _children={\\\'2\\\': ABCMeta(_name=2, _dtype=<class \\\'shnitsel.data.dataset_containers.trajectory.Trajectory\\\'>, _data=Trajectory(_raw_dataset=<xarray.Dataset> Size: 141kB\\nDimensions:                 (time: 201, state: 3, atom: 6, direction: 3,\\n                             statecomb: 3, full_statecomb: 6)\\nCoordinates: (12/23)\\n  * time                    (time) float64 2kB 0.0 0.5 1.0 ... 99.0 99.5 100.0\\n  * state                   (state) int64 24B 1 2 3\\n  * atom                    (atom) int64 48B 0 1 2 3 4 5\\n  * direction               (direction) <U1 12B \\\'x\\\' \\\'y\\\' \\\'z\\\'\\n  * statecomb               (statecomb) object 24B MultiIndex\\n  * full_statecomb          (full_statecomb) object 48B MultiIndex\\n    ...                      ...\\n  * full_statecomb_from     (full_statecomb) int64 48B 1 1 2 2 3 3\\n  * full_statecomb_to       (full_statecomb) int64 48B 2 3 1 3 1 2\\n    delta_t                 float64 8B 0.5\\n    max_ts                  int64 8B 201\\n    t_max                   float64 8B 100.0\\n    charge                  float64 8B 1.0\\nData variables:\\n    energy                  (time, state) float32 2kB -94.69 -94.38 ... -94.34\\n    forces                  (time, state, atom, direction) float32 43kB 0.091...\\n    atXYZ                   (time, atom, direction) float32 14kB 0.02999 ... ...\\n    nacs                    (time, statecomb, atom, direction) float32 43kB 0...\\n    dip_perm                (time, state, direction) float32 7kB -0.002852 .....\\n    dip_trans               (time, statecomb, direction) float32 7kB 0.001566...\\n    socs                    (time, full_statecomb) complex128 19kB 0j 0j ... 0j\\nAttributes: (12/13)\\n    input_format:                 sharc\\n    _shnitsel_setup_for_cleanup:  True\\n    completed:                    True\\n    input_type:                   dynamic\\n    input_format_version:         2.0\\n    num_singlets:                 3\\n    ...                           ...\\n    num_triplets:                 0\\n    has_forces:                   True\\n    misc_input_settings:          {\\\'input\\\': {\\\'printlevel\\\': \\\'2\\\', \\\'geomfile\\\': \\\'...\\n    trajectory_input_path:        test_data/sharc/traj_I01_v2.0/TRAJ_00002\\n    trajectory_id:                2\\n    __mol:                        <rdkit.Chem.rdchem.Mol object at 0x7cc49b51..., _is_multi_trajectory=False), _children={}, _attrs={}, _parent=..., _level_name=\\\'DataLeaf\\\'), \\\'6\\\': ABCMeta(_name=6, _dtype=<class \\\'shnitsel.data.dataset_containers.trajectory.Trajectory\\\'>, _data=Trajectory(_raw_dataset=<xarray.Dataset> Size: 141kB\\nDimensions:                 (time: 201, state: 3, atom: 6, direction: 3,\\n                             statecomb: 3, full_statecomb: 6)\\nCoordinates: (12/23)\\n  * time                    (time) float64 2kB 0.0 0.5 1.0 ... 99.0 99.5 100.0\\n  * state                   (state) int64 24B 1 2 3\\n  * atom                    (atom) int64 48B 0 1 2 3 4 5\\n  * direction               (direction) <U1 12B \\\'x\\\' \\\'y\\\' \\\'z\\\'\\n  * statecomb               (statecomb) object 24B MultiIndex\\n  * full_statecomb          (full_statecomb) object 48B MultiIndex\\n    ...                      ...\\n  * full_statecomb_from     (full_statecomb) int64 48B 1 1 2 2 3 3\\n  * full_statecomb_to       (full_statecomb) int64 48B 2 3 1 3 1 2\\n    delta_t                 float64 8B 0.5\\n    max_ts                  int64 8B 201\\n    t_max                   float64 8B 100.0\\n    charge                  float64 8B 1.0\\nData variables:\\n    energy                  (time, state) float32 2kB -94.66 -94.37 ... -94.32\\n    forces                  (time, state, atom, direction) float32 43kB 0.007...\\n    atXYZ                   (time, atom, direction) float32 14kB 0.01689 ... ...\\n    nacs                    (time, statecomb, atom, direction) float32 43kB -...\\n    dip_perm                (time, state, direction) float32 7kB 0.4447 ... 2...\\n    dip_trans               (time, statecomb, direction) float32 7kB -0.05107...\\n    socs                    (time, full_statecomb) complex128 19kB 0j 0j ... 0j\\nAttributes: (12/13)\\n    input_format:                 sharc\\n    _shnitsel_setup_for_cleanup:  True\\n    completed:                    True\\n    input_type:                   dynamic\\n    input_format_version:         2.0\\n    num_singlets:                 3\\n    ...                           ...\\n    num_triplets:                 0\\n    has_forces:                   True\\n    misc_input_settings:          {\\\'input\\\': {\\\'printlevel\\\': \\\'2\\\', \\\'geomfile\\\': \\\'...\\n    trajectory_input_path:        test_data/sharc/traj_I01_v2.0/TRAJ_00006\\n    trajectory_id:                6\\n    __mol:                        <rdkit.Chem.rdchem.Mol object at 0x7cc49b51..., _is_multi_trajectory=False), _children={}, _attrs={}, _parent=..., _level_name=\\\'DataLeaf\\\')}, _attrs={}, _parent=<class \\\'shnitsel.data.tree.tree.ShnitselDBRoot[Trajectory]\\\'> [{\\\'level\\\': \\\'ShnitselDBRoot\\\', \\\'children\\\': "1: {\\\'I01\\\': \\\'...\\\'}"}], _level_name=\\\'CompoundGroup\\\', _group_info=None)\'}'}]

Advanced read() configuration#

While st.io.read() will generally try and configure the parsing routines in a way that should fit most userst, there are several settings that can be adapted:

  • sub_pattern: This is a path pattern to match files or subdirectory names in the path location to. Let us assume, we have TRAJ_0,TRAJ_1,...TRAJ_99, and ICOND_0,...ICOND_4 directories within the same folder, but we only want to load ICOND_* files. Then we can set sub_pattern='ICOND_*' and read() will only consider the matching entries in path.

  • input_units: This is an optional setting to determine the units in which different input variables are provided, e.g. atXYZ is the variable name for positional data, forces is the variable name for forces and energy is the variable name for absolute system energy. The units are provided in a {variable_name:'unit_string'} fashion, where the variable names can be taken from the declaration of the standard Shnitsel format and the units values for different unit types can be found in shnitsel.units.definitions,

  • input_state_types and input_state_names: For various formats, it is not entirely possible to derive the multiplicity/type of states and a reasonable readable state name label from the trajectory format. read() will make a best effort to apply reasonable state names and state multiplicities based on the input information, but if the resulting state types (or names) are not according to your expectations, you can set them by providing either a list of types or names to the input_state_types and input_state_names or by providing a function in either of those parameters that takes the already parsed xarray.Dataset and can set the ds.state_types or ds.state_names values based on additional information from the dataset. Note that shnitsel tools will always apply state types first before setting state names and it will use the values of the settings input_state_types and input_state_names in this order as well, if they are provided.

  • concat_method='db': When multiple trajectories are loaded in parallel, they need to be combined into a single result. The default behavior is to pack them into a ShnitselDB structure (based on xarray.DataTree) to preserve all meta-data of the individual trajectories, allowing for advanced filtering and storage according to the FAIR principles. Alternatives are

    • 'concat', which attaches the trajctories along the time dimension, which is then referred to as frame, effectively building one long, concatenated trajectory. This will lose insights into different meta-data of the concatenated trajectories.

    • 'layers', which packs all trajectories into a shared xarray.Dataset object that has a new trajid dimension (meaning: trajectory id), along which the different trajectories can be indexed. As in 'concat', we will lose individual differences in metadata on the combined object compared to the individual trajectories.

    • 'list', which simply returnes the trajectories as a list of xarray.Dataset objects.

  • input_trajectory_id_maps: This option allows for controlled setting of trajectory ids for individual datasets. If a dict is provided here, the keys are meant to be absolute paths in posix-format (specifically of type str) with the associated values being the resulting ids. Alternatively, a function can be provided that receives the pathlib.Path object of the input path of the trajectory and is supposed to yield an integer id that is distinct from all other trajectories loaded with the same call to read()

More techincal read() configuration options#

  • multiple=True: This flag determines whether read() should automatically look for entries within the path folder to load as input datasets. If set to False, only path will be attempted to be loaded as a single dataset.

  • parallel=True: A flag that by default allows parallel processing of input in multiple child processes to accelerate multi-trajectory input.

  • error_reporting='log': A setting to determine whether errors should be thrown as an exception or just logged within the read routine. If parallel reading is enabled, only log is supported.

Storing data on disk in Shnitsel/NetCDF format#

Within the shnitsel.io module, we provide the function write_shnitsel_file(dataset, savepath) to write Shnitsel-format files using the NetCDF4 format. Writing is performed using the to_netcdf() functionality provided by the xarray package for its datatypes, which can be controlled using the option complevel=9, which sets the compression level on a scale of 0 (not compressed) to 9 (maximum compression). Higher complevel values result in smaller output files but take longer time to compute. Lower compression levels are therefor preferable if writing latency is relevant to the application.

[7]:
# Import intiial conditions as a Shnitsel DB.
from shnitsel.data.dataset_containers.trajectory import Trajectory
from shnitsel.data.tree.tree import ShnitselDB
import xarray as xr


iconds_butene = st.io.read(path='./test_data/sharc/iconds_butene', concat_method='db').set_compound_info("A03")

assert iconds_butene is not None, "Loading of initial conditions failed"
assert not isinstance(iconds_butene, list), (
    "Format should be ShnitselDB but was %s " % type(iconds_butene)
)
assert isinstance(iconds_butene, ShnitselDB), (
    "Format should be ShnitselDB but was %s " % type(iconds_butene)
)

# Import a single initial condition to demonstrate that this also works with individual datasets
iconds_butene_single = st.io.read(path='./test_data/sharc/iconds_butene/ICOND_00000')
assert iconds_butene_single is not None, "Loading of initial conditions failed"
assert not isinstance(iconds_butene_single, list), (
    "Format should be Trajectory/xr.Dataset but was %s " % type(iconds_butene_single)
)
assert isinstance(iconds_butene_single, (xr.Dataset, Trajectory)), (
    "Format should be Trajectory/xr.Dataset but was %s " % type(iconds_butene_single)
)

# Write the entire set of initial conditions to `path` and apply maximum levels of compression
st.io.write_shnitsel_file(iconds_butene, savepath="./test_data/sharc/iconds_butene.nc")

# Writing the dataset with lower compression levels yields faster write times but larger output files.
# Mostly relevant for larger datasets
st.io.write_shnitsel_file(
    iconds_butene,
    savepath="./test_data/sharc/iconds_butene_single_high_compression.nc",
    complevel=9,
)
st.io.write_shnitsel_file(
    iconds_butene,
    savepath="./test_data/sharc/iconds_butene_single_mid_compression.nc",
    complevel=5,
)
st.io.write_shnitsel_file(
    iconds_butene,
    savepath="./test_data/sharc/iconds_butene_single_low_compression.nc",
    complevel=0,
)

TL;DR: Converting trajectories to Shnitsel-format#

To simplify the process of converting trajectories to Shnitsel format for publication, we provide a simple command line script. The script convert_to_shnitsel_file is installed with the shnitsel-tools package (if installed in a virtual environment, remember to activate that environment).

It can be called like this to convert a set of trajectories into a shnitsel-db format:

[8]:
# The convert_t_shnitsel_file reads the input path, applies default unit conversion
# Optionally, it sets the compound name of the loaded data (`-c` option) and an optional group name (`-g` option).
# With the mandatory est_level parameter (`-est` or `--est_level`) the user specifies the est_level used to simulate the data.
# With the mandatory basis set parameter (`-basis` or `--basis_set`) the user specifies the basis set of the QM calculations.
# These two settings will automatically be stored in the metadata of all loaded trajectories.
# Please convert trajectories with different est levels or basis sets through different calls.
%sx convert_to_shnitsel_file ./test_data/sharc/iconds_butene/ -o ./test_data/sharc/converted_iconds_butene.nc -k sharc -c butene -g iconds -est CASSCF -basis cc-pVDZ -f

# Similarly to the call to `st.io.read()`, this script also provides the option to specify a pattern to filter subdirectories (Option `-p`)
# Or to set the log level during conversion (Option `-log`)
%sx convert_to_shnitsel_file ./test_data/sharc/traj_butene_v2.1/ -o ./test_data/sharc/converted_traj_butene.nc -k sharc -c butene -g trajc -est CASSCF -basis cc-pVDZ -p TRAJ_* -log info -f
[8]:
['WARNING:root:Conversion will overwrite test_data/sharc/converted_traj_butene.nc. Will procede because of set `--force` flag.',
 '',
 '  0%|          | 0/2 [00:00<?, ?it/s]',
 '100%|██████████| 2/2 [00:00<00:00, 23.43it/s]',
 'INFO:root:Collected the following log messages:',
 'INFO:root:2 times: Expecting SOCs in SHARC',
 'INFO:root:2 times: For sharc before version 4.0, we will attempt to extract charge data from QM interface settings.',
 'INFO:root:2 times: Found charge data from the molcas interface',
 'INFO:root:2 times: State names already assigned',
 'INFO:root:2 times: RDKit returned a self-consistent molecular structure for charge=0. Assuming molecule to be uncharged.',
 'INFO:root:Trajectory at path test_data/sharc/traj_butene_v2.1/TRAJ_00002 did not complete',
 'INFO:root:Trajectory at path test_data/sharc/traj_butene_v2.1/TRAJ_00003 did not complete',
 'INFO:root:Found discrepancies in the following dimensions: []',
 'Number of compounds in trajectory: 1',
 "Present compounds: ['butene']",
 'Number of Trajectories: 2',
 'Wrote resulting trajectory collection']

Merging multiple Shnitsel-format files into a single file#

You can merge multiple converted shnitsel files with the command merge_shnitsel_files also installed with the shnitsel-tools package. Provide the input files as a list of paths, then the output path with the -o option and optionally set the log level for debugging using the -log option:

[9]:
# Merge butene iconds and trajectories into a mixed-format dataset
%sx merge_shnitsel_files ./test_data/sharc/converted_iconds_butene.nc ./test_data/sharc/converted_traj_butene.nc -o ./test_data/sharc/combined_butene.nc
[9]:
['ERROR:root:Conversion would override test_data/sharc/combined_butene.nc. For safety reasons, we will not proceed.']

Note that compound names and group names applied in the conversion (or manually if the conversion is done in code) are preserved, when merging two or more files. If two files contain the same compounds, their compound-data will be merged. If two compounds that are being merged contain a group of the same name, that group will also be merged.

Further note that no identity check is performed on trajectories while merging groups or compounds. This can lead to the same trajectory dataset existing multiple times within the same tree structure if you accidentally made it a part of multiple input files.

Reading the Shnitsel/NetCDF format#

The Shnitsel-format files are easy to import back into shnitsel-tools. This uses the same call to st.io.read() as for the other input formats. If you want to specify the input format, you can set kind='shnitsel'. All other options of read() are also supported for shnitsel-format inputs.

[10]:
# Reading a shnitsel file with automatic type detection
shnitsel_input_butene = st.io.read("./test_data/sharc/iconds_butene.nc")

# Specifying the type explicitly
shnitsel_input_butene = st.io.read(
    "./test_data/sharc/iconds_butene.nc", kind="shnitsel"
)

Shnitsel-files have a version attribute to allow for backward-compatible loading of shnitsel-files in later versions of shnitsel-tools. Please be aware that loading an old shnitsel-format file can result in warnings due to missing metadata in the earliest versions of the shnitsel-format. If there are issues with loading a shnitsel file, it may be that you are using and older version of shnitsel-tools than what has been used to write that file. In that case, please update shnitsel-tools to the latest version.

Reading and writing ASE/SPaiNN/SchNet format databases#

We support the import of ASE/SPaiNN/SchNet format input data. The import is also performed through st.io.read() again with the input format flag kind='ase' or via automatic import type detection.

[11]:
import shnitsel as st
from shnitsel.analyze.generic import keep_norming
# Reading an ase-style sql database input:
ase_input = st.io.read("./test_data/ase/spainn_ch2nh2+.db").set_charge(1)
ase_input
[11]:
<class 'shnitsel.data.tree.tree.ShnitselDBRoot[Frames]'> [{'level': 'ShnitselDBRoot', 'children': '1: {\'C-N-H-H-H-H\': \'ABCMeta(_name=\\\'C-N-H-H-H-H\\\', _dtype=<class \\\'shnitsel.data.dataset_containers.frames.Frames\\\'>, _data=None, _children={\\\'0\\\': ABCMeta(_name=\\\'0\\\', _dtype=<class \\\'shnitsel.data.dataset_containers.frames.Frames\\\'>, _data=None, _children={\\\'0\\\': ABCMeta(_name=\\\'0\\\', _dtype=<class \\\'shnitsel.data.dataset_containers.frames.Frames\\\'>, _data=Frames(_raw_dataset=<xarray.Dataset> Size: 166kB\\nDimensions:              (frame: 99, state: 3, full_statecomb: 6, atom: 6,\\n                          direction: 3, statecomb: 3)\\nCoordinates: (12/18)\\n  * frame                (frame) int64 792B 0 1 2 3 4 5 6 ... 93 94 95 96 97 98\\n  * state                (state) int64 24B 1 2 3\\n  * full_statecomb       (full_statecomb) object 48B MultiIndex\\n  * direction            (direction) <U1 12B \\\'x\\\' \\\'y\\\' \\\'z\\\'\\n  * statecomb            (statecomb) object 24B MultiIndex\\n    atNames              (atom) <U1 24B \\\'C\\\' \\\'N\\\' \\\'H\\\' \\\'H\\\' \\\'H\\\' \\\'H\\\'\\n    ...                   ...\\n    state_names          (state) <U8 96B \\\'S\\\' \\\'S\\\' \\\'S\\\'\\n  * from                 (statecomb) int64 24B 1 1 2\\n  * to                   (statecomb) int64 24B 2 3 3\\n  * full_statecomb_from  (full_statecomb) int64 48B 1 1 2 2 3 3\\n  * full_statecomb_to    (full_statecomb) int64 48B 2 3 1 3 1 2\\n    charge               float64 8B 1.0\\nDimensions without coordinates: atom\\nData variables:\\n    energy               (frame, state) float64 2kB -94.67 -94.37 ... -94.29\\n    socs                 (frame, full_statecomb) float64 5kB 0.0 0.0 ... 0.0\\n    forces               (frame, state, atom, direction) float64 43kB -0.1414...\\n    nacs                 (frame, statecomb, atom, direction) float64 43kB -0....\\n    smooth_nacs          (frame, statecomb, atom, direction) float64 43kB 0.0...\\n    atXYZ                (frame, atom, direction) float64 14kB 0.00093 ... -0...\\n    dip_perm             (frame, state, direction) float64 7kB 0.133 ... 1.088\\n    dip_trans            (frame, statecomb, direction) float64 7kB -0.004758 ...\\nAttributes:\\n    completed:              False\\n    input_format:           ase\\n    input_type:             unknown\\n    input_format_version:   unknown\\n    num_singlets:           3\\n    num_doublets:           0\\n    num_triplets:           0\\n    est_level:              MR-CISD(6,4)\\n    theory_basis_set:       aug-cc-pVDZ, program: COLUMBUS\\n    trajectory_input_path:  test_data/ase/spainn_ch2nh2+.db\\n    trajectory_id:          1375288398, _is_multi_trajectory=False), _children={}, _attrs={}, _parent=..., _level_name=\\\'DataLeaf\\\')}, _attrs={}, _parent=..., _level_name=\\\'DataGroup\\\', _group_info=None)}, _attrs={}, _parent=<class \\\'shnitsel.data.tree.tree.ShnitselDBRoot[Frames]\\\'> [{\\\'level\\\': \\\'ShnitselDBRoot\\\', \\\'children\\\': "1: {\\\'C-N-H-H-H-H\\\': \\\'...\\\'}"}], _level_name=\\\'CompoundGroup\\\', _group_info=None)\'}'}]

We support the output of shnitsel-style datasets to a spainn or schnet database in a schnetpack conforming format via the method st.io.write_ase_db():

[12]:
# Writing in spainn format
# We set the `force=True` option to overwrite an existing database.
st.io.write_ase_db(ase_input, "./test_data/ase/spainn_conversion.db", db_format="spainn", force=True)

# Writing in schnet format
# Here, we do not force the overwriting. This command may fail if you have executed it before.
# You can add `force=True` here as well if you are sure that you want to drop the existing database.
st.io.write_ase_db(ase_input, "./test_data/ase/schnet_conversion.db", db_format="schnet")
/home/tpadmin/git/shnitsel-tools-official/shnitsel/data/traj_combiner_methods.py:491: FutureWarning: In a future version of xarray the default value for compat will change from compat='equals' to compat='override'. This change will result in the following ValueError: Cannot specify both coords='different' and compat='override'. The recommendation is to set compat explicitly for this case.
  frames = xr.concat(

If we combine our import support for e.g. SHARC files with the output to ASE databases, we see, that certain structural data is not preserved through writing to and loading from ASE databases:

[13]:
from shnitsel.data.tree.data_group import GroupInfo


sharc_input = st.io.read("./test_data/sharc/traj_I01_v2.0", parallel=False)
assert isinstance(sharc_input, ShnitselDB)
sharc_input = sharc_input.set_compound_info("I01",overwrite_all=True)
sharc_input = sharc_input.add_data_group(group_info=GroupInfo("special_group", group_attributes={"fancy":True}))
sharc_input
WARNING:root:The path `test_data/sharc/traj_I01_v2.0/TRAJ_00009` does contain a `DONT_ANALYZE` file and will therefore be skipped. Please remove that file if you want the directory to be read.
[13]:
<class 'shnitsel.data.tree.tree.ShnitselDBRoot[Trajectory]'> [{'level': 'ShnitselDBRoot', 'children': '1: {\'I01\': \'ABCMeta(_name=\\\'unknown\\\', _dtype=<class \\\'shnitsel.data.dataset_containers.trajectory.Trajectory\\\'>, _data=None, _children={\\\'special_group\\\': ABCMeta(_name=\\\'special_group\\\', _dtype=<class \\\'shnitsel.data.dataset_containers.trajectory.Trajectory\\\'>, _data=None, _children={\\\'2\\\': ABCMeta(_name=2, _dtype=<class \\\'shnitsel.data.dataset_containers.trajectory.Trajectory\\\'>, _data=Trajectory(_raw_dataset=<xarray.Dataset> Size: 141kB\\nDimensions:                 (time: 201, state: 3, atom: 6, direction: 3,\\n                             statecomb: 3, full_statecomb: 6)\\nCoordinates: (12/23)\\n  * time                    (time) float64 2kB 0.0 0.5 1.0 ... 99.0 99.5 100.0\\n  * state                   (state) int64 24B 1 2 3\\n  * atom                    (atom) int64 48B 0 1 2 3 4 5\\n  * direction               (direction) <U1 12B \\\'x\\\' \\\'y\\\' \\\'z\\\'\\n  * statecomb               (statecomb) object 24B MultiIndex\\n  * full_statecomb          (full_statecomb) object 48B MultiIndex\\n    ...                      ...\\n  * full_statecomb_from     (full_statecomb) int64 48B 1 1 2 2 3 3\\n  * full_statecomb_to       (full_statecomb) int64 48B 2 3 1 3 1 2\\n    delta_t                 float64 8B 0.5\\n    max_ts                  int64 8B 201\\n    t_max                   float64 8B 100.0\\n    charge                  float64 8B 1.0\\nData variables:\\n    energy                  (time, state) float32 2kB -94.69 -94.38 ... -94.34\\n    forces                  (time, state, atom, direction) float32 43kB 0.091...\\n    atXYZ                   (time, atom, direction) float32 14kB 0.02999 ... ...\\n    nacs                    (time, statecomb, atom, direction) float32 43kB 0...\\n    dip_perm                (time, state, direction) float32 7kB -0.002852 .....\\n    dip_trans               (time, statecomb, direction) float32 7kB 0.001566...\\n    socs                    (time, full_statecomb) complex128 19kB 0j 0j ... 0j\\nAttributes: (12/13)\\n    input_format:                 sharc\\n    _shnitsel_setup_for_cleanup:  True\\n    completed:                    True\\n    input_type:                   dynamic\\n    input_format_version:         2.0\\n    num_singlets:                 3\\n    ...                           ...\\n    num_triplets:                 0\\n    has_forces:                   True\\n    misc_input_settings:          {\\\'input\\\': {\\\'printlevel\\\': \\\'2\\\', \\\'geomfile\\\': \\\'...\\n    trajectory_input_path:        test_data/sharc/traj_I01_v2.0/TRAJ_00002\\n    trajectory_id:                2\\n    __mol:                        <rdkit.Chem.rdchem.Mol object at 0x7cc44acf..., _is_multi_trajectory=False), _children={}, _attrs={}, _parent=..., _level_name=\\\'DataLeaf\\\'), \\\'6\\\': ABCMeta(_name=6, _dtype=<class \\\'shnitsel.data.dataset_containers.trajectory.Trajectory\\\'>, _data=Trajectory(_raw_dataset=<xarray.Dataset> Size: 141kB\\nDimensions:                 (time: 201, state: 3, atom: 6, direction: 3,\\n                             statecomb: 3, full_statecomb: 6)\\nCoordinates: (12/23)\\n  * time                    (time) float64 2kB 0.0 0.5 1.0 ... 99.0 99.5 100.0\\n  * state                   (state) int64 24B 1 2 3\\n  * atom                    (atom) int64 48B 0 1 2 3 4 5\\n  * direction               (direction) <U1 12B \\\'x\\\' \\\'y\\\' \\\'z\\\'\\n  * statecomb               (statecomb) object 24B MultiIndex\\n  * full_statecomb          (full_statecomb) object 48B MultiIndex\\n    ...                      ...\\n  * full_statecomb_from     (full_statecomb) int64 48B 1 1 2 2 3 3\\n  * full_statecomb_to       (full_statecomb) int64 48B 2 3 1 3 1 2\\n    delta_t                 float64 8B 0.5\\n    max_ts                  int64 8B 201\\n    t_max                   float64 8B 100.0\\n    charge                  float64 8B 1.0\\nData variables:\\n    energy                  (time, state) float32 2kB -94.66 -94.37 ... -94.32\\n    forces                  (time, state, atom, direction) float32 43kB 0.007...\\n    atXYZ                   (time, atom, direction) float32 14kB 0.01689 ... ...\\n    nacs                    (time, statecomb, atom, direction) float32 43kB -...\\n    dip_perm                (time, state, direction) float32 7kB 0.4447 ... 2...\\n    dip_trans               (time, statecomb, direction) float32 7kB -0.05107...\\n    socs                    (time, full_statecomb) complex128 19kB 0j 0j ... 0j\\nAttributes: (12/13)\\n    input_format:                 sharc\\n    _shnitsel_setup_for_cleanup:  True\\n    completed:                    True\\n    input_type:                   dynamic\\n    input_format_version:         2.0\\n    num_singlets:                 3\\n    ...                           ...\\n    num_triplets:                 0\\n    has_forces:                   True\\n    misc_input_settings:          {\\\'input\\\': {\\\'printlevel\\\': \\\'2\\\', \\\'geomfile\\\': \\\'...\\n    trajectory_input_path:        test_data/sharc/traj_I01_v2.0/TRAJ_00006\\n    trajectory_id:                6\\n    __mol:                        <rdkit.Chem.rdchem.Mol object at 0x7cc499fc..., _is_multi_trajectory=False), _children={}, _attrs={}, _parent=..., _level_name=\\\'DataLeaf\\\')}, _attrs={}, _parent=..., _level_name=\\\'DataGroup\\\', _group_info=GroupInfo(group_name=\\\'special_group\\\', group_attributes={\\\'fancy\\\': True}, grouped_properties=None))}, _attrs={}, _parent=<class \\\'shnitsel.data.tree.tree.ShnitselDBRoot[Trajectory]\\\'> [{\\\'level\\\': \\\'ShnitselDBRoot\\\', \\\'children\\\': "1: {\\\'I01\\\': \\\'...\\\'}"}], _level_name=\\\'CompoundGroup\\\', _group_info=None)\'}'}]

The output in spainn format loses structural information like the specific compound name and group levels of the hierarchy.

[14]:
# print(sharc_input)
st.io.write_ase_db(sharc_input,"./test_data/ase/spainn_I01.db",db_format="spainn", force=True)
sharc_input_spainn =  st.io.read("./test_data/ase/spainn_I01.db", parallel=False)
sharc_input_spainn
/home/tpadmin/git/shnitsel-tools-official/shnitsel/data/traj_combiner_methods.py:491: FutureWarning: In a future version of xarray the default value for compat will change from compat='equals' to compat='override'. This change will result in the following ValueError: Cannot specify both coords='different' and compat='override'. The recommendation is to set compat explicitly for this case.
  frames = xr.concat(
[14]:
<class 'shnitsel.data.tree.tree.ShnitselDBRoot[Trajectory]'> [{'level': 'ShnitselDBRoot', 'children': '1: {\'C-N-H-H-H-H\': \'ABCMeta(_name=\\\'C-N-H-H-H-H\\\', _dtype=<class \\\'shnitsel.data.dataset_containers.trajectory.Trajectory\\\'>, _data=None, _children={\\\'0\\\': ABCMeta(_name=\\\'0\\\', _dtype=<class \\\'shnitsel.data.dataset_containers.trajectory.Trajectory\\\'>, _data=None, _children={\\\'0\\\': ABCMeta(_name=\\\'0\\\', _dtype=<class \\\'shnitsel.data.dataset_containers.trajectory.Trajectory\\\'>, _data=Trajectory(_raw_dataset=<xarray.Dataset> Size: 159kB\\nDimensions:                 (time: 201, statecomb: 3, atom: 6, direction: 3,\\n                             state: 3, full_statecomb: 6)\\nCoordinates: (12/24)\\n  * time                    (time) float64 2kB 0.0 0.5 1.0 ... 99.0 99.5 100.0\\n  * statecomb               (statecomb) object 24B MultiIndex\\n  * direction               (direction) <U1 12B \\\'x\\\' \\\'y\\\' \\\'z\\\'\\n  * state                   (state) int64 24B 1 2 3\\n  * full_statecomb          (full_statecomb) object 48B MultiIndex\\n    atNames                 (atom) <U1 24B \\\'C\\\' \\\'N\\\' \\\'H\\\' \\\'H\\\' \\\'H\\\' \\\'H\\\'\\n    ...                      ...\\n    from                    (statecomb) int64 24B 1 1 2\\n    to                      (statecomb) int64 24B 2 3 3\\n    full_statecomb_from     (full_statecomb) int64 48B 1 1 2 2 3 3\\n    full_statecomb_to       (full_statecomb) int64 48B 2 3 1 3 1 2\\n    frame                   (time) int64 2kB 0 1 2 3 4 5 ... 196 197 198 199 200\\n    trajectory              int64 8B 2\\nDimensions without coordinates: atom\\nData variables:\\n    nacs                    (time, statecomb, atom, direction) float32 43kB 0...\\n    forces                  (time, state, atom, direction) float32 43kB 0.091...\\n    energy                  (time, state) float32 2kB -94.69 -94.38 ... -94.34\\n    socs                    (time, full_statecomb) complex128 19kB 0j 0j ... 0j\\n    atXYZ                   (time, atom, direction) float64 29kB 0.02999 ... ...\\n    dip_perm                (time, state, direction) float32 7kB -0.00725 ......\\n    dip_trans               (time, statecomb, direction) float32 7kB 0.00398 ...\\nAttributes: (12/13)\\n    input_format:                 sharc\\n    input_type:                   dynamic\\n    input_format_version:         2.0\\n    completed:                    True\\n    num_singlets:                 3\\n    num_doublets:                 0\\n    ...                           ...\\n    _shnitsel_setup_for_cleanup:  True\\n    has_forces:                   True\\n    misc_input_settings:          {\\\'input\\\': {\\\'printlevel\\\': \\\'2\\\', \\\'geomfile\\\': \\\'...\\n    trajectory_input_path:        test_data/sharc/traj_I01_v2.0/TRAJ_00002\\n    trajectory_id:                2125878255\\n    is_multi_trajectory:          True, _is_multi_trajectory=False), _children={}, _attrs={}, _parent=..., _level_name=\\\'DataLeaf\\\'), \\\'1\\\': ABCMeta(_name=\\\'1\\\', _dtype=<class \\\'shnitsel.data.dataset_containers.trajectory.Trajectory\\\'>, _data=Trajectory(_raw_dataset=<xarray.Dataset> Size: 159kB\\nDimensions:                 (time: 201, statecomb: 3, atom: 6, direction: 3,\\n                             state: 3, full_statecomb: 6)\\nCoordinates: (12/24)\\n  * time                    (time) float64 2kB 0.0 0.5 1.0 ... 99.0 99.5 100.0\\n  * statecomb               (statecomb) object 24B MultiIndex\\n  * direction               (direction) <U1 12B \\\'x\\\' \\\'y\\\' \\\'z\\\'\\n  * state                   (state) int64 24B 1 2 3\\n  * full_statecomb          (full_statecomb) object 48B MultiIndex\\n    atNames                 (atom) <U1 24B \\\'C\\\' \\\'N\\\' \\\'H\\\' \\\'H\\\' \\\'H\\\' \\\'H\\\'\\n    ...                      ...\\n    from                    (statecomb) int64 24B 1 1 2\\n    to                      (statecomb) int64 24B 2 3 3\\n    full_statecomb_from     (full_statecomb) int64 48B 1 1 2 2 3 3\\n    full_statecomb_to       (full_statecomb) int64 48B 2 3 1 3 1 2\\n    frame                   (time) int64 2kB 0 1 2 3 4 5 ... 196 197 198 199 200\\n    trajectory              int64 8B 2\\nDimensions without coordinates: atom\\nData variables:\\n    nacs                    (time, statecomb, atom, direction) float32 43kB -...\\n    forces                  (time, state, atom, direction) float32 43kB 0.007...\\n    energy                  (time, state) float32 2kB -94.66 -94.37 ... -94.32\\n    socs                    (time, full_statecomb) complex128 19kB 0j 0j ... 0j\\n    atXYZ                   (time, atom, direction) float64 29kB 0.01689 ... ...\\n    dip_perm                (time, state, direction) float32 7kB 1.13 ... 6.798\\n    dip_trans               (time, statecomb, direction) float32 7kB -0.1298 ...\\nAttributes: (12/13)\\n    input_format:                 sharc\\n    input_type:                   dynamic\\n    input_format_version:         2.0\\n    completed:                    True\\n    num_singlets:                 3\\n    num_doublets:                 0\\n    ...                           ...\\n    _shnitsel_setup_for_cleanup:  True\\n    has_forces:                   True\\n    misc_input_settings:          {\\\'input\\\': {\\\'printlevel\\\': \\\'2\\\', \\\'geomfile\\\': \\\'...\\n    trajectory_input_path:        test_data/sharc/traj_I01_v2.0/TRAJ_00002\\n    trajectory_id:                349367038\\n    is_multi_trajectory:          True, _is_multi_trajectory=False), _children={}, _attrs={}, _parent=..., _level_name=\\\'DataLeaf\\\')}, _attrs={}, _parent=..., _level_name=\\\'DataGroup\\\', _group_info=None)}, _attrs={}, _parent=<class \\\'shnitsel.data.tree.tree.ShnitselDBRoot[Trajectory]\\\'> [{\\\'level\\\': \\\'ShnitselDBRoot\\\', \\\'children\\\': "1: {\\\'C-N-H-H-H-H\\\': \\\'...\\\'}"}], _level_name=\\\'CompoundGroup\\\', _group_info=None)\'}'}]

The same information is also lost in the schnet format. In both formats, the structure can be somewhat reconstructed, but compounds may need to manually named using set_compound_info() on the import result and the group structure may be lost entirely.

[15]:
st.io.write_ase_db(sharc_input,"./test_data/ase/schnet_I01.db",db_format="schnet", force=True)
sharc_input_schnet =  st.io.read("./test_data/ase/schnet_I01.db", parallel=False)
sharc_input_schnet

/home/tpadmin/git/shnitsel-tools-official/shnitsel/data/traj_combiner_methods.py:491: FutureWarning: In a future version of xarray the default value for compat will change from compat='equals' to compat='override'. This change will result in the following ValueError: Cannot specify both coords='different' and compat='override'. The recommendation is to set compat explicitly for this case.
  frames = xr.concat(
[15]:
<class 'shnitsel.data.tree.tree.ShnitselDBRoot[Trajectory]'> [{'level': 'ShnitselDBRoot', 'children': '1: {\'C-N-H-H-H-H\': \'ABCMeta(_name=\\\'C-N-H-H-H-H\\\', _dtype=<class \\\'shnitsel.data.dataset_containers.trajectory.Trajectory\\\'>, _data=None, _children={\\\'0\\\': ABCMeta(_name=\\\'0\\\', _dtype=<class \\\'shnitsel.data.dataset_containers.trajectory.Trajectory\\\'>, _data=None, _children={\\\'0\\\': ABCMeta(_name=\\\'0\\\', _dtype=<class \\\'shnitsel.data.dataset_containers.trajectory.Trajectory\\\'>, _data=Trajectory(_raw_dataset=<xarray.Dataset> Size: 159kB\\nDimensions:                 (time: 201, statecomb: 3, atom: 6, direction: 3,\\n                             state: 3, full_statecomb: 6)\\nCoordinates: (12/24)\\n  * time                    (time) float64 2kB 0.0 0.5 1.0 ... 99.0 99.5 100.0\\n  * statecomb               (statecomb) object 24B MultiIndex\\n  * direction               (direction) <U1 12B \\\'x\\\' \\\'y\\\' \\\'z\\\'\\n  * state                   (state) int64 24B 1 2 3\\n  * full_statecomb          (full_statecomb) object 48B MultiIndex\\n    atNames                 (atom) <U1 24B \\\'C\\\' \\\'N\\\' \\\'H\\\' \\\'H\\\' \\\'H\\\' \\\'H\\\'\\n    ...                      ...\\n    from                    (statecomb) int64 24B 1 1 2\\n    to                      (statecomb) int64 24B 2 3 3\\n    full_statecomb_from     (full_statecomb) int64 48B 1 1 2 2 3 3\\n    full_statecomb_to       (full_statecomb) int64 48B 2 3 1 3 1 2\\n    frame                   (time) int64 2kB 0 1 2 3 4 5 ... 196 197 198 199 200\\n    trajectory              int64 8B 2\\nDimensions without coordinates: atom\\nData variables:\\n    nacs                    (time, statecomb, atom, direction) float32 43kB 0...\\n    forces                  (time, state, atom, direction) float32 43kB 0.091...\\n    energy                  (time, state) float32 2kB -94.69 -94.38 ... -94.34\\n    socs                    (time, full_statecomb) complex128 19kB 0j 0j ... 0j\\n    atXYZ                   (time, atom, direction) float64 29kB 0.02999 ... ...\\n    dip_perm                (time, state, direction) float32 7kB -0.00725 ......\\n    dip_trans               (time, statecomb, direction) float32 7kB 0.00398 ...\\nAttributes: (12/13)\\n    input_format:                 sharc\\n    input_type:                   dynamic\\n    input_format_version:         2.0\\n    completed:                    True\\n    num_singlets:                 3\\n    num_doublets:                 0\\n    ...                           ...\\n    _shnitsel_setup_for_cleanup:  True\\n    has_forces:                   True\\n    misc_input_settings:          {\\\'input\\\': {\\\'printlevel\\\': \\\'2\\\', \\\'geomfile\\\': \\\'...\\n    trajectory_input_path:        test_data/sharc/traj_I01_v2.0/TRAJ_00002\\n    trajectory_id:                635819650\\n    is_multi_trajectory:          True, _is_multi_trajectory=False), _children={}, _attrs={}, _parent=..., _level_name=\\\'DataLeaf\\\'), \\\'1\\\': ABCMeta(_name=\\\'1\\\', _dtype=<class \\\'shnitsel.data.dataset_containers.trajectory.Trajectory\\\'>, _data=Trajectory(_raw_dataset=<xarray.Dataset> Size: 159kB\\nDimensions:                 (time: 201, statecomb: 3, atom: 6, direction: 3,\\n                             state: 3, full_statecomb: 6)\\nCoordinates: (12/24)\\n  * time                    (time) float64 2kB 0.0 0.5 1.0 ... 99.0 99.5 100.0\\n  * statecomb               (statecomb) object 24B MultiIndex\\n  * direction               (direction) <U1 12B \\\'x\\\' \\\'y\\\' \\\'z\\\'\\n  * state                   (state) int64 24B 1 2 3\\n  * full_statecomb          (full_statecomb) object 48B MultiIndex\\n    atNames                 (atom) <U1 24B \\\'C\\\' \\\'N\\\' \\\'H\\\' \\\'H\\\' \\\'H\\\' \\\'H\\\'\\n    ...                      ...\\n    from                    (statecomb) int64 24B 1 1 2\\n    to                      (statecomb) int64 24B 2 3 3\\n    full_statecomb_from     (full_statecomb) int64 48B 1 1 2 2 3 3\\n    full_statecomb_to       (full_statecomb) int64 48B 2 3 1 3 1 2\\n    frame                   (time) int64 2kB 0 1 2 3 4 5 ... 196 197 198 199 200\\n    trajectory              int64 8B 2\\nDimensions without coordinates: atom\\nData variables:\\n    nacs                    (time, statecomb, atom, direction) float32 43kB -...\\n    forces                  (time, state, atom, direction) float32 43kB 0.007...\\n    energy                  (time, state) float32 2kB -94.66 -94.37 ... -94.32\\n    socs                    (time, full_statecomb) complex128 19kB 0j 0j ... 0j\\n    atXYZ                   (time, atom, direction) float64 29kB 0.01689 ... ...\\n    dip_perm                (time, state, direction) float32 7kB 1.13 ... 6.798\\n    dip_trans               (time, statecomb, direction) float32 7kB -0.1298 ...\\nAttributes: (12/13)\\n    input_format:                 sharc\\n    input_type:                   dynamic\\n    input_format_version:         2.0\\n    completed:                    True\\n    num_singlets:                 3\\n    num_doublets:                 0\\n    ...                           ...\\n    _shnitsel_setup_for_cleanup:  True\\n    has_forces:                   True\\n    misc_input_settings:          {\\\'input\\\': {\\\'printlevel\\\': \\\'2\\\', \\\'geomfile\\\': \\\'...\\n    trajectory_input_path:        test_data/sharc/traj_I01_v2.0/TRAJ_00002\\n    trajectory_id:                795023807\\n    is_multi_trajectory:          True, _is_multi_trajectory=False), _children={}, _attrs={}, _parent=..., _level_name=\\\'DataLeaf\\\')}, _attrs={}, _parent=..., _level_name=\\\'DataGroup\\\', _group_info=None)}, _attrs={}, _parent=<class \\\'shnitsel.data.tree.tree.ShnitselDBRoot[Trajectory]\\\'> [{\\\'level\\\': \\\'ShnitselDBRoot\\\', \\\'children\\\': "1: {\\\'C-N-H-H-H-H\\\': \\\'...\\\'}"}], _level_name=\\\'CompoundGroup\\\', _group_info=None)\'}'}]