shnitsel.data.dataset_containers#
Submodules#
- shnitsel.data.dataset_containers.data_series
- shnitsel.data.dataset_containers.dataset_vis
- shnitsel.data.dataset_containers.frames
- shnitsel.data.dataset_containers.inter_state
- shnitsel.data.dataset_containers.multi_layered
- shnitsel.data.dataset_containers.multi_series
- shnitsel.data.dataset_containers.multi_stacked
- shnitsel.data.dataset_containers.per_state
- shnitsel.data.dataset_containers.shared
- shnitsel.data.dataset_containers.trajectory
- shnitsel.data.dataset_containers.trajectory_collection
- shnitsel.data.dataset_containers.xr_conversion
Classes#
- MultiSeriesLayered – A version of the multi-series dataset where the data is indexed along a new trajectory dimension.
- MultiSeriesDataset – Class to serve as the basis for Layered and Stacked multi-dataseries datasets.
- MultiSeriesStacked – A version of the multi-series dataset where the data is indexed along a shared frame (Multi-index) dimension.
- DataSeries – Definition of the protocol to support instantiation from xarray dataset structs.
- ShnitselDataset – Definition of the protocol to support instantiation from xarray dataset structs.
- Trajectory – Definition of the protocol to support instantiation from xarray dataset structs.
- Frames – Definition of the protocol to support instantiation from xarray dataset structs.
- InterState – Definition of the protocol to support instantiation from xarray dataset structs.
- PerState – Definition of the protocol to support instantiation from xarray dataset structs.
Functions#
- wrap_dataset – Helper function to wrap a generic xarray dataset in a wrapper container.
Package Contents#
- class MultiSeriesLayered(framesets)#
Bases:
shnitsel.data.dataset_containers.multi_series.MultiSeriesDataset
A version of the multi-series dataset where the data is indexed along a new trajectory dimension. Missing data across trajectories is padded with np.nan values, which can lead to typing issues (for example, integer variables being promoted to floating point).
- Parameters:
framesets (xarray.Dataset | Sequence[shnitsel.data.dataset_containers.frames.Frames | shnitsel.data.dataset_containers.trajectory.Trajectory | xarray.Dataset])
- _stacked_repr_cached: MultiSeriesStacked | None = None#
- property as_stacked: shnitsel.data.dataset_containers.multi_stacked.MultiSeriesStacked#
Get a stacked representation of the layered datasets in this object.
- Returns:
The converted (or extracted from cache) stacked version of this multi-data dataset.
- Return type:
shnitsel.data.dataset_containers.multi_stacked.MultiSeriesStacked
- property as_layered: Self#
- Return type:
Self
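The padding behaviour described for MultiSeriesLayered can be illustrated with a minimal, standard-library sketch. It does not use Shnitsel or xarray; `pad_to_layers` and the sample data are hypothetical names chosen for illustration. It shows why padding with NaN tends to cause typing issues: NaN is a float, so integer entries have to be promoted as well.

```python
import math


def pad_to_layers(series: list[list[int]]) -> list[list[float]]:
    """Pad ragged per-trajectory series with NaN so they form a rectangle."""
    longest = max(len(s) for s in series)
    # NaN is a float, so the integer entries are promoted to float as well.
    return [
        [float(v) for v in s] + [math.nan] * (longest - len(s))
        for s in series
    ]


# Hypothetical active-state indices for two trajectories of different length.
active_state = [[1, 1, 2], [2]]
layered = pad_to_layers(active_state)
```

After padding, every row has the same length, but checks such as `isinstance(value, int)` or exact integer comparisons no longer hold for the padded variable.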
- class MultiSeriesDataset(basis, combined=None)#
Bases:
shnitsel.data.dataset_containers.data_series.DataSeries
Class to serve as the basis for Layered and Stacked multi-dataseries datasets.
Is itself a DataSeries, but with different, more specific semantics than a generic DataSeries.
- Parameters:
basis (xarray.Dataset | Sequence[shnitsel.data.dataset_containers.frames.Frames | shnitsel.data.dataset_containers.trajectory.Trajectory | xarray.Dataset])
combined (xarray.Dataset | None)
- _basis_data: Sequence[shnitsel.data.dataset_containers.frames.Frames | shnitsel.data.dataset_containers.trajectory.Trajectory | xarray.Dataset] | None = None#
- property as_stacked: shnitsel.data.dataset_containers.multi_stacked.MultiSeriesStacked#
- Abstractmethod:
- Return type:
shnitsel.data.dataset_containers.multi_stacked.MultiSeriesStacked
- property as_layered: shnitsel.data.dataset_containers.multi_layered.MultiSeriesLayered#
- Abstractmethod:
- Return type:
shnitsel.data.dataset_containers.multi_layered.MultiSeriesLayered
- get_grouping_metadata()#
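The relationship between the abstract as_stacked / as_layered properties and their cached implementations in the subclasses could look roughly like the following sketch. All class names here (`MultiSeriesSketch`, `StackedSketch`, `LayeredSketch`) are hypothetical stand-ins, and the caching strategy is an assumption based on the `_stacked_repr_cached` and `_layered_repr_cached` attributes listed on this page, not the library's actual code.

```python
from abc import ABC, abstractmethod
from typing import Optional


class MultiSeriesSketch(ABC):
    """Hypothetical stand-in for a container with two interchangeable layouts."""

    def __init__(self, series):
        self.series = series

    @property
    @abstractmethod
    def as_stacked(self) -> "StackedSketch":
        """Return the stacked layout."""

    @property
    @abstractmethod
    def as_layered(self) -> "LayeredSketch":
        """Return the layered layout."""


class StackedSketch(MultiSeriesSketch):
    _layered_cache: Optional["LayeredSketch"] = None

    @property
    def as_stacked(self) -> "StackedSketch":
        return self  # already in the requested layout

    @property
    def as_layered(self) -> "LayeredSketch":
        if self._layered_cache is None:  # convert once, then reuse
            self._layered_cache = LayeredSketch(self.series)
        return self._layered_cache


class LayeredSketch(MultiSeriesSketch):
    _stacked_cache: Optional["StackedSketch"] = None

    @property
    def as_layered(self) -> "LayeredSketch":
        return self  # already in the requested layout

    @property
    def as_stacked(self) -> "StackedSketch":
        if self._stacked_cache is None:  # convert once, then reuse
            self._stacked_cache = StackedSketch(self.series)
        return self._stacked_cache
```

In this sketch, asking a container for its own layout returns the same object, and repeated requests for the other layout return the cached conversion.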
- class MultiSeriesStacked(framesets)#
Bases:
shnitsel.data.dataset_containers.frames.Frames, shnitsel.data.dataset_containers.multi_series.MultiSeriesDataset
A version of the multi-series dataset where the data is indexed along a shared frame (Multi-index) dimension. No padding is necessary, because the trajectories do not need to have the same length.
- Parameters:
framesets (Sequence[shnitsel.data.dataset_containers.frames.Frames | shnitsel.data.dataset_containers.trajectory.Trajectory | xarray.Dataset] | xarray.Dataset)
- _layered_repr_cached: MultiSeriesLayered | None = None#
- property as_layered: shnitsel.data.dataset_containers.multi_layered.MultiSeriesLayered#
Get a layered representation of the stacked datasets in this object.
- Returns:
The converted (or extracted from cache) layered version of this multi-data dataset.
- Return type:
shnitsel.data.dataset_containers.multi_layered.MultiSeriesLayered
- property as_stacked: Self#
- Return type:
Self
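To illustrate why the stacked layout of MultiSeriesStacked needs no padding, here is a small sketch that flattens ragged per-trajectory series into a single frame axis keyed by a (trajectory, step) pair. It uses plain Python containers instead of xarray; `stack_frames` and the sample data are hypothetical.

```python
def stack_frames(
    series: dict[str, list[float]],
) -> list[tuple[tuple[str, int], float]]:
    """Flatten per-trajectory series into one frame axis keyed by (trajectory, step)."""
    return [
        ((traj_id, step), value)
        for traj_id, values in series.items()
        for step, value in enumerate(values)
    ]


# Hypothetical energies for two trajectories of different length.
energies = {"traj_a": [-1.0, -1.1, -1.2], "traj_b": [-0.9]}
stacked = stack_frames(energies)
```

The result contains exactly one entry per existing frame, so the shorter trajectory contributes fewer entries instead of NaN-filled ones.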
- class DataSeries(ds)#
Bases:
shnitsel.data.dataset_containers.shared.ShnitselDataset
Definition of the protocol to support instantiation from xarray dataset structs.
- Parameters:
ds (xarray.Dataset)
- property per_state: shnitsel.data.dataset_containers.per_state.PerState#
Convert this trajectory to a PerState object that only allows access to the per-state data encoded in this entity.
- Returns:
The wrapper for the per-state properties.
- Return type:
PerState
- property inter_state: shnitsel.data.dataset_containers.inter_state.InterState#
Convert this trajectory to an InterState object that only allows access to the inter-state data encoded in this entity.
Will calculate some inter-state properties, such as state-to-state energy differences.
- Returns:
The wrapper for the inter-state properties.
- Return type:
InterState
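As an illustration of a derived inter-state quantity such as a state-to-state energy difference, the following sketch computes the difference for every pair of states in a single frame. The function name, the pair ordering, and the sign convention (higher state id minus lower state id) are assumptions made for this example and are not taken from the library.

```python
from itertools import combinations


def delta_energies(energies: dict[int, float]) -> dict[tuple[int, int], float]:
    """Energy difference for every pair of states (assumed: higher id minus lower id)."""
    return {
        (lo, hi): energies[hi] - energies[lo]
        for lo, hi in combinations(sorted(energies), 2)
    }


# Hypothetical energies of three states in a single frame (arbitrary units).
frame_energies = {1: -10.0, 2: -9.5, 3: -9.0}
deltas = delta_energies(frame_energies)
```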
- property leading_dim: str#
The leading dimension along which consistent configurations are indexed. Usually time or frame.
- Return type:
str
- property positions#
The atom position data stored in this dataset if accessible.
Will throw a KeyError if no data is accessible.
- property atXYZ#
The positional data for atoms stored in this dataset if accessible.
Will throw a KeyError if no data is accessible.
- property energy#
The energy information stored in this dataset if accessible.
Will throw a KeyError if no data is accessible.
- property forces#
The force data stored in this dataset if accessible. Note that depending on forces_format, there may only be data for the active state or for some of the states.
Will throw a KeyError if no data is accessible.
- property nacs#
The non-adiabatic coupling data stored in this dataset if accessible.
Will throw a KeyError if no data is accessible.
- property socs#
The spin-orbit coupling data stored in this dataset if accessible.
Will throw a KeyError if no data is accessible.
- property dipole_permanent#
The permanent dipole data stored in this dataset if accessible.
Will throw a KeyError if no data is accessible.
- property dipole_transition#
The transition dipole data stored in this dataset if accessible.
Will throw a KeyError if no data is accessible.
- property e_kin#
The kinetic energy information stored in this dataset if accessible.
Will throw a KeyError if no data is accessible.
- property velocities#
The velocity information stored in this dataset if accessible.
Will throw a KeyError if no data is accessible.
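Because the data accessors above raise a KeyError when a quantity is missing, calling code that treats such quantities as optional may want to guard the access. The sketch below uses a hypothetical stand-in class (`SeriesSketch`) and a hypothetical helper (`optional_property`); neither is part of Shnitsel.

```python
from typing import Any, Optional


class SeriesSketch:
    """Hypothetical stand-in whose properties raise KeyError for missing data."""

    def __init__(self, data: dict[str, Any]):
        self._data = data

    @property
    def energy(self) -> Any:
        return self._data["energy"]  # KeyError if absent

    @property
    def nacs(self) -> Any:
        return self._data["nacs"]  # KeyError if absent


def optional_property(obj: Any, name: str) -> Optional[Any]:
    """Return obj.<name>, or None if the accessor raises KeyError."""
    try:
        return getattr(obj, name)
    except KeyError:
        return None


series = SeriesSketch({"energy": [-1.0, -1.1]})
```

With this helper, `optional_property(series, "energy")` returns the stored values, while `optional_property(series, "nacs")` returns None instead of raising.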
- _param_from_vars_or_attrs(key)#
Helper function to extract information either from a data var or from a coordinate or from the attributes of the dataset
- Parameters:
key (str) – The key under which we expect to find the data
- Returns:
the value associated with the key that has been found
- Return type:
Any|None
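A lookup of this kind can be sketched with plain dictionaries standing in for the data variables, coordinates, and attributes of a dataset. The lookup order used here (data variables, then coordinates, then attributes) simply follows the order in the description above and is an assumption, as is the function name.

```python
from typing import Any, Optional


def param_from_vars_or_attrs(
    data_vars: dict[str, Any],
    coords: dict[str, Any],
    attrs: dict[str, Any],
    key: str,
) -> Optional[Any]:
    """Return the first match for key in data_vars, coords, attrs, else None."""
    for source in (data_vars, coords, attrs):
        if key in source:
            return source[key]
    return None


# Hypothetical dataset contents.
data_vars = {"energy": [-1.0, -1.1]}
coords = {"time": [0.0, 0.5]}
attrs = {"t_max": 100.0, "time": "shadowed by the coordinate"}
```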
- property t_max: float#
Maximum time up to which the simulation could have run if not interrupted.
It may actually have run to this time.
- Return type:
float
- property trajid: int | str | None#
Id of the trajectory. If assigned, it is expected to be unique within the same input, but it may clash with other trajectory ids if multiple separate imports or independent simulation data are combined.
- property max_ts: int#
The maximum time step to which the simulation progressed before termination.
- Return type:
int
- property completed: bool#
A flag indicating whether the imported Trajectory completed successfully.
- Return type:
bool
- property input_format: Literal['sharc', 'newtonx', 'ase', 'pyrai2md', 'unknown'] | str#
Name of the simulation software or input file type from which the data was originally imported.
- Return type:
Literal['sharc', 'newtonx', 'ase', 'pyrai2md', 'unknown'] | str
- property input_type: Literal['static', 'dynamic', 'unknown']#
Whether the data in this trajectory is static (independently optimized), continuous time-resolved (dynamic), or of unknown type.
- Return type:
Literal['static', 'dynamic', 'unknown']
- property input_format_version: str#
The version of the simulation software used to create this trajectory
- Return type:
str
- property forces_format: bool | Literal['all', 'active_only'] | None#
The forces format in the trajectory.
Options are a binary flag to signify whether there are forces or not. If the flag is True, the forces still might not be available for all states but only for the active state. If the format is 'all', there will be forces for all states. If the format is 'active_only', there will only be forces for the active state in the trajectory. If the format is None, more specific manual analysis may be required.
- Return type:
bool | Literal['all', 'active_only'] | None
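The possible forces_format values can be handled explicitly in calling code. The following sketch maps each documented value to a short description; the function name `describe_forces` and the returned strings are hypothetical, and only the value set itself is taken from the property signature above.

```python
from typing import Literal, Union

ForcesFormat = Union[bool, Literal["all", "active_only"], None]


def describe_forces(forces_format: ForcesFormat) -> str:
    """Translate a forces_format value into a human-readable description."""
    if forces_format is None:
        return "unknown: inspect the forces data manually"
    if forces_format is False:
        return "no forces"
    if forces_format is True:
        return "forces present, possibly only for the active state"
    if forces_format == "all":
        return "forces for all states"
    if forces_format == "active_only":
        return "forces only for the active state"
    raise ValueError(f"unexpected forces_format: {forces_format!r}")
```

The identity checks against True and False come before the string comparisons so that the boolean flag and the string modes are never confused.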
- property trajectory_input_path: str | None#
Input path from which the trajectory was loaded
- Return type:
str | None
- property theory_basis_set: str | None#
The theory basis set identifier for the underlying simulation
- Return type:
str | None
- property est_level: str | None#
The electronic structure theory level used during the simulation.
- Return type:
str | None
- property misc_input_settings: dict | None#
A dictionary of miscellaneous input settings read from trajectory output.
Arbitrary mapping from file names to settings within those files.
- Return type:
dict | None
- property attrs: dict#
A dictionary of the attributes set on this Trajectory.
Arbitrary mapping from attribute keys (str) to attribute values.
- Return type:
dict
- get_grouping_metadata()#
- class ShnitselDataset(ds)#
Bases:
shnitsel.data.xr_io_compatibility.SupportsFromXrConversion, shnitsel.data.xr_io_compatibility.SupportsToXrConversion
Definition of the protocol to support instantiation from xarray dataset structs.
- Parameters:
ds (xarray.Dataset)
- _raw_dataset: xarray.Dataset#
- property dataset: xarray.Dataset#
- Return type:
xarray.Dataset
- property state_ids#
- property state_names#
- property state_types#
- property state_magnetic_number#
- property state_degeneracy_group#
- property state_charges#
- property active_state#
- property state_diagonal#
- property atom_names#
- property atom_numbers#
- property charge: float#
The charge of the molecule if set on the trajectory data. Loaded from charge attribute (or variable) or state_charges coordinate if provided.
If no information is found, 0 is returned.
- Return type:
float
- set_charge(value)#
Method to set the charge on a dataset, clear conflicting positions of charge info on the dataset and return a new instance of the wrapped dataset.
- Parameters:
value (float | xr.DataArray) – Either a single value (optionally wrapped in a DataArray already) to indicate the charge of the full molecule in all states (will be set to coordinate charge) or a DataArray that represents state-dependent charges (which will be set to state_charges)
- Returns:
The updated object as a copy.
- Return type:
Self
- Raises:
ValueError – If an unsupported value was provided.
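The copy-on-update behaviour described for set_charge can be sketched with a frozen dataclass. `ChargeSketch` is a hypothetical stand-in: a plain number plays the role of a molecule-wide charge, a sequence plays the role of a state-dependent DataArray, and the rule that setting one clears the other is an assumption derived from the "clear conflicting positions of charge info" wording above.

```python
from collections.abc import Sequence
from dataclasses import dataclass, replace
from typing import Optional, Union


@dataclass(frozen=True)
class ChargeSketch:
    """Hypothetical stand-in holding either a total or per-state charges."""

    charge: Optional[float] = None
    state_charges: Optional[tuple[float, ...]] = None

    def set_charge(self, value: Union[float, Sequence[float]]) -> "ChargeSketch":
        """Return an updated copy; the original instance is left untouched."""
        if isinstance(value, bool):
            raise ValueError(f"unsupported charge value: {value!r}")
        if isinstance(value, (int, float)):
            # One charge for the whole molecule: drop per-state information.
            return replace(self, charge=float(value), state_charges=None)
        if isinstance(value, Sequence) and not isinstance(value, str):
            # State-dependent charges: drop the molecule-wide value.
            charges = tuple(float(v) for v in value)
            return replace(self, charge=None, state_charges=charges)
        raise ValueError(f"unsupported charge value: {value!r}")


original = ChargeSketch(state_charges=(0.0, 1.0))
updated = original.set_charge(1)
```

Here `updated` carries only the molecule-wide charge, while `original` still holds its per-state charges.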
- property dims#
- property coords#
- property sizes#
- property data_vars#
- property mol: rdkit.Chem.Mol#
Helper method to get a representative molecule object for the geometry within this dataset.
- Returns:
Either a copy of a cached mol object (for partial substructures) or a newly constructed default object
- Return type:
rdkit.Chem.Mol
- sel(indexers=None, method=None, tolerance=None, drop=False, **indexers_kwargs)#
Returns a new dataset with each data array indexed by tick labels along the specified dimension(s).
In contrast to .isel, indexers for this method should use labels (i.e. explicit values in that dimension) instead of integers.
Under the hood, this method is powered by using pandas’s powerful Index objects. This makes label based indexing essentially just as fast as using integer indexing.
It also means this method uses pandas’s (well documented) logic for indexing. This means you can use string shortcuts for datetime indexes (e.g., ‘2000-01’ to select all values in January 2000). It also means that slices are treated as inclusive of both the start and stop values, unlike normal Python indexing.
- Parameters:
indexers (dict, optional) – A dict with keys matching dimensions and values given by scalars, slices or arrays of tick labels. For dimensions with multi-index, the indexer may also be a dict-like object with keys matching index level names. If DataArrays are passed as indexers, xarray-style indexing will be carried out. See Indexing and selecting data for the details. One of indexers or indexers_kwargs must be provided.
method ({None, "nearest", "pad", "ffill", "backfill", "bfill"}, optional) –
Method to use for inexact matches:
None (default): only exact matches
pad / ffill: propagate last valid index value forward
backfill / bfill: propagate next valid index value backward
nearest: use nearest valid index value
tolerance (optional) – Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.
drop (bool, optional) – If drop=True, drop coordinates variables in indexers instead of making them scalar.
**indexers_kwargs ({dim: indexer, ...}, optional) – The keyword arguments form of indexers. One of indexers or indexers_kwargs must be provided.
- Returns:
dataset – A new Dataset with the same contents as this dataset, except each variable and dimension is indexed by the appropriate indexers. If indexer DataArrays have coordinates that do not conflict with this object, then these coordinates will be attached. In general, each array’s data will be a view of the array’s data in this dataset, unless vectorized indexing was triggered by using an array indexer, in which case the data will be a copy.
- Return type:
Self
See also
ShnitselDataset.isel, Dataset.sel, Dataset.isel, DataArray.sel
- xarray-tutorial:intermediate/indexing/indexing
Tutorial material on indexing with Xarray objects
- xarray-tutorial:fundamentals/02.1_indexing_Basic
Tutorial material on basics of indexing
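The difference between label-based and positional selection, including the inclusive slice bounds of label-based indexing, can be illustrated without xarray. The helper names below (`sel_slice`, `isel_slice`, `sel_nearest`) are hypothetical and only mimic the documented semantics on plain lists with sorted labels; they are not the implementation used by xarray or pandas.

```python
from bisect import bisect_left, bisect_right

times = [0.0, 0.5, 1.0, 1.5]  # sorted coordinate labels
energy = [-238.2, -238.4, -237.9, -238.0]


def sel_slice(labels, values, start, stop):
    """Label-based slice: both the start and the stop label are included."""
    return values[bisect_left(labels, start):bisect_right(labels, stop)]


def isel_slice(values, start, stop):
    """Positional slice: plain Python semantics, stop is excluded."""
    return values[start:stop]


def sel_nearest(labels, values, target):
    """Pick the value whose label is closest to target (like method='nearest')."""
    idx = min(range(len(labels)), key=lambda i: abs(labels[i] - target))
    return values[idx]
```

Selecting the labels 0.5 through 1.0 returns two values, whereas the positional slice 1:2 returns only one.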
- isel(indexers=None, drop=False, missing_dims='raise', **indexers_kwargs)#
Returns a new dataset with each array indexed along the specified dimension(s).
This method selects values from each array using its __getitem__ method, except this method does not require knowing the order of each array’s dimensions.
- Parameters:
indexers (dict, optional) – A dict with keys matching dimensions and values given by integers, slice objects or arrays. An indexer can be an integer, slice, array-like or DataArray. If DataArrays are passed as indexers, xarray-style indexing will be carried out. See Indexing and selecting data for the details. One of indexers or indexers_kwargs must be provided.
drop (bool, default: False) – If drop=True, drop coordinates variables indexed by integers instead of making them scalar.
missing_dims ({"raise", "warn", "ignore"}, default: "raise") – What to do if dimensions that should be selected from are not present in the Dataset:
- "raise": raise an exception
- "warn": raise a warning, and ignore the missing dimensions
- "ignore": ignore the missing dimensions
**indexers_kwargs ({dim: indexer, ...}, optional) – The keyword arguments form of indexers. One of indexers or indexers_kwargs must be provided.
- Returns:
obj – A new Dataset with the same contents as this dataset, except each array and dimension is indexed by the appropriate indexers. If indexer DataArrays have coordinates that do not conflict with this object, then these coordinates will be attached. In general, each array’s data will be a view of the array’s data in this dataset, unless vectorized indexing was triggered by using an array indexer, in which case the data will be a copy.
- Return type:
Dataset
Examples
# A specific element from the dataset is selected
>>> dataset.isel(atom=1, time=0)
<xarray.Dataset> Size:
Dimensions:    (direction: 3)
Coordinates:
    atom       int16 2B 1
    time       float64 8B 0.0
    direction  (direction) <U1 3B 'x' 'y' 'z'
Data variables:
    energy     float64 8B -238.2
    forces     (direction) float64 24B 1.2 -0.2 0.1
# Indexing with a slice using isel
>>> slice_of_data = dataset.isel(atom=slice(0, 2), time=slice(0, 2))
>>> slice_of_data
<xarray.Dataset> Size:
Dimensions:    (atom: 2, time: 2, direction: 3)
Coordinates:
  * atom       (atom) int16 2B 1
  * time       (time) float64 16B 0.0 0.5
  * direction  <U1 3B 'x' 'y' 'z'
Data variables:
    energy     (time) float64 24B -238.2
    forces     (time, atom, direction) float64 96B -0.5 -0.4 0.4 ...
>>> index_array = xr.DataArray([0, 2], dims="atom")
>>> indexed_data = dataset.isel(atom=index_array)
>>> indexed_data
<xarray.Dataset> Size:
Dimensions:    (atom: 2, time: 3, direction: 3)
Coordinates:
  * atom       (atom) int16 4B 1 3
  * time       (time) float64 16B 0.0 0.5 1.0
  * direction  <U1 3B 'x' 'y' 'z'
Data variables:
    energy     (time) float64 24B -238.2 -238.4 -237.9
    forces     (time, atom, direction) float64 96B -0.5 -0.4 0.4 ...
See also
ShnitselDataset.sel, Dataset.sel, Dataset.isel, DataArray.isel
- xarray-tutorial:intermediate/indexing/indexing
Tutorial material on indexing with Xarray objects
- xarray-tutorial:fundamentals/02.1_indexing_Basic
Tutorial material on basics of indexing
- property _attr_sources: Iterable[Mapping[Hashable, Any]]#
Places to look-up items for attribute-style access
- Return type:
Iterable[Mapping[Hashable, Any]]
- property _item_sources: Iterable[Mapping[Hashable, Any]]#
Places to look-up items for key-completion
- Return type:
Iterable[Mapping[Hashable, Any]]
- __contains__(a)#
- _repr_html_()#
- Return type:
Any
- __getitem__(key)#
- __dir__()#
Provide method name lookup and completion. Only provide ‘public’ methods.
- _ipython_key_completions_()#
Provide method for the key-autocompletions in IPython. See https://ipython.readthedocs.io/en/stable/config/integrating.html#tab-completion for the details.
- convert(varname=None, unit=None)#
Convert an entry in this dataset to a specific unit.
Returns a copy of the dataset with the entry updated.
- as_xr_dataset()#
Base function to be implemented by classes supporting this protocol to allow for standardized conversion to a dataset.
- Returns:
A tuple of three items: the io_type_tag under which the deserializer is registered with the Shnitsel Tools framework (or None if no deserialization is desired/supported), the xr.Dataset that is the result of the conversion, and a dict of metadata that might help with deserialization later on.
- Return type:
- Raises:
ValueError – If the conversion failed for some reason.
- classmethod from_xr_dataset(dataset, metadata)#
Class method to support standardized deserialization of arbitrary classes. Implemented as a class method to avoid the need to construct an instance for deserialization.
- Parameters:
cls (type[ResType]) – The class executing the deserialization.
dataset (xr.Dataset) – The dataset to be deserialized into the output type.
metadata (MetaData) – Metadata from the serialization process.
- Returns:
The deserialized instance of the target class.
- Return type:
instance of cls
- Raises:
TypeError – If deserialization of the object was not possible
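The serialization round trip formed by as_xr_dataset and from_xr_dataset can be sketched as follows. Everything in this example is a stand-in: `WrapperSketch`, the `DESERIALIZERS` registry, and the tag string are hypothetical, and a plain dict takes the place of the xr.Dataset. Only the overall shape (a tag, a dataset, and a metadata dict going out; a dataset and metadata coming back in through a class method) follows the description above.

```python
from typing import Any, Callable, Optional

# Hypothetical registry mapping an io_type_tag to a deserializer.
DESERIALIZERS: dict[str, Callable[[dict, dict], Any]] = {}


class WrapperSketch:
    """Hypothetical wrapper around a dict that stands in for an xr.Dataset."""

    io_type_tag = "wrapper_sketch"  # made-up tag for this example

    def __init__(self, dataset: dict):
        self.dataset = dataset

    def as_xr_dataset(self) -> tuple[Optional[str], dict, dict]:
        """Return (tag, dataset, metadata) for later deserialization."""
        metadata = {"n_vars": len(self.dataset)}
        return self.io_type_tag, dict(self.dataset), metadata

    @classmethod
    def from_xr_dataset(cls, dataset: dict, metadata: dict) -> "WrapperSketch":
        """Rebuild an instance without needing an existing one."""
        if not isinstance(dataset, dict):
            raise TypeError("cannot deserialize this object")
        return cls(dataset)


DESERIALIZERS[WrapperSketch.io_type_tag] = WrapperSketch.from_xr_dataset

tag, raw, meta = WrapperSketch({"energy": [-1.0]}).as_xr_dataset()
restored = DESERIALIZERS[tag](raw, meta)
```

Because deserialization is a class method, the registry can store it directly and call it with only the dataset and the metadata.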
- class Trajectory(ds)#
Bases:
shnitsel.data.dataset_containers.data_series.DataSeries
Definition of the protocol to support instantiation from xarray dataset structs.
- Parameters:
ds (xarray.Dataset)
- _is_multi_trajectory = False#
- property as_frames: shnitsel.data.dataset_containers.frames.Frames#
Convert this trajectory to a frames version of this trajectory, where the leading dimension is frame instead of time.
- Returns:
The resulting Frames instance with a stacked dimension frame and a new coordinate active_trajectory along the frame dimension.
- Return type:
shnitsel.data.dataset_containers.frames.Frames
- property as_trajectory: Self#
Idempotent conversion to a Trajectory instance.
- Returns:
The same object, which is already a trajectory.
- Return type:
Self
- property is_multi_trajectory: bool#
Flag whether this is a multi-trajectory container.
Overridden by child classes that combine multiple trajectories into one object.
- Return type:
bool
- property trajectory_input_path: str | None#
Input path from which the trajectory was loaded
- Return type:
str | None
- class Frames(ds)#
Bases:
shnitsel.data.dataset_containers.data_series.DataSeries
Definition of the protocol to support instantiation from xarray dataset structs.
- Parameters:
ds (xarray.Dataset)
- property as_frames: Self#
Idempotent conversion to a Frames instance.
- Return type:
Self
- property as_trajectory: shnitsel.data.dataset_containers.trajectory.Trajectory#
Attempt to convert this dataset into a Trajectory instance.
Drops the atrajectory and trajectory dimensions of the Frameset and replaces the frame dimension with a time dimension before conversion.
- Returns:
The converted dataset underlying this Frameset.
- Return type:
shnitsel.data.dataset_containers.trajectory.Trajectory
- property leading_dim: str#
The leading dimension along which consistent configurations are indexed. Usually time or frame.
- Return type:
str
- property trajid: int | str | None#
Id of the trajectory. If assigned, it is expected to be unique within the same input, but it may clash with other trajectory ids if multiple separate imports or independent simulation data are combined.
- property atrajectory: xarray.DataArray | None#
Ids of the active trajectory in this frameset if present
- Return type:
xarray.DataArray | None
- property active_trajectory: xarray.DataArray | None#
Ids of the active trajectory in this frameset if present
- Return type:
xarray.DataArray | None
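The relationship between a time-indexed trajectory and its frame-indexed counterpart can be sketched with plain lists and dicts. Both helpers are hypothetical, and the rule that frames from several trajectories cannot be turned back into a single trajectory is an assumption suggested by the word "attempt" in the as_trajectory description, not documented behaviour.

```python
def trajectory_to_frames(trajid, times, values):
    """Re-index a time series by frame number and record the source trajectory."""
    return [
        {"frame": i, "active_trajectory": trajid, "time": t, "value": v}
        for i, (t, v) in enumerate(zip(times, values))
    ]


def frames_to_trajectory(frames):
    """Drop the trajectory label and index by time again (single trajectory only)."""
    ids = {f["active_trajectory"] for f in frames}
    if len(ids) > 1:
        # Assumption: mixed frames cannot form one trajectory.
        raise ValueError("frames from several trajectories cannot form one trajectory")
    return {
        "time": [f["time"] for f in frames],
        "value": [f["value"] for f in frames],
    }


frames = trajectory_to_frames("traj_a", [0.0, 0.5], [-1.0, -1.1])
roundtrip = frames_to_trajectory(frames)
```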
- class InterState(frames=None, /, direct_interstate_data=None)#
Bases:
shnitsel.data.dataset_containers.shared.ShnitselDerivedDataset
Definition of the protocol to support instantiation from xarray dataset structs.
- Parameters:
frames (shnitsel.data.dataset_containers.data_series.DataSeries | None)
direct_interstate_data (xarray.Dataset | None)
- _original_frames: shnitsel.data.dataset_containers.data_series.DataSeries | None#
- property delta_energy: xarray.DataArray#
- Return type:
xarray.DataArray
- property energy_interstate: xarray.DataArray#
- Return type:
xarray.DataArray
- property dipole_transition: xarray.DataArray#
- Return type:
xarray.DataArray
- property dipole_transition_norm: xarray.DataArray#
- Return type:
xarray.DataArray
- property nacs: xarray.DataArray#
- Return type:
xarray.DataArray
- property nacs_norm: xarray.DataArray#
- Return type:
xarray.DataArray
- property socs: xarray.DataArray#
- Return type:
xarray.DataArray
- property socs_norm: xarray.DataArray#
- Return type:
xarray.DataArray
- property fosc: xarray.DataArray#
- Return type:
xarray.DataArray
- as_xr_dataset()#
Base function to be implemented by classes supporting this protocol to allow for standardized conversion to a dataset.
- Returns:
A tuple of three items: the io_type_tag under which the deserializer is registered with the Shnitsel Tools framework (or None if no deserialization is desired/supported), the xr.Dataset that is the result of the conversion, and a dict of metadata that might help with deserialization later on.
- Return type:
- Raises:
ValueError – If the conversion failed for some reason.
- classmethod from_xr_dataset(dataset, metadata)#
Class method to support standardized deserialization of arbitrary classes. Implemented as a class method to avoid the need to construct an instance for deserialization.
- Parameters:
cls (type[ResType]) – The class executing the deserialization.
dataset (xr.Dataset) – The dataset to be deserialized into the output type.
metadata (MetaData) – Metadata from the serialization process.
- Returns:
The deserialized instance of the target class.
- Return type:
instance of cls
- Raises:
TypeError – If deserialization of the object was not possible
- class PerState(frames=None, /, direct_perstate_data=None)#
Bases:
shnitsel.data.dataset_containers.shared.ShnitselDerivedDataset
Definition of the protocol to support instantiation from xarray dataset structs.
- Parameters:
frames (shnitsel.data.dataset_containers.data_series.DataSeries | None)
direct_perstate_data (xarray.Dataset | None)
- _original_frames: shnitsel.data.dataset_containers.data_series.DataSeries | None#
- property energy: xarray.DataArray#
- Return type:
xarray.DataArray
- property dipole_permanent: xarray.DataArray#
- Return type:
xarray.DataArray
- property dipole_permanent_norm: xarray.DataArray#
- Return type:
xarray.DataArray
- property forces: xarray.DataArray#
- Return type:
xarray.DataArray
- property forces_norm: xarray.DataArray#
- Return type:
xarray.DataArray
- property forces_format: bool | Literal['all', 'active_only'] | None#
- Return type:
bool | Literal['all', 'active_only'] | None
- as_xr_dataset()#
Base function to be implemented by classes supporting this protocol to allow for standardized conversion to a dataset.
- Returns:
A tuple of three items: the io_type_tag under which the deserializer is registered with the Shnitsel Tools framework (or None if no deserialization is desired/supported), the xr.Dataset that is the result of the conversion, and a dict of metadata that might help with deserialization later on.
- Return type:
- Raises:
ValueError – If the conversion failed for some reason.
- classmethod from_xr_dataset(dataset, metadata)#
Class method to support standardized deserialization of arbitrary classes. Implemented as a class method to avoid the need to construct an instance for deserialization.
- Parameters:
cls (type[ResType]) – The class executing the deserialization.
dataset (xr.Dataset) – The dataset to be deserialized into the output type.
metadata (MetaData) – Metadata from the serialization process.
- Returns:
The deserialized instance of the target class.
- Return type:
instance of cls
- Raises:
TypeError – If deserialization of the object was not possible
- wrap_dataset(ds: xarray.Dataset | trajectory.Trajectory | frames.Frames | data_series.DataSeries | shared.ShnitselDataset, expected_types: type[ConvertedType]) → ConvertedType#
- wrap_dataset(ds: xarray.Dataset | trajectory.Trajectory | frames.Frames | data_series.DataSeries | shared.ShnitselDataset, expected_types: None = None) → shared.ShnitselDataset | xarray.Dataset
Helper function to wrap a generic xarray dataset in a wrapper container
- Parameters:
ds (xr.Dataset) – The dataset to wrap or an already wrapped dataset that may not need conversion.
expected_types (type[ConvertedType] | UnionType, optional) – Can be used to limit which wrapped format would be acceptable as a result. If set, an assertion error will be triggered if the ds parameter could not be wrapped in the appropriate type.
- Returns:
The wrapped dataset or the original dataset if no conversion was possible
- Return type:
ConvertedType | ShnitselDataset | xr.Dataset
Notes
This function can also be called with a tree structure as input and will automatically map itself over the leaves. This is only meant for internal Shnitsel tools use and may be removed at some point.
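The behaviour described for wrap_dataset (wrap if possible, return the input unchanged otherwise, and fail an assertion if expected_types is given but not satisfied) can be sketched with stand-in classes. `FramesSketch`, `TrajectorySketch`, and `wrap_sketch` are hypothetical, and the rule used to pick a wrapper (the presence of a "frame" or "time" key in a dict) is invented purely for this illustration; it says nothing about how Shnitsel selects a container.

```python
from typing import Any, Optional


class FramesSketch:
    """Hypothetical wrapper for frame-indexed data."""

    def __init__(self, ds: dict):
        self.ds = ds


class TrajectorySketch:
    """Hypothetical wrapper for time-indexed data."""

    def __init__(self, ds: dict):
        self.ds = ds


def wrap_sketch(ds: Any, expected_types: Optional[Any] = None) -> Any:
    """Wrap a raw mapping in the first matching container, else return it as-is."""
    wrapped = ds
    if isinstance(ds, dict):
        # Made-up detection rule: pick the wrapper by the leading key.
        if "frame" in ds:
            wrapped = FramesSketch(ds)
        elif "time" in ds:
            wrapped = TrajectorySketch(ds)
    if expected_types is not None:
        assert isinstance(wrapped, expected_types), (
            f"could not wrap input as {expected_types!r}"
        )
    return wrapped
```

Passing `expected_types` turns a silent fallback into an immediate failure, which is useful when later code relies on a specific container type.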