shnitsel.data.dataset_containers.shared#

Classes#

ShnitselDataset

Definition of the protocol to support instantiation from xarray dataset structs.

ShnitselDerivedDataset

Definition of the protocol to support instantiation from xarray dataset structs.

Module Contents#

class ShnitselDataset(ds)#

Bases: shnitsel.data.xr_io_compatibility.SupportsFromXrConversion, shnitsel.data.xr_io_compatibility.SupportsToXrConversion

Definition of the protocol to support instantiation from xarray dataset structs.

Parameters:

ds (xarray.Dataset)

_raw_dataset: xarray.Dataset#
property dataset: xarray.Dataset#
Return type:

xarray.Dataset

property leading_dimension: str#
Return type:

str

property state_ids#
property state_names#
property state_types#
property state_magnetic_number#
property state_degeneracy_group#
property state_charges#
property active_state#
property state_diagonal#
property atom_names#
property atom_numbers#
property charge: float#

The charge of the molecule, if set on the trajectory data. It is loaded from the charge attribute (or variable) or from the state_charges coordinate, if provided.

If no information is found, 0 is returned.

Return type:

float

set_charge(value)#

Sets the charge on a dataset, clears conflicting charge information stored elsewhere on the dataset, and returns a new instance of the wrapped dataset.

Parameters:

value (float | xr.DataArray) – Either a single value (optionally already wrapped in a DataArray) giving the charge of the full molecule in all states, which is stored as the charge coordinate, or a DataArray of state-dependent charges, which is stored as state_charges.

Returns:

The updated object as a copy.

Return type:

Self

Raises:

ValueError – If an unsupported value was provided.
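
The dispatch described above can be illustrated with a minimal sketch. It does not use shnitsel or xarray: ChargeInfo and the free-standing set_charge below are hypothetical stand-ins, with a plain tuple playing the role of a state-dependent DataArray, and the actual implementation may differ.

```python
from dataclasses import dataclass, replace
from numbers import Real
from typing import Optional, Tuple


@dataclass(frozen=True)
class ChargeInfo:
    """Hypothetical stand-in for the charge information held by a dataset."""

    charge: Optional[float] = None
    state_charges: Optional[Tuple[float, ...]] = None


def _is_number(value) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, Real) and not isinstance(value, bool)


def set_charge(info: ChargeInfo, value) -> ChargeInfo:
    """Return an updated copy of `info`; the original is left unchanged."""
    if _is_number(value):
        # One value for the whole molecule: drop conflicting per-state info.
        return replace(info, charge=float(value), state_charges=None)
    if isinstance(value, (list, tuple)) and value and all(map(_is_number, value)):
        # State-dependent charges: drop the conflicting molecule-wide value.
        return replace(info, charge=None, state_charges=tuple(map(float, value)))
    raise ValueError(f"Unsupported charge value: {value!r}")
```

The point of the sketch is the copy semantics: the input object keeps its old charge information, and only the returned copy carries the new value.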

property dims#
property coords#
property sizes#
property data_vars#
has_variable(name)#
Parameters:

name (str)

Return type:

bool

has_dimension(name)#
Parameters:

name (str)

Return type:

bool

has_coordinate(name)#
Parameters:

name (str)

Return type:

bool

has_data(name)#
Parameters:

name (str)

Return type:

bool

has(name)#
Parameters:

name (str)

Return type:

bool

property mol: rdkit.Chem.Mol#

Helper method to get a representative molecule object for the geometry within this dataset.

Returns:

Either a copy of a cached mol object (for partial substructures) or a newly constructed default object

Return type:

rdkit.Chem.Mol

sel(indexers=None, method=None, tolerance=None, drop=False, **indexers_kwargs)#

Returns a new dataset with each data array indexed by tick labels along the specified dimension(s).

In contrast to .isel, indexers for this method should use labels (i.e. explicit values in that dimension) instead of integers.

Under the hood, this method is powered by using pandas’s powerful Index objects. This makes label based indexing essentially just as fast as using integer indexing.

It also means this method uses pandas’s (well documented) logic for indexing. This means you can use string shortcuts for datetime indexes (e.g., ‘2000-01’ to select all values in January 2000). It also means that slices are treated as inclusive of both the start and stop values, unlike normal Python indexing.

Parameters:
  • indexers (dict, optional) – A dict with keys matching dimensions and values given by scalars, slices or arrays of tick labels. For dimensions with multi-index, the indexer may also be a dict-like object with keys matching index level names. If DataArrays are passed as indexers, xarray-style indexing will be carried out. See Indexing and selecting data for the details. One of indexers or indexers_kwargs must be provided.

  • method ({None, "nearest", "pad", "ffill", "backfill", "bfill"}, optional) –

    Method to use for inexact matches:

    • None (default): only exact matches

    • pad / ffill: propagate last valid index value forward

    • backfill / bfill: propagate next valid index value backward

    • nearest: use nearest valid index value

  • tolerance (optional) – Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.

  • drop (bool, optional) – If drop=True, drop coordinates variables in indexers instead of making them scalar.

  • **indexers_kwargs ({dim: indexer, ...}, optional) – The keyword arguments form of indexers. One of indexers or indexers_kwargs must be provided.

Returns:

dataset – A new Dataset with the same contents as this dataset, except each variable and dimension is indexed by the appropriate indexers. If indexer DataArrays have coordinates that do not conflict with this object, then these coordinates will be attached. In general, each array’s data will be a view of the array’s data in this dataset, unless vectorized indexing was triggered by using an array indexer, in which case the data will be a copy.

Return type:

Self

See also

ShnitselDataset.isel Dataset.sel Dataset.isel DataArray.sel

xarray-tutorial:intermediate/indexing/indexing

Tutorial material on indexing with Xarray objects

xarray-tutorial:fundamentals/02.1_indexing_Basic

Tutorial material on basics of indexing
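
Two rules from the parameter descriptions above, the tolerance check for method="nearest" and the inclusive label slices, can be sketched without xarray or pandas. The helpers nearest_position and label_slice are hypothetical and operate on a sorted list of labels; they illustrate the rules only and are not how pandas implements index lookups.

```python
from bisect import bisect_left, bisect_right


def nearest_position(index, target, tolerance=None):
    """Position of the label in the sorted list `index` closest to `target`."""
    if not index:
        raise KeyError(target)
    pos = bisect_left(index, target)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = [p for p in (pos - 1, pos) if 0 <= p < len(index)]
    best = min(candidates, key=lambda p: abs(index[p] - target))
    # The documented condition: abs(index[indexer] - target) <= tolerance.
    if tolerance is not None and abs(index[best] - target) > tolerance:
        raise KeyError(f"no label within {tolerance} of {target}")
    return best


def label_slice(index, start, stop):
    """Labels from `start` to `stop`, inclusive of both ends."""
    return index[bisect_left(index, start):bisect_right(index, stop)]


times = [0.0, 0.5, 1.0, 1.5]
print(nearest_position(times, 0.6))   # position of label 0.5
print(label_slice(times, 0.5, 1.0))   # both end labels are included
print(times[1:3])                     # positional slice excludes the stop
```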

isel(indexers=None, drop=False, missing_dims='raise', **indexers_kwargs)#

Returns a new dataset with each array indexed along the specified dimension(s).

This method selects values from each array using its __getitem__ method, except this method does not require knowing the order of each array’s dimensions.

Parameters:
  • indexers (dict, optional) – A dict with keys matching dimensions and values given by integers, slice objects or arrays. An indexer can be an integer, slice, array-like or DataArray. If DataArrays are passed as indexers, xarray-style indexing will be carried out. See Indexing and selecting data for the details. One of indexers or indexers_kwargs must be provided.

  • drop (bool, default: False) – If drop=True, drop coordinates variables indexed by integers instead of making them scalar.

  • missing_dims ({"raise", "warn", "ignore"}, default: "raise") –

    What to do if dimensions that should be selected from are not present in the Dataset:

    • “raise”: raise an exception

    • “warn”: raise a warning, and ignore the missing dimensions

    • “ignore”: ignore the missing dimensions

  • **indexers_kwargs ({dim: indexer, ...}, optional) – The keyword arguments form of indexers. One of indexers or indexers_kwargs must be provided.

Returns:

obj – A new Dataset with the same contents as this dataset, except each array and dimension is indexed by the appropriate indexers. If indexer DataArrays have coordinates that do not conflict with this object, then these coordinates will be attached. In general, each array’s data will be a view of the array’s data in this dataset, unless vectorized indexing was triggered by using an array indexer, in which case the data will be a copy.

Return type:

Dataset

Examples

# A specific element from the dataset is selected

>>> dataset.isel(atom=1, time=0)
<xarray.Dataset> Size:
Dimensions:         (direction: 3)
Coordinates:
    atom        int16 2B 1
    time        float64 8B 0.0
    direction   (direction) <U1 3B 'x' 'y' 'z'
Data variables:
    energy  float64 8B -238.2
    forces  (direction) float64 24B 1.2 -0.2 0.1

# Indexing with a slice using isel

>>> slice_of_data = dataset.isel(atom=slice(0, 2), time=slice(0, 2))
>>> slice_of_data
<xarray.Dataset> Size:
Dimensions:         (atom: 2, time: 2, direction: 3)
Coordinates:
    * atom         (atom) int16 2B 1
    * time         (time) float64 16B 0.0 0.5
    * direction    <U1 3B 'x' 'y' 'z'
Data variables:
    energy      (time) float64 24B -238.2
    forces      (time, atom, direction) float64 96B -0.5 -0.4 0.4 ...

# Indexing with a DataArray of integer positions

>>> index_array = xr.DataArray([0, 2], dims="atom")
>>> indexed_data = dataset.isel(atom=index_array)
>>> indexed_data
<xarray.Dataset> Size:
Dimensions:         (atom: 2, time: 3, direction: 3)
Coordinates:
  * atom            (atom) int16 4B 1 3
  * time            (time) float64 16B 0.0 0.5 1.0
  * direction       <U1 3B 'x' 'y' 'z'
Data variables:
    energy      (time) float64 24B -238.2 -238.4 -237.9
    forces      (time, atom, direction) float64 96B -0.5 -0.4 0.4 ...

See also

ShnitselDataset.sel Dataset.sel Dataset.isel DataArray.isel

xarray-tutorial:intermediate/indexing/indexing

Tutorial material on indexing with Xarray objects

xarray-tutorial:fundamentals/02.1_indexing_Basic

Tutorial material on basics of indexing
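
The three missing_dims modes can be sketched as a small filter over the requested indexers. The function filter_indexers is hypothetical and independent of xarray; the choice of ValueError for the "raise" mode is an assumption of this sketch.

```python
import warnings


def filter_indexers(indexers, dims, missing_dims="raise"):
    """Drop indexers for absent dimensions according to `missing_dims`."""
    if missing_dims not in ("raise", "warn", "ignore"):
        raise ValueError(f"Unrecognised option: {missing_dims!r}")
    missing = [dim for dim in indexers if dim not in dims]
    if missing and missing_dims == "raise":
        raise ValueError(f"Dimensions {missing} do not exist in {tuple(dims)}")
    if missing and missing_dims == "warn":
        warnings.warn(f"Dimensions {missing} do not exist in {tuple(dims)}")
    # In "warn" and "ignore" mode the missing dimensions are skipped.
    return {dim: idx for dim, idx in indexers.items() if dim in dims}


dims = ("time", "atom", "direction")
print(filter_indexers({"time": 0, "frame": 3}, dims, missing_dims="ignore"))
```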

property _attr_sources: Iterable[Mapping[Hashable, Any]]#

Places to look-up items for attribute-style access

Return type:

Iterable[Mapping[Hashable, Any]]

property _item_sources: Iterable[Mapping[Hashable, Any]]#

Places to look-up items for key-completion

Return type:

Iterable[Mapping[Hashable, Any]]

__getattr__(name)#
Parameters:

name (str)

Return type:

Any

__contains__(a)#
_repr_html_()#
Return type:

Any

__getitem__(key)#
__dir__()#

Provide method name lookup and completion. Only provide ‘public’ methods.

Return type:

list[str]
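
The interplay of _attr_sources, __getattr__, __contains__ and __dir__ can be sketched with a small wrapper class. AttrSourceWrapper is hypothetical and uses plain dicts in place of data variables and coordinates; it shows the general lookup pattern only and is not the class's actual code.

```python
from typing import Any, Hashable, Iterable, List, Mapping


class AttrSourceWrapper:
    """Hypothetical wrapper exposing mapping entries as attributes."""

    def __init__(self, data_vars: Mapping, coords: Mapping) -> None:
        self._data_vars = dict(data_vars)
        self._coords = dict(coords)

    @property
    def _attr_sources(self) -> Iterable[Mapping[Hashable, Any]]:
        # Places to look up items for attribute-style access.
        yield self._data_vars
        yield self._coords

    def __getattr__(self, name: str) -> Any:
        # Called only after normal attribute lookup has failed.
        if name.startswith("_"):
            # Private names are never resolved through the sources.
            raise AttributeError(name)
        for source in self._attr_sources:
            if name in source:
                return source[name]
        raise AttributeError(f"{type(self).__name__!r} has no attribute {name!r}")

    def __contains__(self, key: Hashable) -> bool:
        return any(key in source for source in self._attr_sources)

    def __dir__(self) -> List[str]:
        # Offer only public names plus the keys of all sources.
        public = [n for n in super().__dir__() if not n.startswith("_")]
        keys = [str(k) for source in self._attr_sources for k in source]
        return sorted(set(public + keys))


wrapped = AttrSourceWrapper({"energy": [-238.2]}, {"time": [0.0]})
print(wrapped.energy, "time" in wrapped, dir(wrapped))
```

Refusing private names in __getattr__ guards against infinite recursion if an attribute is requested before __init__ has populated the instance.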

_ipython_key_completions_()#

Provide method for the key-autocompletions in IPython. See https://ipython.readthedocs.io/en/stable/config/integrating.html#tab-completion for the details.

Return type:

list[str]

convert(varname=None, unit=None)#

Convert an entry in this dataset to a specific unit.

Returns a copy of the dataset with the entry updated.

Parameters:
  • varname (str, optional) – Optionally the name of a single variable. If not provided, will apply to all variables.

  • unit (str | None) – The target unit to convert to. If not set, will convert to default shnitsel units.

Returns:

The updated dataset with converted units.

Return type:

Self

as_xr_dataset()#

Base function to be implemented by classes supporting this protocol to allow for standardized conversion to a dataset.

Returns:

A tuple of three items: the io_type_tag under which the deserializer is registered with the Shnitsel Tools framework (or None if no deserialization is desired/supported), then the xr.Dataset that is the result of the conversion, and lastly a dict of metadata that might help with deserialization later on.

Return type:

tuple[str, xr.Dataset, MetaData]

Raises:

ValueError – If the conversion failed for some reason.

classmethod get_type_marker()#
Return type:

str

classmethod from_xr_dataset(dataset, metadata)#

Class method to support standardized deserialization of arbitrary classes. Implemented as a class method to avoid the need to construct an instance for deserialization.

Parameters:
  • cls (type[ResType]) – The class executing the deserialization.

  • dataset (xr.Dataset) – The dataset to be deserialized into the output type.

  • metadata (MetaData) – Metadata from the serialization process.

Returns:

The deserialized instance of the target class.

Return type:

instance of cls

Raises:

TypeError – If deserialization of the object was not possible
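
How get_type_marker, as_xr_dataset and from_xr_dataset fit together can be sketched as a round trip through a tag-keyed registry. Everything below is hypothetical: the registry, the register decorator, the load helper, the EnergySeries class and its tag string are invented for illustration, and a plain dict stands in for xr.Dataset. The real registration mechanism of the Shnitsel Tools framework is not shown in this page.

```python
from typing import Any, Callable, Dict, Tuple

MetaData = Dict[str, Any]
DatasetLike = Dict[str, Any]  # stand-in for xr.Dataset

# Hypothetical registry mapping a type tag to its deserializer.
_DESERIALIZERS: Dict[str, Callable[[DatasetLike, MetaData], Any]] = {}


def register(cls):
    """Register the class's deserializer under its type marker."""
    _DESERIALIZERS[cls.get_type_marker()] = cls.from_xr_dataset
    return cls


@register
class EnergySeries:
    def __init__(self, energies) -> None:
        self.energies = list(energies)

    @classmethod
    def get_type_marker(cls) -> str:
        return "example::EnergySeries"

    def as_xr_dataset(self) -> Tuple[str, DatasetLike, MetaData]:
        # Tag, converted data, and metadata that may help when loading.
        return (
            self.get_type_marker(),
            {"energy": list(self.energies)},
            {"n_frames": len(self.energies)},
        )

    @classmethod
    def from_xr_dataset(cls, dataset: DatasetLike, metadata: MetaData):
        # A class method, so no instance is needed to deserialize.
        if "energy" not in dataset:
            raise TypeError("dataset has no 'energy' entry")
        return cls(dataset["energy"])


def load(tag: str, dataset: DatasetLike, metadata: MetaData):
    """Dispatch to the deserializer registered for `tag`."""
    return _DESERIALIZERS[tag](dataset, metadata)


tag, data, meta = EnergySeries([-238.2, -238.4]).as_xr_dataset()
print(load(tag, data, meta).energies)
```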

class ShnitselDerivedDataset(base_ds, derived_ds)#

Bases: ShnitselDataset, shnitsel.data.xr_io_compatibility.SupportsFromXrConversion, shnitsel.data.xr_io_compatibility.SupportsToXrConversion

Definition of the protocol to support instantiation from xarray dataset structs.

_base_dataset: xarray.Dataset | None#
property base: xarray.Dataset | None#
Return type:

xarray.Dataset | None

property _item_sources: Iterable[Mapping[Hashable, Any]]#

Places to look-up items for key-completion

Return type:

Iterable[Mapping[Hashable, Any]]

abstractmethod as_xr_dataset()#

Base function to be implemented by classes supporting this protocol to allow for standardized conversion to a dataset.

Returns:

A tuple of three items: the io_type_tag under which the deserializer is registered with the Shnitsel Tools framework (or None if no deserialization is desired/supported), then the xr.Dataset that is the result of the conversion, and lastly a dict of metadata that might help with deserialization later on.

Return type:

tuple[str, xr.Dataset, MetaData]

Raises:

ValueError – If the conversion failed for some reason.

abstractmethod classmethod get_type_marker()#

Return type:

str

abstractmethod classmethod from_xr_dataset(dataset, metadata)#

Parameters:
  • dataset (xarray.Dataset)

  • metadata (shnitsel.data.xr_io_compatibility.MetaData)

Return type:

shnitsel.data.xr_io_compatibility.ResType

Class method to support standardized deserialization of arbitrary classes. Implemented as a class method to avoid the need to construct an instance for deserialization.

Parameters:
  • cls (type[ResType]) – The class executing the deserialization.

  • dataset (xr.Dataset) – The dataset to be deserialized into the output type.

  • metadata (MetaData) – Metadata from the serialization process.

Returns:

The deserialized instance of the target class.

Return type:

instance of cls

Raises:

TypeError – If deserialization of the object was not possible
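
What the abstract methods above require of a subclass can be sketched with the standard abc module. Whether ShnitselDerivedDataset is itself built on abc.ABC is an assumption of this sketch; DerivedBase and EnergyStats are hypothetical names, and plain dicts stand in for xarray.Dataset.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional, Tuple


class DerivedBase(ABC):
    """Hypothetical base pairing derived data with an optional base dataset."""

    def __init__(self, base: Optional[Dict[str, Any]], derived: Dict[str, Any]):
        self.base = base
        self.derived = derived

    @abstractmethod
    def as_xr_dataset(self) -> Tuple[str, Dict[str, Any], Dict[str, Any]]:
        ...

    @classmethod
    @abstractmethod
    def get_type_marker(cls) -> str:
        ...

    @classmethod
    @abstractmethod
    def from_xr_dataset(cls, dataset: Dict[str, Any], metadata: Dict[str, Any]):
        ...


class EnergyStats(DerivedBase):
    """Concrete subclass: all three abstract methods are implemented."""

    def as_xr_dataset(self):
        return self.get_type_marker(), dict(self.derived), {}

    @classmethod
    def get_type_marker(cls) -> str:
        return "example::EnergyStats"

    @classmethod
    def from_xr_dataset(cls, dataset, metadata):
        # No base dataset is available when restoring from serialized data.
        return cls(None, dataset)
```

With abc.ABC, instantiating DerivedBase directly, or any subclass that leaves one of the three methods unimplemented, raises TypeError.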