shnitsel.data.dataset_containers.shared#

Classes#

`ShnitselDataset`	Definition of the protocol to support instantiation from xarray dataset structs.
`ShnitselDerivedDataset`	Definition of the protocol to support instantiation from

Module Contents#

class ShnitselDataset(ds)#

Bases: shnitsel.data.xr_io_compatibility.SupportsFromXrConversion, shnitsel.data.xr_io_compatibility.SupportsToXrConversion

Definition of the protocol to support instantiation from xarray dataset structs.

Parameters: ds (xarray.Dataset)

_raw_dataset: xarray.Dataset#

property dataset: xarray.Dataset#

Return type: xarray.Dataset

property leading_dimension: str#

Return type: str

property state_ids#

property state_names#

property state_types#

property state_magnetic_number#

property state_degeneracy_group#

property state_charges#

property active_state#

property state_diagonal#

property atom_names#

property atom_numbers#

property charge: float#

The charge of the molecule, if set on the trajectory data. It is loaded from the charge attribute (or variable), or from the state_charges coordinate if provided.

If no charge information is found, 0 is returned.

Return type: float

set_charge(value)#

Set the charge on the dataset, clear any conflicting charge information stored elsewhere on the dataset, and return a new instance of the wrapped dataset.

Parameters: value (float | xr.DataArray) – Either a single value (optionally already wrapped in a DataArray) giving the charge of the full molecule in all states, which is stored in the charge coordinate, or a DataArray of state-dependent charges, which is stored in state_charges.
Returns: The updated object as a copy.
Return type: Self
Raises: ValueError – If an unsupported value was provided.
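The dispatch between a molecule-wide charge and state-dependent charges can be sketched with plain xarray. This is only an illustration of the behaviour described above, not shnitsel's implementation: the function name set_charge_sketch and the dimension name state are assumptions made for this example.

```python
import xarray as xr


def set_charge_sketch(ds: xr.Dataset, value) -> xr.Dataset:
    """Hypothetical stand-in mirroring the documented set_charge behaviour."""
    charge = xr.DataArray(value)
    if charge.ndim == 0:
        # One charge for the whole molecule: store it as `charge` and drop
        # per-state information that would otherwise conflict with it.
        cleaned = ds.drop_vars("state_charges", errors="ignore")
        return cleaned.assign_coords(charge=float(charge))
    if charge.dims == ("state",):
        # State-dependent charges: store them as `state_charges` and drop
        # the molecule-wide value. The dimension name "state" is assumed.
        cleaned = ds.drop_vars("charge", errors="ignore")
        return cleaned.assign_coords(state_charges=charge)
    raise ValueError("unsupported charge value")


# Toy dataset with three states (illustrative only).
ds = xr.Dataset(coords={"state": [1, 2, 3]})
neutral = set_charge_sketch(ds, 0)
per_state = set_charge_sketch(
    neutral, xr.DataArray([0.0, 0.0, 1.0], dims="state")
)
```

Each call returns a new object, so ds itself stays untouched, which matches the "updated object as a copy" wording above.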

property dims#

property coords#

property sizes#

property data_vars#

has_variable(name)#

Parameters: name (str)
Return type: bool

has_dimension(name)#

Parameters: name (str)
Return type: bool

has_coordinate(name)#

Parameters: name (str)
Return type: bool

has_data(name)#

Parameters: name (str)
Return type: bool

has(name)#

Parameters: name (str)
Return type: bool

property mol: rdkit.Chem.Mol#

Helper property to get a representative molecule object for the geometry within this dataset.

Returns: Either a copy of a cached mol object (for partial substructures) or a newly constructed default object.
Return type: rdkit.Chem.Mol

sel(indexers=None, method=None, tolerance=None, drop=False, **indexers_kwargs)#

Returns a new dataset with each data array indexed by tick labels along the specified dimension(s).

In contrast to .isel, indexers for this method should use labels (i.e. explicit values in that dimension) instead of integers.

Under the hood, this method relies on pandas Index objects, which makes label-based indexing essentially as fast as integer indexing.

It also means this method uses pandas’s (well documented) logic for indexing. This means you can use string shortcuts for datetime indexes (e.g., ‘2000-01’ to select all values in January 2000). It also means that slices are treated as inclusive of both the start and stop values, unlike normal Python indexing.
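To make the label-versus-position distinction concrete, here is a minimal sketch on a plain xarray.Dataset rather than a ShnitselDataset. The variable name energy and the time values are made up for illustration; since the description above mirrors xarray's own Dataset.sel, the wrapper is assumed to behave the same way.

```python
import xarray as xr

# Toy dataset (names and values are illustrative, not from shnitsel).
ds = xr.Dataset(
    {"energy": ("time", [-238.2, -238.4, -237.9])},
    coords={"time": [0.0, 0.5, 1.0]},
)

# Label-based slice: both end points (0.0 and 0.5) are included.
by_label = ds.sel(time=slice(0.0, 0.5))

# Position-based slice: the stop index is excluded, as in plain Python.
by_position = ds.isel(time=slice(0, 1))

print(by_label.sizes["time"], by_position.sizes["time"])
```

The label slice keeps two time steps while the positional slice keeps one, which is the inclusive-stop behaviour noted above.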

Parameters:

indexers (dict, optional) – A dict with keys matching dimensions and values given by scalars, slices or arrays of tick labels. For dimensions with multi-index, the indexer may also be a dict-like object with keys matching index level names. If DataArrays are passed as indexers, xarray-style indexing will be carried out. See Indexing and selecting data for the details. One of indexers or indexers_kwargs must be provided.
method ({None, "nearest", "pad", "ffill", "backfill", "bfill"}, optional) –
Method to use for inexact matches:
- None (default): only exact matches
- pad / ffill: propagate last valid index value forward
- backfill / bfill: propagate next valid index value backward
- nearest: use nearest valid index value
tolerance (optional) – Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.
drop (bool, optional) – If drop=True, drop coordinates variables in indexers instead of making them scalar.
**indexers_kwargs ({dim: indexer, ...}, optional) – The keyword arguments form of indexers. One of indexers or indexers_kwargs must be provided.
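The effect of method and tolerance can be sketched as follows, again on a plain xarray.Dataset with made-up names and values. It assumes the wrapper forwards these arguments unchanged to xarray.

```python
import xarray as xr

# Toy dataset (names and values are illustrative, not from shnitsel).
ds = xr.Dataset(
    {"energy": ("time", [-238.2, -238.4, -237.9])},
    coords={"time": [0.0, 0.5, 1.0]},
)

# No exact label 0.4 exists; "nearest" picks the closest label, 0.5.
nearest = ds.sel(time=0.4, method="nearest")

# "ffill" (alias "pad") steps back to the last label at or before 0.4.
padded = ds.sel(time=0.4, method="ffill")

# With tolerance=0.05 the nearest label is too far away
# (abs(0.5 - 0.4) > 0.05), so the lookup fails with a KeyError.
try:
    ds.sel(time=0.4, method="nearest", tolerance=0.05)
    within_tolerance = True
except KeyError:
    within_tolerance = False
```

Leaving method at its default of None would make the first lookup fail as well, because only exact matches are accepted in that case.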

Returns:

dataset – A new Dataset with the same contents as this dataset, except each variable and dimension is indexed by the appropriate indexers. If indexer DataArrays have coordinates that do not conflict with this object, then these coordinates will be attached. In general, each array’s data will be a view of the array’s data in this dataset, unless vectorized indexing was triggered by using an array indexer, in which case the data will be a copy.

Return type:

Self

See also

ShnitselDataset.isel, Dataset.sel, Dataset.isel, DataArray.sel

xarray-tutorial:intermediate/indexing/indexing: Tutorial material on indexing with Xarray objects
xarray-tutorial:fundamentals/02.1_indexing_Basic: Tutorial material on basics of indexing

isel(indexers=None, drop=False, missing_dims='raise', **indexers_kwargs)#

Returns a new dataset with each array indexed along the specified dimension(s).

This method selects values from each array using its __getitem__ method, except this method does not require knowing the order of each array’s dimensions.

Parameters:

indexers (dict, optional) – A dict with keys matching dimensions and values given by integers, slice objects or arrays. Each indexer can be an integer, slice, array-like or DataArray. If DataArrays are passed as indexers, xarray-style indexing will be carried out. See Indexing and selecting data for the details. One of indexers or indexers_kwargs must be provided.
drop (bool, default: False) – If drop=True, drop coordinates variables indexed by integers instead of making them scalar.
missing_dims ({"raise", "warn", "ignore"}, default: "raise") –
What to do if dimensions that should be selected from are not present in the Dataset:
- "raise": raise an exception
- "warn": raise a warning, and ignore the missing dimensions
- "ignore": ignore the missing dimensions
**indexers_kwargs ({dim: indexer, ...}, optional) – The keyword arguments form of indexers. One of indexers or indexers_kwargs must be provided.
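A short sketch of how drop and missing_dims change the result, using a plain xarray.Dataset with made-up names and values. It assumes the wrapper passes both arguments through to xarray as documented.

```python
import xarray as xr

# Toy dataset (names and values are illustrative, not from shnitsel).
ds = xr.Dataset(
    {"energy": ("time", [-238.2, -238.4, -237.9])},
    coords={"time": [0.0, 0.5, 1.0]},
)

# Default: the selected coordinate survives as a scalar coordinate.
kept = ds.isel(time=0)

# drop=True removes that scalar coordinate instead.
dropped = ds.isel(time=0, drop=True)

# "atom" is not a dimension of this toy dataset. With
# missing_dims="ignore" the indexer is skipped; the default "raise"
# raises a ValueError instead.
ignored = ds.isel(atom=0, missing_dims="ignore")
try:
    ds.isel(atom=0)
    raised = False
except ValueError:
    raised = True
```

Using missing_dims="ignore" can be convenient when the same indexers are applied to datasets that do not all share the same dimensions.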

Returns:

obj – A new Dataset with the same contents as this dataset, except each array and dimension is indexed by the appropriate indexers. If indexer DataArrays have coordinates that do not conflict with this object, then these coordinates will be attached. In general, each array’s data will be a view of the array’s data in this dataset, unless vectorized indexing was triggered by using an array indexer, in which case the data will be a copy.

Return type:

Dataset

Examples

# A specific element from the dataset is selected

>>> dataset.isel(atom=1, time=0)
<xarray.Dataset> Size:
Dimensions:         (direction: 3)
Coordinates:
    atom        int16 2B 1
    time        float64 8B 0.0
    direction   (direction) <U1 3B 'x' 'y' 'z'
Data variables:
    energy  float64 8B -238.2
    forces  (direction) float64 24B 1.2 -0.2 0.1

# Indexing with a slice using isel

>>> slice_of_data = dataset.isel(atom=slice(0, 2), time=slice(0, 2))
>>> slice_of_data
<xarray.Dataset> Size:
Dimensions:         (atom: 2, time: 2, direction: 3)
Coordinates:
    * atom         (atom) int16 2B 1
    * time         (time) float64 16B 0.0 0.5
    * direction    <U1 3B 'x' 'y' 'z'
Data variables:
    energy      (time) float64 24B -238.2
    forces      (time, atom, direction) float64 96B -0.5 -0.4 0.4 ...

# Indexing with a DataArray of integer positions using isel

>>> index_array = xr.DataArray([0, 2], dims="atom")
>>> indexed_data = dataset.isel(atom=index_array)
>>> indexed_data
<xarray.Dataset> Size:
Dimensions:         (atom: 2, time: 3, direction: 3)
Coordinates:
  * atom            (atom) int16 4B 1 3
  * time            (time) float64 16B 0.0 0.5 1.0
  * direction       <U1 3B 'x' 'y' 'z'
Data variables:
    energy      (time) float64 24B -238.2 -238.4 -237.9
    forces      (time, atom, direction) float64 96B -0.5 -0.4 0.4 ...
