shnitsel.data.multi_indices#

Attributes#

Classes#

dtype_NA

A sentinel value for the fill_value param in shnitsel.data.multi_indices.unstack_trajs()

Functions#

midx_combs(values[, name])

Helper function to create a MultiIndex-based dimension coordinate for an xarray object from all (unordered) pairwise combinations of entries in values.

flatten_midx(obj, idx_name[, renamer])

Function to flatten a multi-index into a flat index.

flatten_levels(obj, idx_name, levels[, new_name, ...])

Flatten specified levels of a MultiIndex into tuples occupying a single MultiIndex level

expand_midx(obj, midx_name, level_name, value)

Add an outer level to an existing MultiIndex in obj

assign_levels(obj[, levels])

Assign new values to levels of MultiIndexes in obj

mgroupby(obj, levels)

Group a Dataset or DataArray by several levels of a MultiIndex it contains.

msel(obj, **kwargs)

Add data values along a coordinate, chosen based on coordinate values

sel_trajs(obj, trajids_or_mask[, invert])

Select trajectories using a list of trajectory IDs or a boolean mask

_sel_trajids(frames, trajids[, invert])

Select trajectories using a list of trajectory IDs; note that the trajectories may not be returned in the order specified.

_sel_trajs_unstacked(obj, indexer, invert)

unstack_trajs(frames[, fill_value])

Unstack the frame MultiIndex so that trajid and time become separate dims.

stack_trajs(unstacked)

Stack the trajid and time dims of an unstacked Dataset into a MultiIndex along a new dimension called frame.

is_stacked(obj)

Test whether an object has stacked trajectories

ensure_unstacked(obj[, fill_value])

Unstack obj if it contains stacked trajectories

mdiff(da[, dim])

Take successive differences along the dim dimension

Module Contents#

DatasetOrArray#
midx_combs(values, name=None)#

Helper function to create a MultiIndex-based dimension coordinate for an xarray object from all (unordered) pairwise combinations of entries in values.

Parameters:
  • values (pd.core.indexes.base.Index | list) – The source values to generate pairwise combinations for

  • name (str | None, optional) – Optionally a name for the resulting combination dimension. Defaults to None.

Raises:

ValueError – If no name was provided and the name could not be extracted from the values parameter

Returns:

The resulting coordinates object.

Return type:

xr.Coordinates
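The kind of coordinate described above can be sketched with the standard library, pandas and xarray. This is not the library's implementation: the level names `first` and `second` and the dimension name `pair` are hypothetical, since the names that midx_combs actually assigns are not stated here.

```python
from itertools import combinations

import pandas as pd
import xarray as xr

values = [1, 2, 3]

# Unordered pairs: each pair appears once and no value is paired with itself.
pairs = list(combinations(values, 2))

# "first", "second" and "pair" are hypothetical names chosen for this sketch.
midx = pd.MultiIndex.from_tuples(pairs, names=["first", "second"])

# Wrap the MultiIndex as xarray coordinates along a single dimension.
coords = xr.Coordinates.from_pandas_multiindex(midx, "pair")
```

The resulting object can be passed as the `coords` argument when constructing a Dataset or DataArray.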

flatten_midx(obj, idx_name, renamer=None)#

Function to flatten a multi-index into a flat index.

Optionally accepts a custom renaming function.

Parameters:
  • obj (xr.Dataset | xr.DataArray) – The object with the index intended to be flattened

  • idx_name (str) – The name of the index to flatten.

  • renamer (callable | None, optional) – An optional function to carry out the renaming of the combined entry from individual entries. Defaults to None.

Returns:

The refactored object without the original index coordinates but with a combined index instead

Return type:

xr.Dataset | xr.DataArray
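What flattening means at the index level can be illustrated with pandas alone. The sketch below is an assumption-laden illustration, not the function's code: in particular, the hypothetical `renamer` here receives the level values as separate arguments, which may differ from the call signature flatten_midx expects.

```python
import pandas as pd

midx = pd.MultiIndex.from_product([[0, 1], [0, 1]], names=["trajid", "time"])

# Without a renamer, flattening yields one tuple per entry.
flat = midx.to_flat_index()


# Hypothetical renamer: combine the level values of one entry into a label.
def renamer(trajid, time):
    return f"traj{trajid}_t{time}"


# With a renamer, each tuple is replaced by the combined label.
labels = pd.Index([renamer(*entry) for entry in flat], name="frame")
```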

flatten_levels(obj, idx_name, levels, new_name=None, position=0, renamer=None)#

Flatten specified levels of a MultiIndex into tuples occupying a single MultiIndex level

Parameters:
  • obj (DatasetOrArray) – A Dataset or DataArray with at least one MultiIndex

  • idx_name (str) – The name of the MultiIndex

  • levels (Sequence[str]) – Which levels to flatten

  • new_name (str, optional) – The name of the single resulting index, by default None

  • position (int, optional) – The position of the resulting level in the MultiIndex, by default 0

  • renamer (Callable, optional) – A Callable to compute the values in the new level as a function of the values in the original separate levels, by default None

Returns:

An object differing from obj only in the flattening of specified levels

Return type:

DatasetOrArray

Raises:

ValueError – If the specified index is associated with more than one dimension (this should not be possible for a MultiIndex anyway)

expand_midx(obj, midx_name, level_name, value)#

Add an outer level to an existing MultiIndex in obj

Parameters:
  • obj (DatasetOrArray) – A Dataset or DataArray with at least one MultiIndex

  • midx_name (str) – The name of the MultiIndex

  • level_name (str) – The name of the new level

  • value – Values with which to populate the new level

Returns:

An object differing from obj only in the addition of the MultiIndex level

Return type:

DatasetOrArray
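Adding an outer level can be pictured at the pandas level as rebuilding the MultiIndex with one extra leading array. The following is a minimal sketch under assumptions: the level name `compound` and the value `"C2H4"` are hypothetical, and a single scalar is repeated for every entry, which is only one possible reading of the value parameter.

```python
import pandas as pd

midx = pd.MultiIndex.from_product([[0, 1], [0, 1]], names=["trajid", "time"])

level_name = "compound"  # hypothetical name for the new outer level
value = "C2H4"  # hypothetical value, repeated for every entry

# Prepend the new level's values to the existing level values.
arrays = [[value] * len(midx)] + [midx.get_level_values(n) for n in midx.names]
expanded = pd.MultiIndex.from_arrays(arrays, names=[level_name, *midx.names])
```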

assign_levels(obj, levels=None, **levels_kwargs)#

Assign new values to levels of MultiIndexes in obj

Parameters:
  • obj (DatasetOrArray) – An xarray object with at least one MultiIndex

  • levels (dict[str, npt.ArrayLike], optional) – A mapping whose keys are the names of the levels and whose values are the levels to assign. The mapping will be passed to xarray.DataArray.assign_coords() (or the xarray.Dataset equivalent).

  • **levels_kwargs – Keyword arguments to define the levels by instead of providing them as a dict

Returns:

A new object (of the same type as obj) with the new level values replacing the old level values.

Return type:

DatasetOrArray

Raises:

ValueError – If levels are provided in both keyword and dictionary form.

Notes

Propagates attrs irrespective of xarray.get_options()['keep_attrs']

mgroupby(obj, levels)#

Group a Dataset or DataArray by several levels of a MultiIndex it contains.

Parameters:
  • obj (xr.Dataset | xr.DataArray) – The xr object to group

  • levels (Sequence[str]) – Names of MultiIndex levels all belonging to the same MultiIndex

Returns:

The grouped object, which behaves as documented at xr.Dataset.groupby() and xr.DataArray.groupby(), with the caveat that the specified levels have been “flattened” into a single MultiIndex level of tuples.

Return type:

DataArrayGroupBy | DatasetGroupBy

Raises:

ValueError – If no MultiIndex is found, or if the named levels belong to different MultiIndexes.

Warning

The function does not currently check whether the levels specified are really levels of a MultiIndex, as opposed to names of non-MultiIndex indexes.
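The idea of grouping by several levels at once, with each group identified by a tuple of level values, has a close pandas analogue. The sketch below uses pandas rather than mgroupby itself, and the level name `kind` is hypothetical.

```python
import numpy as np
import pandas as pd

midx = pd.MultiIndex.from_product(
    [[0, 1], ["x", "y"], [0, 1]], names=["trajid", "kind", "time"]
)
series = pd.Series(np.arange(8.0), index=midx)

# Grouping by two levels at once: each group key is a (trajid, kind) tuple.
means = series.groupby(level=["trajid", "kind"]).mean()
keys = list(means.index)
```

Each group here averages over the remaining `time` level only.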

msel(obj, **kwargs)#

Add data values along a coordinate, chosen based on coordinate values

Parameters:
  • obj (DatasetOrArray) – A Dataset or DataArray with at least one coordinate containing all the values given by the kwargs parameter name

  • **kwargs – Tuples of key:value pairs as keyword arguments to select from entries in a multi-index.

Returns:

The coordinate (presumably unique) from obj that contains all the parameter names in kwargs

Raises:

ValueError – If no coordinate in obj contains all the parameter names in kwargs

Return type:

DatasetOrArray

sel_trajs(obj, trajids_or_mask, invert=False)#

Select trajectories using a list of trajectory IDs or a boolean mask

Parameters:
  • obj (DatasetOrArray) – The xr.Dataset from which a selection is to be drawn

  • trajids_or_mask (Sequence[int] | Sequence[bool]) –

    Either
    • A sequence of integers representing trajectory IDs to be included, in which case the trajectories may not be returned in the order specified.

    • Or a sequence of booleans, each indicating whether the trajectory with an ID in the corresponding entry in the Dataset’s trajid_ coordinate should be included

  • invert (bool, optional) – Whether to invert the selection, i.e. return those trajectories not specified, by default False

Returns:

A new xr.Dataset containing only the specified trajectories

Return type:

DatasetOrArray

Raises:
  • NotImplementedError – If an attempt is made to index an xr.Dataset without a trajid_ dimension/coordinate using a boolean mask

  • TypeError – If trajids_or_mask has a dtype other than integer or boolean
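The two selection modes can be sketched with plain xarray and NumPy. This is an illustration under assumptions rather than the function's code: `select_by_ids` is a hypothetical helper, and the boolean mask is reduced to the ID case by indexing a hypothetical array of per-trajectory IDs.

```python
import numpy as np
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_arrays(
    [[1, 1, 2, 2, 3], [0, 1, 0, 1, 0]], names=["trajid", "time"]
)
frames = xr.DataArray(
    np.arange(5.0),
    dims="frame",
    coords=xr.Coordinates.from_pandas_multiindex(midx, "frame"),
)


def select_by_ids(frames, trajids, invert=False):
    """Keep (or, with invert=True, drop) frames whose trajid is listed."""
    mask = np.isin(frames["trajid"].values, trajids)
    if invert:
        mask = ~mask
    # Positional selection keeps the frames in their stored order.
    return frames.isel(frame=np.flatnonzero(mask))


kept = select_by_ids(frames, [3, 1])  # the requested order is not preserved
dropped = select_by_ids(frames, [3, 1], invert=True)

# A boolean mask over the per-trajectory IDs reduces to the ID case.
all_ids = np.array([1, 2, 3])
masked = select_by_ids(frames, all_ids[np.array([True, False, True])])
```

Because the mask is applied to the frames in stored order, asking for `[3, 1]` returns trajectory 1 before trajectory 3.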

_sel_trajids(frames, trajids, invert=False)#

Select trajectories using a list of trajectory IDs; note that the trajectories may not be returned in the order specified.

Parameters:
  • frames (DatasetOrArray) – The xr.Dataset from which a selection is to be drawn

  • trajids (npt.ArrayLike) – A sequence of integers representing trajectory IDs to be included

  • invert (bool, optional) – Whether to invert the selection, i.e. return those trajectories not specified, by default False

Returns:

A new xr.Dataset containing only the specified trajectories

Return type:

DatasetOrArray

Raises:

KeyError – If some of the supplied trajectory IDs are not present in the trajectory coordinate

_sel_trajs_unstacked(obj, indexer, invert)#
class dtype_NA#

A sentinel value for the fill_value param in shnitsel.data.multi_indices.unstack_trajs()

unstack_trajs(frames, fill_value=dtype_NA)#

Unstack the frame MultiIndex so that trajid and time become separate dims. Wraps the xarray.Dataset.unstack() method.

Parameters:
  • frames (DatasetOrArray) – An xarray.Dataset with a frame dimension associated with a MultiIndex coordinate with levels named trajid and time. The Dataset may also have a trajid_ dimension used for variables and coordinates that store information pertaining to each trajectory in aggregate; this will be aligned along the trajid dimension of the unstacked Dataset.

  • fill_value – The value used to fill in entries that were unspecified in stacked format; by default, the dtype’s NA value will be used.

Returns:

An xarray.Dataset with independent trajid and time dimensions. Same type as frames

Return type:

DatasetOrArray
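Since the function wraps xarray.Dataset.unstack(), the effect of fill_value can be sketched with that method directly. The variable name `energy` and the data are hypothetical, and the handling of a trajid_ dimension is not covered by this sketch.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Trajectory 3 is shorter than the others, so unstacking leaves a gap.
midx = pd.MultiIndex.from_arrays(
    [[1, 1, 2, 2, 3], [0, 1, 0, 1, 0]], names=["trajid", "time"]
)
frames = xr.Dataset(
    {"energy": ("frame", np.arange(5.0))},
    coords=xr.Coordinates.from_pandas_multiindex(midx, "frame"),
)

# Default: missing (trajid, time) entries are filled with the dtype's NA value.
unstacked = frames.unstack("frame")

# An explicit fill value replaces NA in the gaps.
filled = frames.unstack("frame", fill_value=-1.0)
```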

stack_trajs(unstacked)#

Stack the trajid and time dims of an unstacked Dataset into a MultiIndex along a new dimension called frame. Wraps the xarray.Dataset.stack() method.

Parameters:
  • unstacked (DatasetOrArray) – An xarray.Dataset with independent trajid and time dimensions.

Returns:

An xarray.Dataset with a frame dimension associated with a MultiIndex coordinate with levels named trajid and time. Variables and coordinates that depended on only one of trajid or time in the unstacked Dataset will be aligned along new dimensions named trajid_ and time_, respectively. These new dimensions are independent of the frame dimension and its trajid and time levels.

Return type:

DatasetOrArray
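The core of the operation, stacking via xarray.Dataset.stack(), can be sketched as follows. The variable name `energy` is hypothetical, the creation of trajid_ and time_ dimensions is not reproduced, and dropping the NaN padding afterwards is an assumption of this sketch rather than documented behavior of stack_trajs.

```python
import numpy as np
import xarray as xr

unstacked = xr.Dataset(
    {
        "energy": (
            ("trajid", "time"),
            np.array([[0.0, 1.0], [2.0, 3.0], [4.0, np.nan]]),
        )
    },
    coords={"trajid": [1, 2, 3], "time": [0, 1]},
)

# trajid becomes the outer level of the new frame MultiIndex, time the inner.
stacked = unstacked.stack(frame=("trajid", "time"))

# Assumption of this sketch: padding entries are NaN and can be dropped.
trimmed = stacked.dropna("frame")
```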

is_stacked(obj)#

Test whether an object has stacked trajectories

Parameters:

obj – An xarray Dataset/DataArray, or a wrapper around one

Returns:

True if obj shows signs of containing multiple trajectories along the same dimension as used for the time coordinate.

ensure_unstacked(obj, fill_value=dtype_NA)#

Unstack obj if it contains stacked trajectories

Parameters:
  • obj – An xarray Dataset/DataArray, or a wrapper around one

  • fill_value – The value used to fill in entries that were unspecified in stacked format; by default, the dtype’s NA value will be used.

Returns:

  • unstacked – The unstacked Dataset/DataArray

  • was_stacked – Whether obj had stacked trajectories

mdiff(da, dim=None)#

Take successive differences along the dim dimension

Parameters:
  • da (xr.DataArray) – An xarray.DataArray with a dimension dim corresponding to a pandas.MultiIndex of which the innermost level is ‘time’.

  • dim (str, optional) – The dimension along which the successive differences should be calculated.

Returns:

An xarray.DataArray with the same shape, dimension names etc., but with the data of the (i)th frame replaced by the difference between the original (i+1)th and (i)th frames, with zeros filling in for both the initial frame and any frame for which time = 0, to avoid taking differences between the last and first frames of successive trajectories.

Return type:

xarray.DataArray
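The zero-filling at trajectory boundaries can be sketched with NumPy on a small stacked array. This is an illustration, not the function's code: it uses backward differences (frame i minus frame i-1), which is the reading consistent with zeros at the initial frame, and that indexing choice is an assumption.

```python
import numpy as np
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_arrays(
    [[1, 1, 1, 2, 2], [0, 1, 2, 0, 1]], names=["trajid", "time"]
)
da = xr.DataArray(
    np.array([1.0, 3.0, 6.0, 10.0, 15.0]),
    dims="frame",
    coords=xr.Coordinates.from_pandas_multiindex(midx, "frame"),
)

# Backward differences along frame, with a zero in the first slot.
diffs = np.zeros_like(da.values)
diffs[1:] = da.values[1:] - da.values[:-1]

# Zero the first frame of each trajectory so that no difference spans
# the boundary between two trajectories.
diffs[da["time"].values == 0] = 0.0

# Same shape, dims and coordinates as the input, with the data replaced.
result = da.copy(data=diffs)
```

Without the masking step, the fourth entry would be 10.0 - 6.0, a difference between the last frame of trajectory 1 and the first frame of trajectory 2.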