shnitsel.data.tree.node#

Attributes#

Classes#

TreeNode

Base class modelling a tree of arbitrary data type that holds hierarchically structured trajectory data.

Functions#

_trajectory_key_func(node)

Helper function to extract trajectory metadata of leaf nodes for trees with appropriate data types.

Module Contents#

ChildType#
DataType#
NewDataType#
NewChildType#
ResType#
KeyType#
T#
_class_cache#
class TreeNode(*, name, data=None, children=None, attrs=None, level_name=None, dtype=None, **kwargs)#

Bases: Generic[ChildType, DataType], abc.ABC

Base class modelling a tree of arbitrary data type that holds hierarchically structured trajectory data.

Has two type parameters to allow for explicit type checks:

  • ChildType: Which node types are allowed to be registered as children of this node.

  • DataType: What kind of data is expected within this tree if the data is not None.

Parameters:
  • name (str | None)

  • data (DataType | None)

  • children (Mapping[Hashable, ChildType] | None)

  • attrs (Mapping[str, Any] | None)

  • level_name (str | None)

  • dtype (type[DataType] | None)
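The role of the two type parameters can be pictured with a minimal stand-in class. This is only a sketch: `SketchNode` is a hypothetical name and not part of shnitsel, and it mirrors just the keyword-only `name`, `data` and `children` arguments listed above.

```python
from __future__ import annotations

from typing import Generic, Hashable, Mapping, Optional, TypeVar

ChildType = TypeVar("ChildType")
DataType = TypeVar("DataType")


class SketchNode(Generic[ChildType, DataType]):
    """Hypothetical stand-in for a tree node; not the shnitsel class."""

    def __init__(
        self,
        *,
        name: Optional[str],
        data: Optional[DataType] = None,
        children: Optional[Mapping[Hashable, ChildType]] = None,
    ) -> None:
        self.name = name
        self.data = data
        # Copy the mapping so later changes by the caller do not leak in.
        self.children: dict[Hashable, ChildType] = dict(children or {})


# A leaf holding float data, and a group whose children are such leaves.
leaf: SketchNode[None, float] = SketchNode(name="traj_0", data=1.5)
group: SketchNode[SketchNode[None, float], float] = SketchNode(
    name="group", children={"traj_0": leaf}
)
```

A static type checker can then flag a child of the wrong node type or data of the wrong kind, which is the kind of explicit check the type parameters are meant to enable.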

classmethod _get_extended_class_name(datatypes)#
Parameters:

datatypes (Sequence[type])

Return type:

str

classmethod _create_extended_node_class(datatypes)#

Create a new version of the class with added methods for the datatypes.

Parameters:

datatypes (list[tuple[type, list[str], list[str]]])

Return type:

type[Self]

classmethod __class_getitem__(args)#
Parameters:

args (TypeVar | tuple[TypeVar , ...])

Return type:

type[Self]

_name: str | None#
_dtype: type[DataType] | types.UnionType | None#
_data: DataType | None#
_children: Mapping[Hashable, ChildType]#
_attrs: Mapping[str, Any]#
_parent: Self | None#
_level_name: str | None#
static _dtype_guess_from_children(children)#
Parameters:

children (Mapping | None)

Return type:

type | types.UnionType | None

construct_copy(children: Mapping[Hashable, ChildType] | None = None, dtype: None = None, data: DataType | None = None, **kwargs) Self#
construct_copy(children: Mapping[Hashable, NewChildType] | None = None, dtype: type[ResType] | types.UnionType | None = None, data: None = None, **kwargs) TreeNode[NewChildType, ResType]
construct_copy(children: None = None, dtype: type[ResType] | types.UnionType | None = None, data: ResType | None = None, **kwargs) TreeNode[Any, ResType]

Every class inheriting from TreeNode should implement this method to create a copy of the subtree with appropriate typing, or simply a plain copy of the subtree if no updates are requested.

The base class should support changing the typing by changing child types, by setting an explicit dtype, or by providing a new data entry.

Parameters:
  • data (ResType | None, optional) – The new data to be set in the copy of this node. By default None, which populates the copy with the node’s current data.

  • children (Mapping[str, NewChildType], optional) – A new set of children to replace the old mapping of children. The data type can also be changed through appropriate typing of this argument.

  • dtype (type[ResType] | UnionType | None, optional) – An explicit argument to set the dtype property of the new subtree, by default None.

Returns:

A new subtree whose root duplicates the metadata of this node, with properties updated as provided.

Return type:

Self | TreeNode[TreeNode[Any, ResType]|None, ResType]
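The copy-with-overrides pattern described here can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: `SketchNode` is hypothetical, typing overloads are omitted, and `None` is taken to mean "keep the current value".

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Hashable, Mapping, Optional


@dataclass(frozen=True)
class SketchNode:
    """Hypothetical stand-in for a tree node; not the shnitsel class."""

    name: str
    data: Any = None
    children: Mapping[Hashable, "SketchNode"] = field(default_factory=dict)

    def construct_copy(
        self,
        children: Optional[Mapping[Hashable, "SketchNode"]] = None,
        data: Any = None,
    ) -> "SketchNode":
        # None means "keep what this node currently has".
        return SketchNode(
            name=self.name,
            data=self.data if data is None else data,
            children=dict(self.children if children is None else children),
        )


original = SketchNode("traj_0", data=1)
plain_copy = original.construct_copy()
updated = original.construct_copy(data=2)
```

The original node is left untouched in both calls; only the returned copy carries the new data.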

property path: str#
Return type:

str

__len__()#

Returns the size of this node, i.e. how many children it has.

Be aware that this means that it will return 0 for Leaf nodes that may hold data.

Returns:

The number of children of this node

Return type:

int
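A consequence worth keeping in mind: in plain Python, an object that defines `__len__` but no `__bool__` is falsy whenever its length is 0. The sketch below uses a hypothetical `SketchNode` (not the shnitsel class, which may or may not define `__bool__`) to show why an explicit `is not None` check is the safer test for "is there a node".

```python
class SketchNode:
    """Hypothetical stand-in for a tree node; not the shnitsel class."""

    def __init__(self, data=None, children=None):
        self.data = data
        self.children = dict(children or {})

    def __len__(self):
        # Size is the number of children, so leaves report 0.
        return len(self.children)


leaf = SketchNode(data=[1, 2, 3])
group = SketchNode(children={"a": leaf})

leaf_count = len(leaf)    # 0, although the leaf holds data
group_count = len(group)  # 1

# Without a __bool__ method, truthiness falls back to __len__.
leaf_is_truthy = bool(leaf)
```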

__contains__(value)#
Parameters:

value (str | ChildType)

Return type:

bool

__getitem__(key)#
Parameters:

key (str | tuple[str])

Return type:

TreeNode[Any, DataType] | DataType | None

__setitem__(key, value)#
property is_leaf: bool#
Return type:

bool

property has_data: bool#
Return type:

bool

property dtype: type[DataType] | types.UnionType | None#
Return type:

type[DataType] | types.UnionType | None

property data: DataType#
Return type:

DataType

property children: Mapping[Hashable, ChildType]#
Return type:

Mapping[Hashable, ChildType]

property root: TreeNode[Any, DataType]#
Return type:

TreeNode[Any, DataType]

property attrs: Mapping[str, Any]#
Return type:

Mapping[str, Any]

property name: str#
Return type:

str

map_subtree(func)#

Helper function with a descriptive name that applies a function to the root node of the current subtree.

Simply calls func(self).

Parameters:

func (Callable[[Self], ResType]) – The function to apply to this node

Returns:

The result of func(self).

Return type:

ResType

abstractmethod group_children_by(key_func, group_leaves_only=False)#

Method to group nodes within this current subtree by keys as retrieved via key_func.

Can be used to group data within this tree by metadata, e.g. to separate trajectory data with different simulation settings into distinct groups.

Adds new groups into the tree structure.

Parameters:
  • key_func (Callable[[TreeNode], KeyType]) – Key function that maps any tree node that is not excluded (e.g. by setting group_leaves_only) to a key value. The key should be a dataclass and should be equal for two nodes if and only if those nodes should end up in the same group.

  • group_leaves_only (bool, optional) – Flag to control whether grouping should only be applied to DataLeaf nodes, by default False

Returns:

The current node after its subtree has been grouped. If no keys could be retrieved, the result may be None.

Return type:

Self | None
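The requirement that keys compare equal exactly when nodes belong together is what a frozen dataclass provides out of the box. The following is a minimal sketch of that idea only; `SettingsKey`, `group_by_key` and the dictionary-based leaves are hypothetical and do not reflect how shnitsel stores or regroups nodes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SettingsKey:
    """Hypothetical grouping key; frozen so that instances are hashable."""

    delta_t: float
    n_states: int


def group_by_key(leaves, key_func):
    """Collect leaves into groups that share the same key."""
    groups = defaultdict(dict)
    for name, leaf in leaves.items():
        groups[key_func(leaf)][name] = leaf
    return dict(groups)


# Plain dicts stand in for leaf metadata in this sketch.
leaves = {
    "traj_0": {"delta_t": 0.5, "n_states": 3},
    "traj_1": {"delta_t": 0.5, "n_states": 3},
    "traj_2": {"delta_t": 1.0, "n_states": 3},
}
grouped = group_by_key(
    leaves, lambda leaf: SettingsKey(leaf["delta_t"], leaf["n_states"])
)
```

Two leaves with identical settings produce equal keys and therefore land in the same group, while a differing time step opens a new group.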

map_data(func: Callable[Ellipsis, ResType | None] | Callable[Ellipsis, ResType], *args, keep_empty_branches: typing_extensions.Literal[True] = True, dtype: type[ResType], **kwargs) TreeNode[Any, ResType]#
map_data(func: Callable[Ellipsis, ResType | None] | Callable[Ellipsis, ResType], *args, keep_empty_branches: typing_extensions.Literal[False], dtype: type[ResType], **kwargs) TreeNode[Any, ResType] | None
map_data(func: Callable[Ellipsis, ResType | None] | Callable[Ellipsis, ResType], *args, keep_empty_branches: typing_extensions.Literal[False], dtype: None = None, **kwargs) TreeNode[Any, ResType] | None
map_data(func: Callable[Ellipsis, ResType | None] | Callable[Ellipsis, ResType], *args, keep_empty_branches: typing_extensions.Literal[True] = True, dtype: None = None, **kwargs) TreeNode[Any, ResType]
map_data(func: Callable, *args, keep_empty_branches: typing_extensions.Literal[False], dtype: None = None, **kwargs) TreeNode | None
map_data(func: Callable, *args, keep_empty_branches: typing_extensions.Literal[True] = True, dtype: None = None, **kwargs) TreeNode

Helper function to apply a mapping function to all data in leaves of this tree.

The function func is applied to all DataLeaf instances with data within them. If keep_empty_branches=False is set, branches without any data in them or without any further children are truncated.

Parameters:
  • func (Callable[[DataType], ResType | None]) – The mapping function to apply to data in this subtree.

  • keep_empty_branches (bool, optional) – Flag to control whether branches/subtrees without any data in them are kept. By default True, which keeps the same structure; set to False to truncate them.

  • dtype (type[ResType] | None, optional) – Optional parameter to explicitly specify the dtype for the resulting tree, by default None

  • *args – Positional arguments to pass to the call to func

  • **kwargs – Keyword-style arguments to pass to the call to func

Returns:

The resulting node after the subtree has been mapped or None if truncation is active and the subtree has no data after mapping.

Return type:

TreeNode[Any,ResType]|None
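The interplay between the mapping function and `keep_empty_branches` can be sketched with a small recursive function. This is an illustration under assumptions, not the shnitsel implementation: `SketchNode` and the free function `map_data` are hypothetical, and extra positional and keyword arguments are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a tree node; not the shnitsel class."""

    name: str
    data: Any = None
    children: dict[str, SketchNode] = field(default_factory=dict)


def map_data(
    node: SketchNode,
    func: Callable[[Any], Any],
    keep_empty_branches: bool = True,
) -> Optional[SketchNode]:
    # Only nodes that actually hold data are passed to func.
    new_data = None if node.data is None else func(node.data)
    new_children = {}
    for key, child in node.children.items():
        mapped = map_data(child, func, keep_empty_branches)
        if mapped is not None:
            new_children[key] = mapped
    # Truncate nodes that ended up with neither data nor children.
    if not keep_empty_branches and new_data is None and not new_children:
        return None
    return SketchNode(node.name, new_data, new_children)


tree = SketchNode(
    "root",
    children={"a": SketchNode("a", 1), "b": SketchNode("b", -1)},
)


def double_if_positive(value):
    return value * 2 if value > 0 else None


kept = map_data(tree, double_if_positive)
pruned = map_data(tree, double_if_positive, keep_empty_branches=False)
```

With the default setting, leaf `b` survives as an empty node; with `keep_empty_branches=False` it is removed from the result.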

map_filtered_nodes(filter_func, map_func, dtype=None)#

Map nodes using map_func() if the filter function filter_func picks them as relevant.

If the node is not picked by filter_func a copy will be created with its children being recursively mapped according to the same rule. If a node is mapped, the mapping function map_func must take care of potential mapping over children.

Parameters:
  • filter_func (Callable[[TreeNode[Any, DataType]], bool]) – Filter function to apply to nodes in the current subtree of any kind. Must return True for all nodes to which map_func should be applied.

  • map_func (Callable[[TreeNode[Any, DataType]], TreeNode[Any, ResType]|None]) – Mapping function that transforms a selected node of a certain datatype to a consistent new data type ResType.

  • dtype (type[ResType] | None, optional) – Optional parameter to explicitly specify the dtype for the resulting tree, by default None.

Returns:

  • TreeNode[Any, ResType] – A new subtree with the data type changed and select subtrees mapped.

  • None – If the node was filtered and the map function returned None

Return type:

TreeNode[Any, ResType]|None
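The rule "map picked nodes, otherwise copy and recurse" can be sketched as below. The names `SketchNode`, `map_filtered_nodes` (as a free function) and `collapse` are hypothetical, and unpicked nodes simply keep their data in this sketch, which may differ from the library's handling of data types.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a tree node; not the shnitsel class."""

    name: str
    data: Any = None
    children: dict[str, SketchNode] = field(default_factory=dict)


def map_filtered_nodes(
    node: SketchNode,
    filter_func: Callable[[SketchNode], bool],
    map_func: Callable[[SketchNode], Optional[SketchNode]],
) -> Optional[SketchNode]:
    if filter_func(node):
        # A picked node is handed over entirely; map_func must deal
        # with the children of that node itself.
        return map_func(node)
    new_children = {}
    for key, child in node.children.items():
        mapped = map_filtered_nodes(child, filter_func, map_func)
        if mapped is not None:
            new_children[key] = mapped
    return SketchNode(node.name, node.data, new_children)


def collapse(node: SketchNode) -> SketchNode:
    """Replace a group by a single node holding the sum of its leaves."""
    total = sum(child.data for child in node.children.values())
    return SketchNode(node.name, total)


tree = SketchNode(
    "root",
    children={
        "g": SketchNode(
            "g",
            children={"a": SketchNode("a", 1), "b": SketchNode("b", 2)},
        ),
        "c": SketchNode("c", 3),
    },
)
result = map_filtered_nodes(tree, lambda n: n.name == "g", collapse)
```

Here only the group `g` is picked, so it is replaced by the output of `collapse`, while `root` and `c` are copied unchanged.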

filter_nodes(filter_func, recurse=True, keep_empty_branches=False)#

Filter the nodes in this tree and create a new tree containing only the nodes that are accepted or are ancestors of at least one accepted node.

If keep_empty_branches=False, all branches in which there are no accepted nodes will be truncated. If filter_func does not return True for this node, the entire subtree starting at this node will be dropped.

Parameters:
  • filter_func (Callable[..., bool]) – A filter function that should return True for Nodes that should be kept within the Tree and False for Nodes that should be kicked out together with their entire subtree.

  • recurse (bool, optional) – Whether to recurse the filtering into the children of kept nodes, by default True

  • keep_empty_branches (bool, optional) – A flag to keep branches with only empty lists of children and no data instead of truncating them, by default False

Returns:

Either a copy of the current subtree if it is kept or None if the subtree is omitted

Return type:

Self | None
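A recursive filter along these lines might look like the sketch below. It assumes that `keep_empty_branches=False` truncates nodes left with neither data nor children; `SketchNode` and the free function `filter_nodes` are hypothetical and not the shnitsel implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a tree node; not the shnitsel class."""

    name: str
    data: Any = None
    children: dict[str, SketchNode] = field(default_factory=dict)


def filter_nodes(
    node: SketchNode,
    filter_func: Callable[[SketchNode], bool],
    recurse: bool = True,
    keep_empty_branches: bool = False,
) -> Optional[SketchNode]:
    if not filter_func(node):
        # A rejected node takes its whole subtree with it.
        return None
    if not recurse:
        return SketchNode(node.name, node.data, dict(node.children))
    kept = {}
    for key, child in node.children.items():
        sub = filter_nodes(child, filter_func, True, keep_empty_branches)
        if sub is not None:
            kept[key] = sub
    # Assumed truncation rule: drop nodes without data and children.
    if not keep_empty_branches and node.data is None and not kept:
        return None
    return SketchNode(node.name, node.data, kept)


tree = SketchNode(
    "root",
    children={
        "g1": SketchNode(
            "g1",
            children={"a": SketchNode("a", 1), "b": SketchNode("b", -1)},
        ),
        "g2": SketchNode("g2", children={"c": SketchNode("c", -5)}),
    },
)


def accept(node: SketchNode) -> bool:
    """Accept data-free nodes and nodes with positive data."""
    return node.data is None or node.data > 0


truncated = filter_nodes(tree, accept)
untruncated = filter_nodes(tree, accept, keep_empty_branches=True)
```

In the truncated result, `g2` disappears because its only leaf was rejected; with `keep_empty_branches=True` it remains as an empty group.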

add_child(child_name, child)#

Add a new child node with a preferred name in the mapping of children. If the child name is already in use, will attempt to find a collision-free alternative name.

Parameters:
  • child_name (str | None) – The preferred name for the child. To avoid overriding, a different name will be chosen if the name is in use.

  • child (ChildType) – The child node to add.

Raises:

OverflowError – If the attempts to find a new collision-free name have exceeded 1000.

Returns:

The new instance of a subtree

Return type:

Self
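The search for a collision-free name with a bounded number of attempts can be sketched as follows. The numeric suffix scheme and the helper name `find_free_name` are assumptions made for illustration; the source only states that an alternative name is sought and that an OverflowError is raised after 1000 attempts.

```python
def find_free_name(preferred, taken, max_attempts=1000):
    """Return `preferred`, or a suffixed variant that is not in `taken`."""
    if preferred not in taken:
        return preferred
    # Assumed scheme: append "_1", "_2", ... until a free name is found.
    for attempt in range(1, max_attempts + 1):
        candidate = f"{preferred}_{attempt}"
        if candidate not in taken:
            return candidate
    raise OverflowError(
        f"No collision-free name found for {preferred!r} "
        f"after {max_attempts} attempts"
    )


unused = find_free_name("traj", set())
renamed = find_free_name("traj", {"traj", "traj_1"})
```

Bounding the loop guarantees termination even when the existing keys are pathological, at the cost of an exception the caller has to handle.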

assign_children(new_children)#

Helper function to assign new children to this node without changing the child or data type of the tree.

Unlike calling construct_copy() directly, this will retain already existing children under this node if new_children does not overwrite all keys in this node.

Parameters:

new_children (Mapping[Hashable, ChildType]) – The mapping of additional children to be appended to this node’s list of children.

Returns:

A copy of this node but with potentially more or different child nodes.

Return type:

Self
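The difference between replacing and merging child mappings comes down to a dictionary merge. The sketch below shows only that merge on plain dictionaries; `merge_children` is a hypothetical helper and integers stand in for child nodes.

```python
def merge_children(existing, new_children):
    """Return a new mapping; entries in new_children win on key clashes."""
    return {**existing, **new_children}


existing = {"a": 1, "b": 2}
merged = merge_children(existing, {"b": 20, "c": 30})
```

Key `a` is retained, `b` is overwritten and `c` is added, while the original mapping stays unchanged.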

is_level(target_level)#

Check whether we are at a certain level in the ShnitselDB structure

Parameters:

target_level (str | Iterable[str]) – Desired level(s) to check for and accept as the target level.

Returns:

True if the current node is of the required level or one of the required levels

Return type:

bool
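Accepting either a single string or an iterable of strings needs a small amount of care, because a `str` is itself iterable. The following sketch shows one way to handle both cases; the free function `is_level` and the level names used are illustrative assumptions.

```python
from typing import Iterable, Optional, Union


def is_level(
    level_name: Optional[str],
    target_level: Union[str, Iterable[str]],
) -> bool:
    # Check for str first: iterating a str would compare single
    # characters, and `in` on a str would match substrings.
    if isinstance(target_level, str):
        return level_name == target_level
    return level_name in set(target_level)


single = is_level("group", "group")
one_of = is_level("group", ["compound", "group"])
substring = is_level("group", "subgroup")
```

Without the `isinstance` check, a test such as `"group" in "subgroup"` would succeed by substring matching, which is not what a level comparison intends.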

collect_data(with_path: typing_extensions.Literal[True]) Iterator[tuple[str, DataType]]#
collect_data(with_path: typing_extensions.Literal[False] = False) Iterator[DataType]

Function to retrieve all data entries in the tree underneath this node.

Helpful for aggregating across all entries in a subtree without the need for full hierarchical information.

Parameters:

with_path (bool, default=False) – Flag to obtain an iterable over the pairs of paths and data instead.

Yields:
  • Iterator[DataType] – An iterator over all the data entries in this subtree (with_path=False).

  • Iterator[tuple[str, DataType]] – An iterator over all the data entries in this subtree paired with their paths in the tree (with_path=True).
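A recursive generator is a natural fit for this kind of traversal. The sketch below is illustrative only: `SketchNode`, the free function `collect_data` and the slash-separated path format are assumptions, not the shnitsel implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class SketchNode:
    """Hypothetical stand-in for a tree node; not the shnitsel class."""

    name: str
    data: Any = None
    children: dict[str, SketchNode] = field(default_factory=dict)


def collect_data(
    node: SketchNode, with_path: bool = False, _prefix: str = ""
) -> Iterator[Any]:
    # Assumed path format: names joined by "/" starting at the root.
    path = f"{_prefix}/{node.name}"
    if node.data is not None:
        yield (path, node.data) if with_path else node.data
    for child in node.children.values():
        yield from collect_data(child, with_path, path)


tree = SketchNode(
    "root",
    children={
        "a": SketchNode("a", 1),
        "g": SketchNode("g", children={"b": SketchNode("b", 2)}),
    },
)
values = list(collect_data(tree))
with_paths = list(collect_data(tree, with_path=True))
total = sum(collect_data(tree))
```

Because the data is yielded lazily, aggregations such as `sum` can consume the entries without building an intermediate list.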

apply_data_attributes(properties)#
Parameters:

properties (dict) – The attributes to set with their respective values.

Returns:

The subtree after the update

Return type:

Self | TreeNode[Any, DataType]

map_flat_group_data(map_func)#

Helper function to apply a mapping function to all flat group nodes.

Will only apply the mapping function to nodes of type DataGroup, and only to those that have exclusively DataLeaf children.

Parameters:

map_func (Callable[[Iterable[DataType]], ResType | None]) – Function mapping the data in the flat groups to a new result type

Returns:

A new subtree structure, which will hold leaves with ResType data underneath each mapped group.

Return type:

Self | TreeNode[Any, ResType]
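The notion of a "flat group" and the reduction over its leaves can be sketched as below. Everything here is a hypothetical illustration: `SketchNode`, `is_flat_group`, the free function `map_flat_group_data` and the name `reduced` for the resulting leaf are assumptions and not taken from shnitsel.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Iterable


@dataclass
class SketchNode:
    """Hypothetical stand-in for a tree node; not the shnitsel class."""

    name: str
    data: Any = None
    children: dict[str, SketchNode] = field(default_factory=dict)


def is_flat_group(node: SketchNode) -> bool:
    """A group is flat if it has children and all of them are leaves."""
    return bool(node.children) and all(
        not child.children for child in node.children.values()
    )


def map_flat_group_data(
    node: SketchNode, map_func: Callable[[Iterable[Any]], Any]
) -> SketchNode:
    if is_flat_group(node):
        leaf_data = [
            c.data for c in node.children.values() if c.data is not None
        ]
        result = map_func(leaf_data)
        # Assumed layout: one new leaf holding the mapped result.
        children = (
            {} if result is None
            else {"reduced": SketchNode("reduced", result)}
        )
        return SketchNode(node.name, None, children)
    return SketchNode(
        node.name,
        node.data,
        {k: map_flat_group_data(c, map_func) for k, c in node.children.items()},
    )


tree = SketchNode(
    "root",
    children={
        "g1": SketchNode(
            "g1",
            children={"a": SketchNode("a", 1), "b": SketchNode("b", 2)},
        ),
        "g2": SketchNode("g2", children={"c": SketchNode("c", 10)}),
    },
)
reduced = map_flat_group_data(tree, sum)
```

The root is not flat because its children are groups, so the function recurses and reduces `g1` and `g2` separately.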

group_data_by_metadata()#

Helper function to allow for grouping of data within the tree by the metadata extracted from Trajectories.

Should only be called on trees where DataType=Trajectory or DataType=Frames or subtypes thereof. Will fail due to an attribute error or yield an empty tree otherwise.

Returns:

A tree where leaves are grouped to have similar metadata and only leaves with the same metadata are within the same group.

Return type:

Self

property as_stacked: shnitsel.data.dataset_containers.multi_stacked.MultiSeriesStacked | DataType#
Return type:

shnitsel.data.dataset_containers.multi_stacked.MultiSeriesStacked | DataType

to_stacked(only_direct_children=False)#

Stack the trajectories in a subtree into a multi-trajectory dataset.

The resulting dataset has a new frame dimension along which we can iterate through all individual frames of all trajectories.

Parameters:

only_direct_children (bool, optional) – Whether to only gather trajectories from direct children of this subtree.

Returns:

  • MultiSeriesStacked – The resulting multi-trajectory dataset stacked along a frame dimension

  • DataType – If it is an xarray.DataArray tree that we are concatenating.

Return type:

shnitsel.data.dataset_containers.multi_stacked.MultiSeriesStacked | DataType

property as_layered: shnitsel.data.dataset_containers.multi_layered.MultiSeriesLayered#
Return type:

shnitsel.data.dataset_containers.multi_layered.MultiSeriesLayered

to_layered(only_direct_children=False)#

Layer the trajectories in a subtree into a multi-trajectory dataset.

The resulting dataset has a new trajectory dimension along which we can iterate through all individual frames of all trajectories.

Parameters:

only_direct_children (bool, optional) – Whether to only gather trajectories from direct children of this subtree.

Returns:

The resulting multi-trajectory dataset layered along a trajectory dimension

Return type:

MultiSeriesLayered

abstractmethod sel(indexers=None, method=None, tolerance=None, drop=False, **indexers_kwargs)#

Returns a new dataset with each array indexed by tick labels along the specified dimension(s).

In contrast to Dataset.isel, indexers for this method should use labels instead of integers.

Under the hood, this method is powered by using pandas’s powerful Index objects. This makes label based indexing essentially just as fast as using integer indexing.

It also means this method uses pandas’s (well documented) logic for indexing. This means you can use string shortcuts for datetime indexes (e.g., ‘2000-01’ to select all values in January 2000). It also means that slices are treated as inclusive of both the start and stop values, unlike normal Python indexing.

Parameters:
  • indexers (dict, optional) – A dict with keys matching dimensions and values given by scalars, slices or arrays of tick labels. For dimensions with multi-index, the indexer may also be a dict-like object with keys matching index level names. If DataArrays are passed as indexers, xarray-style indexing will be carried out. See Indexing and selecting data for the details. One of indexers or indexers_kwargs must be provided.

  • method ({None, "nearest", "pad", "ffill", "backfill", "bfill"}, optional) –

    Method to use for inexact matches:

    • None (default): only exact matches

    • pad / ffill: propagate last valid index value forward

    • backfill / bfill: propagate next valid index value backward

    • nearest: use nearest valid index value

  • tolerance (optional) – Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.

  • drop (bool, optional) – If drop=True, drop coordinates variables in indexers instead of making them scalar.

  • **indexers_kwargs ({dim: indexer, ...}, optional) – The keyword arguments form of indexers. One of indexers or indexers_kwargs must be provided.

Returns:

obj – A new Dataset with the same contents as this dataset, except each variable and dimension is indexed by the appropriate indexers. If indexer DataArrays have coordinates that do not conflict with this object, then these coordinates will be attached. In general, each array’s data will be a view of the array’s data in this dataset, unless vectorized indexing was triggered by using an array indexer, in which case the data will be a copy.

Return type:

Dataset

See also

Dataset.isel DataArray.sel

xarray-tutorial:intermediate/indexing/indexing

Tutorial material on indexing with Xarray objects

xarray-tutorial:fundamentals/02.1_indexing_Basic

Tutorial material on basics of indexing

abstractmethod isel(indexers=None, drop=False, missing_dims='raise', **indexers_kwargs)#

Returns a new tree indexed along dimensions compound, group or trajectory and with data in leaves of the tree indexed along the remaining specified dimension(s) if the leaves support .isel() operations.

Internally, it filters data with their own .isel() functions and performs some additional filtering specific to the tree structure

Parameters:
  • indexers (dict, optional) – A dict with keys matching dimensions and values given by integers, slice objects or arrays. Each indexer can be an integer, slice, array-like or DataArray. If DataArrays are passed as indexers, xarray-style indexing will be carried out. See Indexing and selecting data for the details. One of indexers or indexers_kwargs must be provided.

  • drop (bool, default: False) – If drop=True, drop coordinates variables indexed by integers instead of making them scalar.

  • missing_dims ({"raise", "warn", "ignore"}, default: "raise") –

    What to do if dimensions that should be selected from are not present in the Dataset:

    • “raise”: raise an exception

    • “warn”: raise a warning, and ignore the missing dimensions

    • “ignore”: ignore the missing dimensions

  • **indexers_kwargs ({dim: indexer, ...}, optional) – The keyword arguments form of indexers. One of indexers or indexers_kwargs must be provided.

Returns:

obj – A new tree with the same contents as this tree, except each data entry is indexed by the appropriate indexers and subtrees are filtered by the choices in tree-specific dimensions. The logic for selection on the leaf data entries is specific to the type of data in the leaf.

Return type:

TreeNode[ChildType, DataType]

Examples

# TODO: FIXME: Provide better tree selection example.

>>> import xarray as xr
>>> dataset = xr.Dataset(
...     {
...         "math_scores": (
...             ["student", "test"],
...             [[90, 85, 92], [78, 80, 85], [95, 92, 98]],
...         ),
...         "english_scores": (
...             ["student", "test"],
...             [[88, 90, 92], [75, 82, 79], [93, 96, 91]],
...         ),
...     },
...     coords={
...         "student": ["Alice", "Bob", "Charlie"],
...         "test": ["Test 1", "Test 2", "Test 3"],
...     },
... )

# A specific element from the dataset is selected

>>> dataset.isel(student=1, test=0)
<xarray.Dataset> Size: 68B
Dimensions:         ()
Coordinates:
    student         <U7 28B 'Bob'
    test            <U6 24B 'Test 1'
Data variables:
    math_scores     int64 8B 78
    english_scores  int64 8B 75

# Indexing with a slice using isel

>>> slice_of_data = dataset.isel(student=slice(0, 2), test=slice(0, 2))
>>> slice_of_data
<xarray.Dataset> Size: 168B
Dimensions:         (student: 2, test: 2)
Coordinates:
  * student         (student) <U7 56B 'Alice' 'Bob'
  * test            (test) <U6 48B 'Test 1' 'Test 2'
Data variables:
    math_scores     (student, test) int64 32B 90 85 78 80
    english_scores  (student, test) int64 32B 88 90 75 82

# Indexing using a sequence of keys.

See also

Dataset.isel TreeNode.sel

__str__()#

A basic representation of this node.

Only contains rudimentary information about this node. Use repr() for a more extensive representation.

Returns:

A string representation with minimal information.

Return type:

str

__repr__()#

A simple representation of the data and structure of this subtree.

Returns:

A string representation with more extensive information than that returned by __str__()

Return type:

str

_repr_html_()#

Obtain an HTML representation of this subtree.

Currently generates a tabular representation of the subtree.

Returns:

An HTML string representing the data in this subtree.

Return type:

str

_trajectory_key_func(node)#

Helper function to extract trajectory metadata of leaf nodes for trees with appropriate data types.

If applied to other nodes, it may yield a None key or just their name attribute as a str.

Parameters:

node (TreeNode) – The node to extract the TrajectoryGroupingMetadata from. See Trajectory.get_grouping_metadata() for how the metadata instance is created.

Returns:

The key to use for the grouping of this node.

Return type:

None | str | TrajectoryGroupingMetadata