shnitsel.data.tree.node#
Attributes#
Classes#
TreeNode | Base class to model a tree structure of arbitrary data type to keep trajectory data with hierarchical structure in.
Functions#
_trajectory_key_func(node) | Helper function to extract trajectory metadata of leaf nodes for trees with appropriate data types.
Module Contents#
- ChildType#
- DataType#
- NewDataType#
- NewChildType#
- ResType#
- KeyType#
- T#
- _class_cache#
- class TreeNode(*, name, data=None, children=None, attrs=None, level_name=None, dtype=None, **kwargs)#
Bases:
Generic[ChildType, DataType], abc.ABC
Base class to model a tree structure of arbitrary data type to keep trajectory data with hierarchical structure in.
Has two type parameters to allow for explicit type checks:
- ChildType: Which node types are allowed to be registered as children of this node.
- DataType: What kind of data is expected within this tree if the data is not None.
- Parameters:
- classmethod _get_extended_class_name(datatypes)#
- classmethod _create_extended_node_class(datatypes)#
Create a new version of the class with added methods for the datatypes.
- classmethod __class_getitem__(args)#
- _dtype: type[DataType] | types.UnionType | None#
- _children: Mapping[Hashable, ChildType]#
- static _dtype_guess_from_children(children)#
- Parameters:
children (Mapping | None)
- Return type:
type | types.UnionType | None
- construct_copy(children: Mapping[Hashable, ChildType] | None = None, dtype: None = None, data: DataType | None = None, **kwargs) Self#
- construct_copy(children: Mapping[Hashable, NewChildType] | None = None, dtype: type[ResType] | types.UnionType | None = None, data: None = None, **kwargs) TreeNode[NewChildType, ResType]
- construct_copy(children: None = None, dtype: type[ResType] | types.UnionType | None = None, data: ResType | None = None, **kwargs) TreeNode[Any, ResType]
Every class inheriting from TreeNode should implement this method to create a copy of that subtree with appropriate typing, or simply a plain copy of the subtree if no updates are requested.
The base class should support changing the typing by changing child types, by setting an explicit dtype, or by providing a new data entry.
- Parameters:
data (ResType | None, optional) – The new data to be set in the copy of this node, by default None, which should populate it with the node’s current data
children (Mapping[str, NewChildType], optional) – A new set of children to replace the old mapping of children. The data type can also be changed with appropriate typing here.
dtype (type[ResType] | UnionType | None, optional) – An explicit argument to set the dtype property of the new subtree, by default None.
- Returns:
Returns a new subtree whose root duplicates this node’s metadata, with properties updated as provided.
- Return type:
- __len__()#
Returns the size of this node, i.e. how many children it has.
Be aware that this means that it will return 0 for Leaf nodes that may hold data.
- Returns:
The number of children of this node
- Return type:
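The consequence for leaves can be sketched with a hypothetical `ToyNode` stand-in (not the real `TreeNode`, whose constructor and subclasses are not reproduced here). The point is only that a size based on children makes a data-holding leaf report 0 and therefore evaluate as falsy:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyNode:
    """Hypothetical stand-in for a tree node; not the shnitsel class."""

    name: str
    data: Any = None
    children: dict = field(default_factory=dict)

    def __len__(self) -> int:
        # Size counts children only, never the data payload.
        return len(self.children)


leaf = ToyNode("traj_0", data=[1.0, 2.0, 3.0])
root = ToyNode("root", children={"traj_0": leaf})

leaf_size = len(leaf)  # 0, although the leaf holds data
root_size = len(root)  # 1

# Because len(leaf) == 0, `if leaf:` is falsy; test the data explicitly instead.
leaf_has_data = leaf.data is not None
```

If the real class behaves the same way, checks such as `if node:` are best replaced by explicit tests on `node.data` or `node.children`.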
- __getitem__(key)#
- __setitem__(key, value)#
- property dtype: type[DataType] | types.UnionType | None#
- Return type:
type[DataType] | types.UnionType | None
- property data: DataType#
- Return type:
DataType
- property children: Mapping[Hashable, ChildType]#
- Return type:
Mapping[Hashable, ChildType]
- map_subtree(func)#
Helper function with a descriptive name that applies a function to the root node of the current subtree.
Simply calls func(self).
- Parameters:
func (Callable[[Self], ResType]) – The function to apply to this node
- Returns:
The result of func(self).
- Return type:
ResType
- abstractmethod group_children_by(key_func, group_leaves_only=False)#
Method to group nodes within this current subtree by keys as retrieved via key_func.
Can be used to group data within this tree by metadata, e.g. to separate trajectory data with different simulation settings into distinct groups.
Adds new groups into the tree structure.
- Parameters:
key_func (Callable[[TreeNode], KeyType]) – Key function that maps any tree node that is not excluded (e.g. by setting group_leaves_only) to a key value. The key should be a dataclass and should be equal for two nodes if and only if those nodes should eventually end up in the same group.
group_leaves_only (bool, optional) – Flag to control whether grouping should only be applied to DataLeaf nodes, by default False
- Returns:
The current node after its subtree has been grouped. If no keys could be retrieved, the result may be None.
- Return type:
Self | None
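The requirement on keys (equal if and only if two nodes belong in the same group) is naturally met by a frozen dataclass, which compares by value and is hashable. The sketch below groups plain dictionaries rather than real tree nodes; `SettingsKey`, its fields and the `leaves` metadata are hypothetical names chosen for illustration:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SettingsKey:
    """Hypothetical grouping key; frozen dataclasses hash and compare by value."""

    timestep_fs: float
    n_states: int


# Hypothetical leaf metadata, keyed by leaf name.
leaves = {
    "traj_0": {"timestep_fs": 0.5, "n_states": 3},
    "traj_1": {"timestep_fs": 0.5, "n_states": 3},
    "traj_2": {"timestep_fs": 1.0, "n_states": 3},
}


def key_func(meta: dict) -> SettingsKey:
    # Two leaves get equal keys if and only if they belong in the same group.
    return SettingsKey(meta["timestep_fs"], meta["n_states"])


groups = defaultdict(list)
for name, meta in leaves.items():
    groups[key_func(meta)].append(name)

grouped = dict(groups)
```

Here `traj_0` and `traj_1` share a key and land in one group, while `traj_2` forms its own.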
- map_data(func: Callable[Ellipsis, ResType | None] | Callable[Ellipsis, ResType], *args, keep_empty_branches: typing_extensions.Literal[True] = True, dtype: type[ResType], **kwargs) TreeNode[Any, ResType]#
- map_data(func: Callable[Ellipsis, ResType | None] | Callable[Ellipsis, ResType], *args, keep_empty_branches: typing_extensions.Literal[False], dtype: type[ResType], **kwargs) TreeNode[Any, ResType] | None
- map_data(func: Callable[Ellipsis, ResType | None] | Callable[Ellipsis, ResType], *args, keep_empty_branches: typing_extensions.Literal[False], dtype: None = None, **kwargs) TreeNode[Any, ResType] | None
- map_data(func: Callable[Ellipsis, ResType | None] | Callable[Ellipsis, ResType], *args, keep_empty_branches: typing_extensions.Literal[True] = True, dtype: None = None, **kwargs) TreeNode[Any, ResType]
- map_data(func: Callable, *args, keep_empty_branches: typing_extensions.Literal[False], dtype: None = None, **kwargs) TreeNode | None
- map_data(func: Callable, *args, keep_empty_branches: typing_extensions.Literal[True] = True, dtype: None = None, **kwargs) TreeNode
Helper function to apply a mapping function to all data in leaves of this tree.
The function func is applied to all DataLeaf instances with data within them. If keep_empty_branches=False is set, branches without any data in them or without any further children are truncated.
- Parameters:
func (Callable[[DataType], ResType | None]) – The mapping function to apply to data in this subtree.
keep_empty_branches (bool, optional) – Flag to control whether branches/subtrees without any data in them should be kept rather than truncated, by default True to keep the same structure
dtype (type[ResType] | None, optional) – Optional parameter to explicitly specify the dtype for the resulting tree, by default None
*args – Positional arguments to pass to the call to func
**kwargs – Keyword-style arguments to pass to the call to func
- Returns:
The resulting node after the subtree has been mapped or None if truncation is active and the subtree has no data after mapping.
- Return type:
TreeNode[Any,ResType]|None
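The effect of keep_empty_branches can be sketched on a toy tree made of nested dictionaries, where a dict is an inner node and any other value is leaf data. This is an illustration of the described semantics under that assumption, not the library's implementation; in particular, keeping a leaf with `None` data when `func` returns `None` is assumed here:

```python
from typing import Any, Callable, Optional


def map_data(tree: Any, func: Callable[[Any], Any],
             keep_empty_branches: bool = True) -> Optional[Any]:
    """Toy version: dicts are inner nodes, other values are leaf data."""
    if not isinstance(tree, dict):
        # Leaf: apply func only if there is data to map.
        result = None if tree is None else func(tree)
        if result is None and not keep_empty_branches:
            return None  # signal the caller to drop this leaf
        return result
    mapped = {}
    for name, child in tree.items():
        new_child = map_data(child, func, keep_empty_branches)
        if new_child is None and not keep_empty_branches:
            continue  # truncate the empty branch
        mapped[name] = new_child
    if not mapped and not keep_empty_branches:
        return None  # nothing left below this node
    return mapped


def mean_or_none(values):
    # Returns None for empty input, which marks the leaf as empty.
    return sum(values) / len(values) if values else None


tree = {"a": {"t0": [1, 2, 3], "t1": []}, "b": {"t2": None}}

kept = map_data(tree, mean_or_none)
pruned = map_data(tree, mean_or_none, keep_empty_branches=False)
```

With the default, the shape of the tree is preserved and empty leaves carry `None`; with `keep_empty_branches=False`, the empty leaf `t1` and the whole branch `b` disappear.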
- map_filtered_nodes(filter_func, map_func, dtype=None)#
Map nodes using map_func() if the filter function filter_func picks them as relevant.
If the node is not picked by filter_func a copy will be created with its children being recursively mapped according to the same rule. If a node is mapped, the mapping function map_func must take care of potential mapping over children.
- Parameters:
filter_func (Callable[[TreeNode[Any, DataType]], bool]) – Filter function to apply to nodes in the current subtree of any kind. Must return True for all nodes to which map_func should be applied.
map_func (Callable[[TreeNode[Any, DataType]], TreeNode[Any, ResType]|None]) – Mapping function that transforms a selected node of a certain datatype to a consistent new data type ResType.
dtype (type[ResType] | None, optional) – Optional parameter to explicitly specify the dtype for the resulting tree, by default None.
- Returns:
TreeNode[Any, ResType] – A new subtree with the data type changed and select subtrees mapped.
None – If the node was filtered and the map function returned None
- Return type:
TreeNode[Any, ResType]|None
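The division of labour between filter_func and map_func can be sketched with a hypothetical `ToyNode` class. The recursion below is an assumption-based illustration of the rule described above; dropping a child for which the mapping yields `None` is assumed, not confirmed by the source:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for a tree node; not the shnitsel class."""

    name: str
    data: object = None
    children: dict = field(default_factory=dict)


def map_filtered(node: ToyNode,
                 filter_func: Callable[[ToyNode], bool],
                 map_func: Callable[[ToyNode], Optional[ToyNode]]) -> Optional[ToyNode]:
    if filter_func(node):
        # Picked: map_func owns the whole subtree, including any children.
        return map_func(node)
    # Not picked: copy the node and recurse into its children.
    new_children = {}
    for key, child in node.children.items():
        mapped = map_filtered(child, filter_func, map_func)
        if mapped is not None:  # assumption: None results are dropped
            new_children[key] = mapped
    return ToyNode(node.name, node.data, new_children)


def is_leaf(node: ToyNode) -> bool:
    return not node.children


def square_or_drop(node: ToyNode) -> Optional[ToyNode]:
    return None if node.data is None else ToyNode(node.name, node.data ** 2)


root = ToyNode("root", children={
    "a": ToyNode("a", data=2),
    "b": ToyNode("b", data=None),
})
result = map_filtered(root, is_leaf, square_or_drop)
```

The original tree is left untouched; `result` is a fresh copy whose picked leaves have been replaced.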
- filter_nodes(filter_func, recurse=True, keep_empty_branches=False)#
Function to filter the nodes in this tree and create a new tree that keeps only the nodes that are accepted themselves or are ancestors of at least one accepted node.
If keep_empty_branches=False, all branches in which there are no accepted nodes will be truncated. If filter_func does not return True for this node, the entire subtree starting at this node will be dropped.
- Parameters:
filter_func (Callable[..., bool]) – A filter function that should return True for Nodes that should be kept within the Tree and False for Nodes that should be kicked out together with their entire subtree.
recurse (bool, optional) – Whether to recurse the filtering into the children of kept nodes, by default True
keep_empty_branches (bool, optional) – A flag to keep branches with only empty lists of children and no data; by default False, which truncates such branches
- Returns:
Either a copy of the current subtree if it is kept or None if the subtree is omitted
- Return type:
Self | None
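A rough sketch of this filtering, again on a hypothetical `ToyNode` class rather than the real one. It assumes that a rejected node takes its subtree with it and that, unless keep_empty_branches is set, a data-less node with no surviving children is truncated:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for a tree node; not the shnitsel class."""

    name: str
    data: object = None
    children: dict = field(default_factory=dict)


def filter_nodes(node: ToyNode, filter_func: Callable[[ToyNode], bool],
                 recurse: bool = True,
                 keep_empty_branches: bool = False) -> Optional[ToyNode]:
    if not filter_func(node):
        return None  # rejected: the whole subtree is dropped
    if not recurse:
        return ToyNode(node.name, node.data, dict(node.children))
    kept = {}
    for key, child in node.children.items():
        filtered = filter_nodes(child, filter_func, recurse, keep_empty_branches)
        if filtered is not None:
            kept[key] = filtered
    if not kept and node.data is None and not keep_empty_branches:
        return None  # nothing below and no data here: truncate
    return ToyNode(node.name, node.data, kept)


def accept(node: ToyNode) -> bool:
    # Keep inner nodes (no data) and leaves whose value exceeds 2.
    return node.data is None or node.data > 2


root = ToyNode("root", children={
    "g1": ToyNode("g1", children={"t0": ToyNode("t0", 1), "t1": ToyNode("t1", 5)}),
    "g2": ToyNode("g2", children={"t2": ToyNode("t2", 1)}),
})

pruned = filter_nodes(root, accept)
kept_all = filter_nodes(root, accept, keep_empty_branches=True)
```

In `pruned`, group `g2` vanishes because its only leaf was rejected; in `kept_all`, it survives as an empty group.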
- add_child(child_name, child)#
Add a new child node with a preferred name in the mapping of children. If the child name is already in use, will attempt to find a collision-free alternative name.
- Parameters:
- Raises:
OverflowError – If the attempts to find a new collision-free name have exceeded 1000.
- Returns:
The new instance of a subtree
- Return type:
Self
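How a collision-free name might be found is sketched below. The suffix scheme (`name_1`, `name_2`, ...) and the helper `find_free_name` are assumptions made for illustration; only the limit of 1000 attempts and the `OverflowError` come from the description above:

```python
from typing import Hashable, Mapping


def find_free_name(preferred: str, existing: Mapping[Hashable, object],
                   max_attempts: int = 1000) -> str:
    """Hypothetical helper: return `preferred`, or `preferred_<i>` if taken."""
    if preferred not in existing:
        return preferred
    for i in range(1, max_attempts + 1):
        candidate = f"{preferred}_{i}"
        if candidate not in existing:
            return candidate
    raise OverflowError(
        f"No free name for {preferred!r} after {max_attempts} attempts"
    )


def add_child(children: dict, child_name: str, child: object) -> dict:
    # Return a new mapping instead of mutating the old one,
    # mirroring the "new instance of a subtree" return value.
    name = find_free_name(child_name, children)
    return {**children, name: child}


children = {"traj": "first"}
children = add_child(children, "traj", "second")
children = add_child(children, "traj", "third")
names = list(children)
```

Callers that need to know the final key should inspect the returned mapping, since the preferred name is not guaranteed.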
- assign_children(new_children)#
Helper function to assign new children to this node without changing the child or data type of the tree.
Unlike calling construct_copy() directly, this will retain already existing children under this node if new_children does not overwrite all keys in this node.
- Parameters:
new_children (Mapping[Hashable, ChildType]) – The mapping of additional children to be appended to this node’s list of children.
- Returns:
A copy of this node but with potentially more or different child nodes.
- Return type:
Self
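The difference from construct_copy() comes down to merging versus replacing a mapping. A minimal sketch with plain dictionaries standing in for the children mapping (the string values are placeholders for child nodes):

```python
old_children = {"a": "A_old", "b": "B_old"}
new_children = {"b": "B_new", "c": "C_new"}

# Assumed merge semantics of assign_children(): keys missing from
# new_children survive, shared keys are overwritten.
assigned = {**old_children, **new_children}

# Assumed semantics of passing children to construct_copy():
# the old mapping is replaced wholesale.
replaced = dict(new_children)
```

Child `a` is retained in `assigned` but absent from `replaced`.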
- is_level(target_level)#
Check whether we are at a certain level in the ShnitselDB structure
- collect_data(with_path: typing_extensions.Literal[True]) Iterator[tuple[str, DataType]]#
- collect_data(with_path: typing_extensions.Literal[False] = False) Iterator[DataType]
Function to retrieve all data entries in the tree underneath this node.
Helpful for aggregating across all entries in a subtree without the need for full hierarchical information.
- Parameters:
with_path (bool, default=False) – Flag to obtain an iterable over the pairs of paths and data instead.
- Yields:
Iterator[Iterable[DataType]] – An iterator over all the data entries in this subtree.
Iterator[tuple[str, DataType]] – An iterator over all the data entries in this subtree paired with their paths in the tree.
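The two iteration modes can be sketched as a recursive generator over a toy tree of nested dictionaries. The `/`-separated path format is an assumption; the source does not state how paths are rendered:

```python
from typing import Any, Iterator


def collect_data(tree: Any, with_path: bool = False, _path: str = "") -> Iterator[Any]:
    """Toy version: dicts are inner nodes, other non-None values are leaf data."""
    if isinstance(tree, dict):
        for name, child in tree.items():
            yield from collect_data(child, with_path, f"{_path}/{name}")
    elif tree is not None:
        # Leaves without data are skipped entirely.
        yield (_path, tree) if with_path else tree


tree = {"compound": {"group": {"t0": 1.5, "t1": None, "t2": 2.5}}}

values = list(collect_data(tree))
pairs = list(collect_data(tree, with_path=True))
total = sum(collect_data(tree))  # aggregate without hierarchical information
```

Because the result is a lazy iterator, it can be fed straight into aggregations such as `sum` without materialising a list.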
- apply_data_attributes(properties)#
- map_flat_group_data(map_func)#
Helper function to apply a mapping function to all flat group nodes.
Will only apply the mapping function to nodes of type DataGroup, and only to those that have exclusively DataLeaf children.
- Parameters:
map_func (Callable[[Iterable[DataType]], ResType | None]) – Function mapping the data in the flat groups to a new result type
- Returns:
A new subtree structure, which will hold leaves with ResType data underneath each mapped group.
- Return type:
Self | TreeNode[Any, ResType]
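What counts as a flat group can be sketched on nested dictionaries, where a dict whose values are all non-dict entries plays the role of a DataGroup with only DataLeaf children. The name of the resulting leaf (`"reduced"`) is a hypothetical choice; the source does not say how mapped leaves are named:

```python
from typing import Any, Callable, Iterable


def is_flat_group(node: Any) -> bool:
    # A group is flat if it is non-empty and none of its children is a group.
    return (
        isinstance(node, dict)
        and len(node) > 0
        and all(not isinstance(child, dict) for child in node.values())
    )


def map_flat_groups(node: Any, map_func: Callable[[Iterable[Any]], Any]) -> Any:
    if is_flat_group(node):
        # map_func sees all leaf data of the group at once.
        result = map_func([d for d in node.values() if d is not None])
        return {"reduced": result}  # hypothetical leaf name
    if isinstance(node, dict):
        return {name: map_flat_groups(child, map_func) for name, child in node.items()}
    return node  # a leaf outside any flat group is left untouched


tree = {
    "c1": {"g1": {"t0": 1, "t1": 3}, "g2": {"t2": 10}},
    "c2": {"g3": {"t3": 4, "t4": 6}},
}
mapped = map_flat_groups(tree, max)
```

The compound-level nodes `c1` and `c2` are not flat (their children are groups), so only `g1`, `g2` and `g3` are reduced.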
- group_data_by_metadata()#
Helper function to allow for grouping of data within the tree by the metadata extracted from Trajectories.
Should only be called on trees where DataType=Trajectory or DataType=Frames or subtypes thereof. Will fail due to an attribute error or yield an empty tree otherwise.
- Returns:
A tree where leaves are grouped to have similar metadata and only leaves with the same metadata are within the same group.
- Return type:
Self
- property as_stacked: shnitsel.data.dataset_containers.multi_stacked.MultiSeriesStacked | DataType#
- Return type:
shnitsel.data.dataset_containers.multi_stacked.MultiSeriesStacked | DataType
- to_stacked(only_direct_children=False)#
Stack the trajectories in a subtree into a multi-trajectory dataset.
The resulting dataset has a new frame dimension along which we can iterate through all individual frames of all trajectories.
- Parameters:
only_direct_children (bool, optional) – Whether to only gather trajectories from direct children of this subtree.
- Returns:
MultiSeriesStacked – The resulting multi-trajectory dataset stacked along a frame dimension
DataType – If it is an xarray.DataArray tree that we are concatenating.
- Return type:
shnitsel.data.dataset_containers.multi_stacked.MultiSeriesStacked | DataType
- property as_layered: shnitsel.data.dataset_containers.multi_layered.MultiSeriesLayered#
- to_layered(only_direct_children=False)#
Layer the trajectories in a subtree into a multi-trajectory dataset.
The resulting dataset has a new trajectory dimension along which we can iterate through all individual frames of all trajectories.
- Parameters:
only_direct_children (bool, optional) – Whether to only gather trajectories from direct children of this subtree.
- Returns:
The resulting multi-trajectory dataset layered along a trajectory dimension
- Return type:
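The difference between the stacked and layered layouts can be pictured with plain Python lists. This is only a conceptual sketch of the two shapes; the real containers are xarray-based, and padding shorter trajectories with `None` is an assumption made here:

```python
# Hypothetical per-trajectory energies; trajectories may differ in length.
trajectories = {"t0": [0.1, 0.2, 0.3], "t1": [0.4, 0.5]}

# "Stacked": one flat frame axis; each frame remembers its origin.
stacked = [
    (name, step, value)
    for name, frames in trajectories.items()
    for step, value in enumerate(frames)
]

# "Layered": one row per trajectory along a new trajectory axis,
# padded to a common length (padding with None is an assumption).
n_steps = max(len(frames) for frames in trajectories.values())
layered = [
    frames + [None] * (n_steps - len(frames))
    for frames in trajectories.values()
]
```

Stacking yields five frames on a single axis, while layering yields a 2 x 3 grid indexed by trajectory and time step.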
- abstractmethod sel(indexers=None, method=None, tolerance=None, drop=False, **indexers_kwargs)#
Returns a new dataset with each array indexed by tick labels along the specified dimension(s).
In contrast to Dataset.isel, indexers for this method should use labels instead of integers.
Under the hood, this method is powered by using pandas’s powerful Index objects. This makes label based indexing essentially just as fast as using integer indexing.
It also means this method uses pandas’s (well documented) logic for indexing. This means you can use string shortcuts for datetime indexes (e.g., ‘2000-01’ to select all values in January 2000). It also means that slices are treated as inclusive of both the start and stop values, unlike normal Python indexing.
- Parameters:
indexers (dict, optional) – A dict with keys matching dimensions and values given by scalars, slices or arrays of tick labels. For dimensions with multi-index, the indexer may also be a dict-like object with keys matching index level names. If DataArrays are passed as indexers, xarray-style indexing will be carried out. See Indexing and selecting data for the details. One of indexers or indexers_kwargs must be provided.
method ({None, "nearest", "pad", "ffill", "backfill", "bfill"}, optional) –
Method to use for inexact matches:
None (default): only exact matches
pad / ffill: propagate last valid index value forward
backfill / bfill: propagate next valid index value backward
nearest: use nearest valid index value
tolerance (optional) – Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.
drop (bool, optional) – If drop=True, drop coordinates variables in indexers instead of making them scalar.
**indexers_kwargs ({dim: indexer, ...}, optional) – The keyword arguments form of indexers. One of indexers or indexers_kwargs must be provided.
- Returns:
obj – A new Dataset with the same contents as this dataset, except each variable and dimension is indexed by the appropriate indexers. If indexer DataArrays have coordinates that do not conflict with this object, then these coordinates will be attached. In general, each array’s data will be a view of the array’s data in this dataset, unless vectorized indexing was triggered by using an array indexer, in which case the data will be a copy.
- Return type:
Dataset
See also
Dataset.isel
DataArray.sel
- xarray-tutorial:intermediate/indexing/indexing
Tutorial material on indexing with Xarray objects
- xarray-tutorial:fundamentals/02.1_indexing_Basic
Tutorial material on basics of indexing
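The inexact-match options of sel() (method and tolerance) can be illustrated without pandas by a small lookup on a sorted list of labels. The function `lookup` is a hypothetical toy that mirrors the documented rules, not the code path xarray or pandas actually uses:

```python
from bisect import bisect_left, bisect_right
from typing import Optional, Sequence


def lookup(index: Sequence[float], target: float,
           method: Optional[str] = None,
           tolerance: Optional[float] = None) -> int:
    """Toy label lookup on a sorted index; returns a position or raises KeyError."""
    if not index:
        raise KeyError(target)
    if method is None:
        pos = bisect_left(index, target)
        if pos == len(index) or index[pos] != target:
            raise KeyError(target)  # only exact matches allowed
        return pos
    if method in ("pad", "ffill"):
        pos = bisect_right(index, target) - 1  # last label <= target
    elif method in ("backfill", "bfill"):
        pos = bisect_left(index, target)  # first label >= target
    elif method == "nearest":
        hi = bisect_left(index, target)
        candidates = [p for p in (hi - 1, hi) if 0 <= p < len(index)]
        pos = min(candidates, key=lambda p: abs(index[p] - target))
    else:
        raise ValueError(f"unknown method {method!r}")
    if not 0 <= pos < len(index):
        raise KeyError(target)
    # The documented condition: abs(index[indexer] - target) <= tolerance.
    if tolerance is not None and abs(index[pos] - target) > tolerance:
        raise KeyError(target)
    return pos


times = [0.0, 0.5, 1.0, 1.5]
pad_pos = lookup(times, 0.7, method="pad")
backfill_pos = lookup(times, 0.7, method="backfill")
nearest_pos = lookup(times, 0.7, method="nearest")
```

For the target 0.7, "pad" and "nearest" both resolve to the label 0.5, "backfill" resolves to 1.0, and adding `tolerance=0.1` rejects all of them.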
- abstractmethod isel(indexers=None, drop=False, missing_dims='raise', **indexers_kwargs)#
Returns a new tree indexed along dimensions compound, group or trajectory and with data in leaves of the tree indexed along the remaining specified dimension(s) if the leaves support .isel() operations.
Internally, it filters data with their own .isel() functions and performs some additional filtering specific to the tree structure
- Parameters:
indexers (dict, optional) – A dict with keys matching dimensions and values given by integers, slice objects or arrays. An indexer can be an integer, slice, array-like or DataArray. If DataArrays are passed as indexers, xarray-style indexing will be carried out. See Indexing and selecting data for the details. One of indexers or indexers_kwargs must be provided.
drop (bool, default: False) – If drop=True, drop coordinates variables indexed by integers instead of making them scalar.
missing_dims ({"raise", "warn", "ignore"}, default: "raise") – What to do if dimensions that should be selected from are not present in the Dataset:
- “raise”: raise an exception
- “warn”: raise a warning, and ignore the missing dimensions
- “ignore”: ignore the missing dimensions
**indexers_kwargs ({dim: indexer, ...}, optional) – The keyword arguments form of indexers. One of indexers or indexers_kwargs must be provided.
- Returns:
obj – A new tree with the same contents as this tree, except each data entry is indexed by the appropriate indexers and subtrees are filtered by the choices in tree-specific dimensions. The logic for selection on the leaf data entries is specific to the type of data in the leaf.
- Return type:
TreeNode[ChildType, DataType]
Examples
# TODO: FIXME: Provide better tree selection example.
>>> import xarray as xr
>>> dataset = xr.Dataset(
...     {
...         "math_scores": (
...             ["student", "test"],
...             [[90, 85, 92], [78, 80, 85], [95, 92, 98]],
...         ),
...         "english_scores": (
...             ["student", "test"],
...             [[88, 90, 92], [75, 82, 79], [93, 96, 91]],
...         ),
...     },
...     coords={
...         "student": ["Alice", "Bob", "Charlie"],
...         "test": ["Test 1", "Test 2", "Test 3"],
...     },
... )
# A specific element from the dataset is selected
>>> dataset.isel(student=1, test=0)
<xarray.Dataset> Size: 68B
Dimensions:         ()
Coordinates:
    student         <U7 28B 'Bob'
    test            <U6 24B 'Test 1'
Data variables:
    math_scores     int64 8B 78
    english_scores  int64 8B 75
# Indexing with a slice using isel
>>> slice_of_data = dataset.isel(student=slice(0, 2), test=slice(0, 2))
>>> slice_of_data
<xarray.Dataset> Size: 168B
Dimensions:         (student: 2, test: 2)
Coordinates:
  * student         (student) <U7 56B 'Alice' 'Bob'
  * test            (test) <U6 48B 'Test 1' 'Test 2'
Data variables:
    math_scores     (student, test) int64 32B 90 85 78 80
    english_scores  (student, test) int64 32B 88 90 75 82
# Indexing using a sequence of keys.
See also
Dataset.isel
TreeNode.sel
- __str__()#
A basic representation of this node.
Only contains rudimentary information about this node. Use repr() for a more extensive representation.
- Returns:
A string representation with minimal information.
- Return type:
- __repr__()#
A simple representation of the data and structure of this subtree.
- Returns:
A string representation with more extensive information than that returned by __str__()
- Return type:
- _trajectory_key_func(node)#
Helper function to extract trajectory metadata of leaf nodes for trees with appropriate data types.
If applied to other nodes, it may yield a None key or just their name attribute as a str.
- Parameters:
node (TreeNode) – The node to extract the TrajectoryGroupingMetadata from. See Trajectory.get_grouping_metadata() for creation of the metadata instance.
- Returns:
The key to use for the grouping of this node.
- Return type:
None | str | TrajectoryGroupingMetadata