shnitsel.data.tree#
Submodules#
- shnitsel.data.tree.child_support_functions
- shnitsel.data.tree.compound
- shnitsel.data.tree.data_group
- shnitsel.data.tree.data_leaf
- shnitsel.data.tree.datatree_level
- shnitsel.data.tree.node
- shnitsel.data.tree.selection
- shnitsel.data.tree.support_functions
- shnitsel.data.tree.tree
- shnitsel.data.tree.tree_completion
- shnitsel.data.tree.tree_vis
- shnitsel.data.tree.xr_conversion
Attributes#
Classes#
- DataLeaf: Class to represent a leaf node holding data in the ShnitselDB tree hierarchy.
- DataGroup
- GroupInfo: Class to hold auxiliary info of a group/collection of Data in ShnitselDB
- CompoundGroup: DataTree node to keep track of all data associated with a common compound within the datatree
- CompoundInfo: Class to hold identifying and auxiliary info of a compound type in ShnitselDB
- ShnitselDBRoot: Class to use as a root for a ShnitselDB tree structure with specific Node types at different layer depths.
- TreeNode: Base class to model a tree structure of arbitrary data type to keep trajectory data with hierarchical structure in.
Functions#
- tree_zip: Helper function to allow zipping of multiple trees into a single tree with tuples of data for its data.
- has_same_structure: Function to check whether a set of trees has the same overall structure
- tree_merge: Helper function to merge two trees at the same level.
- Helper function to invert the operation of tree_to_xarray_datastree and deserialize the
- Helper function to convert a ShnitselDB tree format to xarray.DataTree format
Package Contents#
- class DataLeaf(*, name=None, data=None, **kwargs)#
Bases:
Generic[DataType], shnitsel.data.tree.node.TreeNode[None, DataType]
Class to represent a leaf node holding data in the ShnitselDB tree hierarchy.
May be subclassed to provide leaves with more advanced features, such as delayed results to support parallel processing or delayed loading from disk.
- Parameters:
name (str | None)
data (DataType | None)
- construct_copy(children: Mapping[Hashable, None] | None = None, dtype: None = None, data: DataType | None = None, **kwargs) Self#
- construct_copy(children: None = None, dtype: type[ResType] | types.UnionType | None = None, data: ResType | None = None, **kwargs) DataLeaf[ResType]
- construct_copy(children: Mapping[Hashable, NewChildType] | None = None, dtype: type[ResType] | types.UnionType | None = None, data: None = None, **kwargs) DataLeaf[ResType]
Helper function to create a copy of this tree structure, but with potential changes to metadata or data
Parameters:#
- data: ResType | None, optional
Data to replace the current data in the copy of this node
- children: None, optional
Parameter not supported by this type of node.
- dtype: type[ResType] | TypeForm[ResType], optional
The data type of the data in the copy constructed tree.
- raises AssertionError:
If dtype is set without a new data entry being provided
Returns:#
- Self
A copy of this node with recursively copied children if data is not set.
- DataLeaf[ResType]
A new leaf with a new data type if data is provided.
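The copy rules described above can be illustrated with a small stand-in class. This is only a sketch: `ToyLeaf` is a hypothetical class written for this example and is not the shnitsel `DataLeaf`; the only behaviour taken from the documentation is that a plain copy is returned when no data is given, a retyped leaf is returned when new data is given, and an `AssertionError` is raised when `dtype` is set without new data.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ToyLeaf:
    """Hypothetical stand-in for a leaf node; not the shnitsel DataLeaf."""

    name: Optional[str] = None
    data: Any = None
    dtype: Optional[type] = None

    def construct_copy(self, data=None, dtype=None):
        # Documented rule: a new dtype is only valid together with new data.
        if dtype is not None and data is None:
            raise AssertionError("dtype was set without providing new data")
        if data is None:
            # No replacement data: return a plain copy of this leaf.
            return ToyLeaf(self.name, self.data, self.dtype)
        # Replacement data: the copy may carry a different data type.
        new_dtype = dtype if dtype is not None else type(data)
        return ToyLeaf(self.name, data, new_dtype)


leaf = ToyLeaf(name="traj_0", data=[1, 2, 3], dtype=list)
same = leaf.construct_copy()                            # same data, new object
retyped = leaf.construct_copy(data="1,2,3", dtype=str)  # new data and dtype
```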
- group_children_by(key_func=None, group_leaves_only=False, recurse=True)#
Specialization of the grouping operation for leaf nodes.
Simply returns a copy of the current node.
- class DataGroup(*, name=None, group_info=None, children=None, attrs=None, level_name=None, **kwargs)#
Bases:
Generic[DataType], shnitsel.data.tree.node.TreeNode[DataGroup[DataType] | DataLeaf[DataType], DataType]
- Parameters:
- construct_copy(children: Mapping[Hashable, DataGroup[DataType] | DataLeaf[DataType]] | None = None, dtype: None = None, data: DataType | None = None, **kwargs) Self#
- construct_copy(children: None = None, dtype: type[ResType] | types.UnionType | None = None, data: ResType | None = None, **kwargs) DataGroup[ResType]
- construct_copy(children: Mapping[Hashable, NewChildType] | None = None, dtype: type[ResType] | types.UnionType | None = None, data: None = None, **kwargs) DataGroup[ResType]
Helper function to create a copy of this tree structure, but with potential changes to metadata, data or children
Parameters:#
- data: None, optional
Data setting not supported on this type of node.
- children: Mapping[Hashable, DataGroup[ResType]], optional
The mapping of children with a potentially new DataType. If not provided, will be copied from the current node’s child nodes.
- dtype: type[ResType] | TypeForm[ResType], optional
The data type of the data in the copy constructed tree.
Returns:#
Self: A copy of this node with recursively copied children if children is not set with an appropriate mapping.
- collect_data_nodes()#
Function to retrieve all nodes with data in this subtree
- property is_flat_group: bool#
Boolean flag that is true if there are no more sub-groups beneath this group, thus making the children of this group exclusively data-nodes.
- Return type:
bool
- property subgroups: Mapping[Hashable, DataGroup[DataType]]#
- Return type:
Mapping[Hashable, DataGroup[DataType]]
- property subleaves: Mapping[Hashable, DataLeaf[DataType]]#
- Return type:
Mapping[Hashable, DataLeaf[DataType]]
- group_children_by(key_func, group_leaves_only=False)#
Specialization of the group_children_by function for group nodes, where grouping may need to be performed on subsets of their children.
- Returns:
Generally returns the same node type, potentially with updated children and an additional layer of DataGroup nodes underneath
- Return type:
Self
- Parameters:
key_func (Callable[[shnitsel.data.tree.node.TreeNode], KeyType | None])
group_leaves_only (bool)
- class GroupInfo#
Class to hold auxiliary info of a group/collection of Data in ShnitselDB
- class CompoundGroup(*, name=None, compound_info=None, group_info=None, children=None, level_name=None, attrs=None, **kwargs)#
Bases:
Generic[DataType], shnitsel.data.tree.data_group.DataGroup[DataType]
DataTree node to keep track of all data associated with a common compound within the datatree
- Parameters:
name (str | None)
compound_info (CompoundInfo | None)
group_info (shnitsel.data.tree.data_group.GroupInfo | None)
children (Mapping[Hashable, shnitsel.data.tree.data_group.DataGroup[DataType] | shnitsel.data.tree.data_leaf.DataLeaf[DataType]] | None)
level_name (str | None)
attrs (Mapping[str, Any] | None)
- _compound_info: CompoundInfo#
- construct_copy(children: Mapping[Hashable, DataGroup[DataType] | DataLeaf[DataType]] | None = None, dtype: None = None, data: DataType | None = None, **kwargs) Self#
- construct_copy(children: None = None, dtype: type[ResType] | types.UnionType | None = None, data: ResType | None = None, **kwargs) CompoundGroup[ResType]
- construct_copy(children: Mapping[Hashable, NewChildType] | None = None, dtype: type[ResType] | types.UnionType | None = None, data: None = None, **kwargs) CompoundGroup[ResType]
Helper function to create a copy of this tree structure, but with potential changes to metadata, data or children
Parameters:#
- data: None, optional
Data setting not supported on this type of node.
- children: Mapping[Hashable, CompoundGroup[ResType]], optional
The mapping of children with a potentially new DataType. If not provided, will be copied from the current node’s child nodes.
- dtype: type[ResType] | TypeForm[ResType], optional
The data type of the data in the copy constructed tree.
- raises AssertionError:
If dtype is provided but the children parameter is not set and the node has children, indicating an issue with a type update without setting the new children
Returns:#
Self: A copy of this node with recursively copied children if children is not set with an appropriate mapping.
- property compound_info: CompoundInfo#
Get the stored compound info of this Compound group.
- Returns:
The metadata for the compound in this compound group
- Return type:
CompoundInfo
- add_data_group(group_info, filter_func_data=None, flatten_data=False, **kwargs)#
Function to add trajectories within this compound subtree to a new group of trajectories.
The name in group_info will be set as the name of the group in the tree. If flatten_data=True, all existing groups will be dissolved before filtering and the children will be turned into an ungrouped list of trajectories. The filter_func_data will be applied either to only the current groups and trajectories immediately beneath this compound or to the flattened list of all descendant trajectories.
- Parameters:
group_info (shnitsel.data.tree.data_group.GroupInfo) – The name and optional metadata to be set for the new group.
filter_func_data (Callable[[shnitsel.data.tree.data_group.DataGroup | shnitsel.data.tree.data_leaf.DataLeaf], bool] | None, optional) – A function to return True for groups and individual trajectories that should be added to the new group. Defaults to None.
flatten_data (bool, optional) – A flag whether all descendant groups should be dissolved and flattened into a list of trajectories first before applying a group. Defaults to False.
- Returns:
The restructured Compound with a new added group if at least one trajectory has satisfied the filter condition.
- Return type:
- class CompoundInfo#
Class to hold identifying and auxiliary info of a compound type in ShnitselDB
- ShnitselDB#
- class ShnitselDBRoot(*, compounds=None, **kwargs)#
Bases:
Generic[DataType], shnitsel.data.tree.node.TreeNode[shnitsel.data.tree.compound.CompoundGroup[DataType], DataType]
Class to use as a root for a ShnitselDB tree structure with specific Node types at different layer depths.
Will always have CompoundGroup entries on the layer underneath the root. Will only have data in DataLeaf instances. Between leaf and compound nodes, there may be arbitrary DataGroup layers to allow for hierarchical structuring.
- Parameters:
DataType (TypeVar) – A covariant template type parameter describing the kind of data that may be located in the leaves of this tree.
TreeNode[CompoundGroup[DataType], DataType] – The basic tree node type that this root node represents. Allows for sharing of functions between different levels of the tree.
compounds (Mapping[Hashable, shnitsel.data.tree.compound.CompoundGroup[DataType]] | None)
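The layering described for this root class (compounds directly beneath the root, optional group layers, data only in leaves) can be pictured with plain nested dictionaries. This is a sketch, not shnitsel code: dictionaries stand in for group-like nodes, non-dict values stand in for leaf data, and the compound and trajectory names are made up for illustration.

```python
# Stand-in for the layering: root -> compound -> optional group(s) -> leaf.
# Dicts play the role of group-like nodes; any non-dict value is leaf data.
toy_db = {
    "ethene": {                    # compound level (CompoundGroup in shnitsel)
        "run_a": {                 # optional grouping level (DataGroup)
            "traj_0": [0.0, 0.1],  # leaf level (DataLeaf) holding the data
            "traj_1": [0.0, 0.2],
        },
    },
    "butene": {
        "traj_0": [1.0, 1.5],      # leaves may sit directly under a compound
    },
}


def depth_of_leaves(node, depth=0):
    """Yield the depth at which each piece of leaf data is found."""
    if isinstance(node, dict):
        for child in node.values():
            yield from depth_of_leaves(child, depth + 1)
    else:
        yield depth


# Data never appears above the compound level (depth 1) in this toy tree.
leaf_depths = sorted(depth_of_leaves(toy_db))
```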
- construct_copy(children: Mapping[Hashable, shnitsel.data.tree.compound.CompoundGroup[DataType]] | None = None, dtype: None = None, data: DataType | None = None, **kwargs) Self#
- construct_copy(children: None = None, dtype: type[ResType] | types.UnionType | None = None, data: ResType | None = None, **kwargs) ShnitselDBRoot[ResType]
- construct_copy(children: Mapping[Hashable, NewChildType] | None = None, dtype: type[ResType] | types.UnionType | None = None, data: None = None, **kwargs) ShnitselDBRoot[ResType]
Helper function to create a copy of this tree structure, but with potential changes to metadata, data or children
Parameters:#
- children: Mapping[Hashable, CompoundGroup[DataType]] | Mapping[Hashable, CompoundGroup[ResType]], optional
The mapping of children with a potentially new DataType. If not provided, will be copied from the current node’s child nodes.
- dtype: type[ResType] | UnionType, optional
The data type of the data in the copy constructed tree.
- data: None, optional
Data setting not supported on this type of node.
Returns:#
Self: A copy of this node with recursively copied children if children is not set with an appropriate mapping.
- add_compound(name=None, compound_info=None, group_info=None, children=None, attrs=None)#
Helper function to add a new compound to this data structure without manually creating a CompoundGroup instance
A compound is provided with a name used as an identifier for the compound and optionally a more in-depth CompoundInfo object. Due to compounds also being a DataGroup, group information can optionally be set. Similarly, children and attributes for the compound can be provided.
- Parameters:
name (str | None, optional) – The compound identifier under which to register the compound, by default None, meaning it will be taken from compound_info. If no name can be extracted, a random name may be assigned.
compound_info (CompoundInfo | None, optional) – Optional data structure to provide Compound meta data, by default None.
group_info (GroupInfo | None, optional) – Optional data structure to set grouping information on the compound, by default None.
children (Mapping[Hashable, DataGroup[DataType] | DataLeaf[DataType]] | None, optional) – Optionally a mapping of children (e.g. Trajectories) to use in the CompoundGroup creation, by default None
attrs (Mapping[str, Any] | None, optional) – A mapping of keys to attribute values to set on the CompoundGroup, by default None
- Returns:
A new tree structure with the CompoundGroup inserted.
- Return type:
Self
- add_data_group(group_info, filter_func_compound=None, filter_func_data=None, flatten_compound_data=False, **kwargs)#
Function to add a group under the compound level for arbitrary compounds. The group is inserted at the top level underneath CompoundGroup nodes.
filter_func_compound can be used to only generate the group for certain compounds. This parameter should be a function that only returns True if the group should be created underneath this compound. filter_func_data can be used to select only specific groups and leaves out of the children of a compound to be part of this group. flatten_compound_data can be set to True if existing groups within a compound are supposed to be dissolved (i.e. all data leaves gathered and put directly as children of the compound).
- Parameters:
group_info (GroupInfo) – The name and optionally additional metadata of the group to be created
filter_func_compound (Callable[[CompoundInfo], bool] | None, optional) – Filter function that should return True if the group should be created for this compound, by default None, meaning the group will be created for all compounds.
filter_func_data (Callable[[DataLeaf | DataGroup], bool] | None, optional) – Filter function to determine whether a group or data leaf should be included in the new group, by default None
flatten_compound_data (bool, optional) – Flag to determine whether all trajectories under selected compounds should be ungrouped before selecting for the new group, by default False
- Returns:
A resulting ShnitselDB structure with the grouping applied.
- Return type:
Self
- set_compound_info(compound, overwrite_all=False)#
Function to set the compound information either on all unknown compounds (if overwrite_all=False) or on all trajectories in the tree, creating a new CompoundGroup holding all trajectories (if overwrite_all=True).
By default, the compound info will only be applied to trajectories with unknown compounds. If all compounds are merged or a compound info is assigned that is already in use, the concerned compound subtrees will be merged before the new compound_info is applied.
- Parameters:
compound (str | CompoundInfo) – Either the compound name as a string or the compound information to apply to either the unknown compounds or all data in the tree.
overwrite_all (bool, optional) – Flag to control whether the compound group of all data should be overwritten, by default False
- Returns:
The updated database
- Return type:
Self
- property compounds: Mapping[Hashable, shnitsel.data.tree.compound.CompoundGroup[DataType]]#
The compounds held within this ShnitselDB structure.
Auxiliary function to get the children property with a more domain-specific attribute name.
- Returns:
The mapping of compound identifiers to the Compounds within this structure.
- Return type:
Mapping[Hashable, CompoundGroup[DataType]]
- group_children_by(key_func, group_leaves_only=True)#
This function creates a tree, likely with a new structure, that has several desirable properties: groups have either only leaves or only other groups underneath them, and leaves within the same group have identical group keys.
Specifically, the grouping will generate a tree with the following properties:
- The CompoundGroup layer is left mostly untouched.
- DataGroup layers are refactored such that all leaves (or groups) within the same group have the same key resulting from key_func.
- If children with different key_func results are under the same group, a new group will be created to hold children with the same key_func result.
- Nodes for which key_func yields None will not be retained.
- If group_leaves_only=True, existing subgroups will be kept without invoking key_func and only leaves under the same group will be partitioned according to their key_func result.
If all children of an existing group yield the same key (NOTE: not None) result, then the group properties will be updated but the group will retain the same children.
- Parameters:
key_func (Callable[[TreeNode], KeyType]) – A function to map all TreeNodes to a certain key that allows grouping by comparison and must be hashable. Ideally a dataclass result that allows the invocation of as_dict() to set group properties after grouping.
group_leaves_only (bool, optional) – A flag whether grouping should only be performed for DataLeaf type nodes, by default True.
- Returns:
A new tree with grouping performed across all DataGroup levels.
- Return type:
Self
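The partitioning rule for leaves within one group can be sketched on a flat mapping. This is a simplified illustration under stated assumptions, not the library implementation: `group_leaves_by` is a hypothetical helper, leaves are plain dictionaries of metadata, and the `dt` setting is an invented example key.

```python
from collections import defaultdict


def group_leaves_by(leaves, key_func):
    """Partition a flat mapping of leaves into sub-mappings by key_func result."""
    buckets = defaultdict(dict)
    for name, data in leaves.items():
        key = key_func(data)
        if key is None:
            continue  # leaves for which key_func yields None are not retained
        buckets[key][name] = data
    if len(buckets) == 1:
        # All retained leaves share one key: no additional group layer needed.
        return next(iter(buckets.values()))
    return dict(buckets)


leaves = {"t0": {"dt": 0.5}, "t1": {"dt": 0.5}, "t2": {"dt": 1.0}, "t3": {}}
grouped = group_leaves_by(leaves, lambda meta: meta.get("dt"))
uniform = group_leaves_by({"t0": {"dt": 0.5}, "t1": {"dt": 0.5}}, lambda meta: meta.get("dt"))
```

In this toy version the key itself becomes the name of the new sub-group; how the library names the groups it creates is not stated in this page.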
- tree_zip(*trees: shnitsel.data.tree.data_leaf.DataLeaf, res_data_type: type[ResDataType] | typing_extensions.TypeForm[ResDataType]) shnitsel.data.tree.data_leaf.DataLeaf[ResDataType] | None#
- tree_zip(*trees: shnitsel.data.tree.data_leaf.DataLeaf, res_data_type: None = None) shnitsel.data.tree.data_leaf.DataLeaf | None
- tree_zip(*trees: shnitsel.data.tree.compound.CompoundGroup, res_data_type: type[ResDataType] | typing_extensions.TypeForm[ResDataType]) shnitsel.data.tree.compound.CompoundGroup[ResDataType] | None
- tree_zip(*trees: shnitsel.data.tree.compound.CompoundGroup, res_data_type: None = None) shnitsel.data.tree.compound.CompoundGroup | None
- tree_zip(*trees: shnitsel.data.tree.data_group.DataGroup, res_data_type: type[ResDataType] | typing_extensions.TypeForm[ResDataType]) shnitsel.data.tree.data_group.DataGroup[ResDataType] | None
- tree_zip(*trees: shnitsel.data.tree.data_group.DataGroup, res_data_type: None = None) shnitsel.data.tree.data_group.DataGroup | None
- tree_zip(*trees: shnitsel.data.tree.tree.ShnitselDBRoot, res_data_type: type[ResDataType] | typing_extensions.TypeForm[ResDataType]) shnitsel.data.tree.tree.ShnitselDBRoot[ResDataType] | None
- tree_zip(*trees: shnitsel.data.tree.tree.ShnitselDBRoot, res_data_type: None = None) shnitsel.data.tree.tree.ShnitselDBRoot | None
- tree_zip(*trees: shnitsel.data.tree.node.TreeNode, res_data_type: type[ResDataType] | typing_extensions.TypeForm[ResDataType] | None = None) shnitsel.data.tree.node.TreeNode | shnitsel.data.tree.node.TreeNode[Any, ResDataType] | None
Helper function to allow zipping of multiple trees into a single tree with tuples of data for its data.
The zipping is only performed on the data, metadata will be taken from the tree provided first. If provided with a res_data_type, the data type for the resulting tree will be set accordingly
The resulting data tuples will hold data from the various trees in order.
- Parameters:
- Returns:
The tree node of the same type as the root in the first provided tree but with an updated DataType. If no zipping was possible, because no trees were provided, None is returned.
- Return type:
- Raises:
ValueError – If trees with inconsistent structure were provided
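The zipping behaviour can be sketched on nested dictionaries, where a dictionary stands in for a group-like node and any other value stands in for leaf data. `toy_zip` is a hypothetical helper written for this example; it only mirrors the documented outcomes (tuples of data in argument order, `None` when no trees are given, `ValueError` for inconsistent structure).

```python
def toy_zip(*trees):
    """Zip nested dicts of identical structure into one tree of tuples."""
    if not trees:
        return None  # nothing to zip
    first = trees[0]
    if isinstance(first, dict):
        # Every tree must be a group with exactly the same child keys here.
        if any(not isinstance(t, dict) or t.keys() != first.keys() for t in trees):
            raise ValueError("trees have inconsistent structure")
        return {key: toy_zip(*(t[key] for t in trees)) for key in first}
    if any(isinstance(t, dict) for t in trees):
        raise ValueError("trees have inconsistent structure")
    return tuple(trees)  # leaf data from each tree, in argument order


energies = {"ethene": {"traj_0": 1.0, "traj_1": 2.0}}
forces = {"ethene": {"traj_0": 10.0, "traj_1": 20.0}}
zipped = toy_zip(energies, forces)
```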
- has_same_structure(*trees)#
Function to check whether a set of trees has the same overall structure
This means that they must have the same keys for non-None children at every level and data in nodes along the same paths.
- Returns:
True if all tree structures match, False otherwise.
- Return type:
- Parameters:
trees (shnitsel.data.tree.node.TreeNode)
- tree_merge(*trees: shnitsel.data.tree.tree.ShnitselDBRoot[DataType], res_data_type: type[DataType] | types.UnionType | None = None) shnitsel.data.tree.tree.ShnitselDBRoot[DataType] | None#
- tree_merge(*trees: shnitsel.data.tree.compound.CompoundGroup[DataType], res_data_type: type[DataType] | types.UnionType | None = None) shnitsel.data.tree.compound.CompoundGroup[DataType] | None
- tree_merge(*trees: shnitsel.data.tree.data_group.DataGroup[DataType], res_data_type: type[DataType] | types.UnionType | None = None) shnitsel.data.tree.data_group.DataGroup[DataType] | None
- tree_merge(*trees: shnitsel.data.tree.node.TreeNode[Any, DataType], res_data_type: type[DataType] | types.UnionType | None = None) shnitsel.data.tree.node.TreeNode[Any, DataType] | None
Helper function to merge two or more trees at the same level. Data leaves on the same level will all be retained. DataGroup children of the roots will be merged recursively.
- Parameters:
*trees (ShnitselDBRoot[DataType] | CompoundGroup[DataType] | DataGroup[DataType] | TreeNode[Any, DataType]) – Compatible roots at the same level that represent a group of children. If inconsistent types are provided, the merge may fail.
res_data_type (type[DataType] | TypeForm[DataType] | None, optional) – An explicit indicator of which type we expect the merged tree to have, by default None
- Returns:
The merged tree of the same level as the input tree roots. Specifically, the same level as trees[0]. If there are no trees, then None is returned. If a single trees parameter is provided, then a copy of that tree is returned.
- Return type:
ShnitselDBRoot[DataType] | CompoundGroup[DataType] | DataGroup[DataType] | TreeNode[Any, DataType] | None
- Raises:
ValueError – _description_
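A recursive merge of this kind can be sketched on nested dictionaries. This is an illustration only: `toy_merge` is a hypothetical helper, and the renaming of clashing leaves with a numeric suffix is an assumption made so that all leaves are retained; this page does not state how the library resolves such name clashes.

```python
import copy


def toy_merge(*trees):
    """Merge nested dicts: groups merge recursively, leaves are all retained."""
    if not trees:
        return None
    merged = {}
    for tree in trees:
        for key, child in tree.items():
            existing = merged.get(key)
            if isinstance(existing, dict) and isinstance(child, dict):
                # Two groups under the same key are merged recursively.
                merged[key] = toy_merge(existing, child)
            elif key in merged:
                # Assumption: a clashing entry is kept under a suffixed name.
                suffix = 1
                while f"{key}_{suffix}" in merged:
                    suffix += 1
                merged[f"{key}_{suffix}"] = copy.deepcopy(child)
            else:
                merged[key] = copy.deepcopy(child)
    return merged


tree_a = {"group": {"t0": 1}, "t9": 5}
tree_b = {"group": {"t1": 2}, "t9": 6}
merged = toy_merge(tree_a, tree_b)
single = toy_merge(tree_a)  # a single input yields a copy of that tree
```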
- class TreeNode(*, name, data=None, children=None, attrs=None, level_name=None, dtype=None, **kwargs)#
Bases:
Generic[ChildType, DataType], abc.ABC
Base class to model a tree structure of arbitrary data type to keep trajectory data with hierarchical structure in.
Has two type parameters to allow for explicit type checks:
- ChildType: Which node types are allowed to be registered as children of this node.
- DataType: What kind of data is expected within this tree if the data is not None.
- Parameters:
- classmethod _get_extended_class_name(datatypes)#
- classmethod _create_extended_node_class(datatypes)#
Create a new version of the class with added methods for the datatypes.
- classmethod __class_getitem__(args)#
- _dtype: type[DataType] | types.UnionType | None#
- _children: Mapping[Hashable, ChildType]#
- static _dtype_guess_from_children(children)#
- Parameters:
children (Mapping | None)
- Return type:
type | types.UnionType | None
- construct_copy(children: Mapping[Hashable, ChildType] | None = None, dtype: None = None, data: DataType | None = None, **kwargs) Self#
- construct_copy(children: Mapping[Hashable, NewChildType] | None = None, dtype: type[ResType] | types.UnionType | None = None, data: None = None, **kwargs) TreeNode[NewChildType, ResType]
- construct_copy(children: None = None, dtype: type[ResType] | types.UnionType | None = None, data: ResType | None = None, **kwargs) TreeNode[Any, ResType]
Every class inheriting from TreeNode should implement this method to create a copy of that subtree with appropriate typing, or simply a plain copy of the subtree if no updates are requested.
Changing the typing by changing child types, setting an explicit dtype, or providing a new data entry should be supported by the base class.
- Parameters:
data (ResType | None, optional) – The new data to be set in the copy of this node, by default None, which should populate it with the node’s current data
children (Mapping[str, NewChildType], optional) – A new set of children to replace the old mapping of children can be provided with this parameter. The data type can also be changed with appropriate typing here.
dtype (type[ResType] | UnionType | None, optional) – An explicit argument to set the dtype property of the new subtree, by default None.
- Returns:
Returns a new subtree with a duplicate of this node in regards to metadata at its root and updates properties as provided.
- Return type:
- __len__()#
Returns the size of this node, i.e. how many children it has.
Be aware that this means that it will return 0 for Leaf nodes that may hold data.
- Returns:
The number of children of this node
- Return type:
- __getitem__(key)#
- __setitem__(key, value)#
- property dtype: type[DataType] | types.UnionType | None#
- Return type:
type[DataType] | types.UnionType | None
- property data: DataType#
- Return type:
DataType
- property children: Mapping[Hashable, ChildType]#
- Return type:
Mapping[Hashable, ChildType]
- map_subtree(func)#
Helper function with a descriptive name to apply a function to the root node of the current subtree.
Simply calls func(self).
- Parameters:
func (Callable[[Self], ResType]) – The function to apply to this node
- Returns:
The result of func(self).
- Return type:
ResType
- abstractmethod group_children_by(key_func, group_leaves_only=False)#
Method to group nodes within this current subtree by keys as retrieved via key_func.
Can be used to group data within this tree by metadata, e.g. to separate trajectory data with different simulation settings into distinct groups.
Adds new groups into the tree structure.
- Parameters:
key_func (Callable[[TreeNode], KeyType]) – Key function that should map any tree node that is not excluded (e.g. by setting group_leaves_only) to a key value. The key should be a dataclass and should be equal for two nodes if and only if those nodes should eventually end up in the same group.
group_leaves_only (bool, optional) – Flag to control whether grouping should only be applied to DataLeaf nodes, by default False
- Returns:
The current node after its subtree has been grouped. If no keys could be retrieved, the result may be None.
- Return type:
Self | None
- map_data(func: Callable[Ellipsis, ResType | None] | Callable[Ellipsis, ResType], *args, keep_empty_branches: typing_extensions.Literal[True] = True, dtype: type[ResType], **kwargs) TreeNode[Any, ResType]#
- map_data(func: Callable[Ellipsis, ResType | None] | Callable[Ellipsis, ResType], *args, keep_empty_branches: typing_extensions.Literal[False], dtype: type[ResType], **kwargs) TreeNode[Any, ResType] | None
- map_data(func: Callable[Ellipsis, ResType | None] | Callable[Ellipsis, ResType], *args, keep_empty_branches: typing_extensions.Literal[False], dtype: None = None, **kwargs) TreeNode[Any, ResType] | None
- map_data(func: Callable[Ellipsis, ResType | None] | Callable[Ellipsis, ResType], *args, keep_empty_branches: typing_extensions.Literal[True] = True, dtype: None = None, **kwargs) TreeNode[Any, ResType]
- map_data(func: Callable, *args, keep_empty_branches: typing_extensions.Literal[False], dtype: None = None, **kwargs) TreeNode | None
- map_data(func: Callable, *args, keep_empty_branches: typing_extensions.Literal[True] = True, dtype: None = None, **kwargs) TreeNode
Helper function to apply a mapping function to all data in leaves of this tree
The function func is applied to all DataLeaf instances with data within them. If keep_empty_branches=False is set, will truncate branches without any data in them or without any further children.
- Parameters:
func (Callable[[DataType], ResType | None]) – The mapping function to apply to data in this subtree.
keep_empty_branches (bool, optional) – Flag to control whether branches/subtrees without any data in them should be kept rather than truncated, by default True to keep the same structure
dtype (type[ResType] | None, optional) – Optional parameter to explicitly specify the dtype for the resulting tree, by default None
*args – Positional arguments to pass to the call to func
**kwargs – Keyword-style arguments to pass to the call to func
- Returns:
The resulting node after the subtree has been mapped or None if truncation is active and the subtree has no data after mapping.
- Return type:
TreeNode[Any,ResType]|None
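The effect of keep_empty_branches can be sketched on nested dictionaries, with dictionaries as group-like nodes and other values as leaf data. `toy_map_data` and `double_positive` are hypothetical names for this example; the sketch follows the overload signatures above in defaulting keep_empty_branches to True.

```python
def toy_map_data(node, func, keep_empty_branches=True):
    """Apply func to all leaf data; optionally prune branches left without data."""
    if isinstance(node, dict):
        mapped = {
            key: toy_map_data(child, func, keep_empty_branches)
            for key, child in node.items()
        }
        if not keep_empty_branches:
            # Drop children that ended up without data after mapping.
            mapped = {key: child for key, child in mapped.items() if child is not None}
            if not mapped:
                return None  # the whole branch is empty and gets truncated
        return mapped
    return None if node is None else func(node)


def double_positive(value):
    """Example mapping that yields no result for non-positive values."""
    return value * 2 if value > 0 else None


tree = {"a": {"x": 1, "y": -1}, "b": {"z": -2}}
kept = toy_map_data(tree, double_positive)
pruned = toy_map_data(tree, double_positive, keep_empty_branches=False)
```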
- map_filtered_nodes(filter_func, map_func, dtype=None)#
Map nodes using map_func() if the filter function filter_func picks them as relevant.
If the node is not picked by filter_func a copy will be created with its children being recursively mapped according to the same rule. If a node is mapped, the mapping function map_func must take care of potential mapping over children.
- Parameters:
filter_func (Callable[[TreeNode[Any, DataType]], bool]) – Filter function to apply to nodes in the current subtree of any kind. Must return True for all nodes to which map_func should be applied.
map_func (Callable[[TreeNode[Any, DataType]], TreeNode[Any, ResType]|None]) – Mapping function that transforms a selected node of a certain datatype to a consistent new data type ResType.
dtype (type[ResType] | None, optional) – Optional parameter to explicitly specify the dtype for the resulting tree, by default None.
- Returns:
TreeNode[Any, ResType] – A new subtree with the data type changed and select subtrees mapped.
None – If the node was filtered and the map function returned None
- Return type:
TreeNode[Any, ResType]|None
- filter_nodes(filter_func, recurse=True, keep_empty_branches=False)#
Function to filter the nodes in this tree and create a new tree that retains only accepted nodes and their ancestors.
If keep_empty_branches=False, all branches in which there are no accepted nodes will be truncated. If filter_func does not return True for this node, the entire subtree starting at this node will be dropped.
- Parameters:
filter_func (Callable[..., bool]) – A filter function that should return True for Nodes that should be kept within the Tree and False for Nodes that should be kicked out together with their entire subtree.
recurse (bool, optional) – Whether to recurse the filtering into the children of kept nodes, by default True
keep_empty_branches (bool, optional) – A flag to keep branches with only empty lists of children and no data instead of truncating them, by default False
- Returns:
Either a copy of the current subtree if it is kept or None if the subtree is omitted
- Return type:
Self | None
- add_child(child_name, child)#
Add a new child node with a preferred name in the mapping of children. If the child name is already in use, will attempt to find a collision-free alternative name.
- Parameters:
- Raises:
OverflowError – If the attempts to find a new collision-free name have exceeded 1000.
- Returns:
The new instance of a subtree
- Return type:
Self
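The collision handling can be sketched with a plain mapping of children. `toy_add_child` is a hypothetical helper; the limit of 1000 attempts and the `OverflowError` come from the description above, while the `name_N` suffix scheme is an assumption, since this page does not say how alternative names are formed.

```python
def toy_add_child(children, child_name, child, max_attempts=1000):
    """Return a new mapping with child added under a collision-free name."""
    name = child_name
    attempt = 0
    while name in children:
        attempt += 1
        if attempt > max_attempts:
            raise OverflowError("could not find a collision-free child name")
        name = f"{child_name}_{attempt}"  # assumed naming scheme
    # Build a new mapping so that the original children stay untouched.
    return {**children, name: child}


children = {"traj": "first"}
extended = toy_add_child(children, "traj", "second")
```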
- assign_children(new_children)#
Helper function to assign new children to this node without changing the child or data type of the tree
Unlike calling construct_copy() directly, this will retain already existing children under this node if new_children does not overwrite all keys in this node
- Parameters:
new_children (Mapping[Hashable, ChildType]) – The mapping of additional children to be appended to this node’s list of children.
- Returns:
A copy of this node but with potentially more or different child nodes.
- Return type:
Self
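The difference between replacing and assigning children can be sketched with plain dictionaries. Both functions below are hypothetical stand-ins: the first models passing a complete new child mapping to a copy, the second models the retaining behaviour described for assign_children.

```python
def toy_replace_children(old_children, new_children):
    """Plain replacement: only the new mapping survives."""
    return dict(new_children)


def toy_assign_children(old_children, new_children):
    """Keep existing children and add or overwrite with the new ones."""
    return {**old_children, **new_children}


old = {"t0": "a", "t1": "b"}
new = {"t1": "B", "t2": "c"}
replaced = toy_replace_children(old, new)
assigned = toy_assign_children(old, new)
```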
- is_level(target_level)#
Check whether we are at a certain level in the ShnitselDB structure
- collect_data(with_path: typing_extensions.Literal[True]) Iterator[tuple[str, DataType]]#
- collect_data(with_path: typing_extensions.Literal[False] = False) Iterator[DataType]
Function to retrieve all data entries in the tree underneath this node.
Helpful for aggregating across all entries in a subtree without the need for full hierarchical information.
- Parameters:
with_path (bool, default=False) – Flag to obtain an iterable over the pairs of paths and data instead.
- Yields:
Iterator[DataType] – An iterator over all the data entries in this subtree.
Iterator[tuple[str, DataType]] – An iterator over all the data entries in this subtree paired with their paths in the tree.
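The two iteration modes can be sketched as a recursive generator over nested dictionaries. `toy_collect_data` is a hypothetical helper, and the slash-separated path format is an assumption; the format of the paths produced by the library is not described in this page.

```python
def toy_collect_data(node, with_path=False, path=""):
    """Yield all leaf data below node, optionally paired with its path."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from toy_collect_data(child, with_path, f"{path}/{key}")
    elif node is not None:
        yield (path, node) if with_path else node


tree = {"ethene": {"run_a": {"t0": 1, "t1": 2}, "t2": 3}}
values = list(toy_collect_data(tree))
with_paths = list(toy_collect_data(tree, with_path=True))
total = sum(toy_collect_data(tree))  # aggregate without hierarchy information
```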
- apply_data_attributes(properties)#
- map_flat_group_data(map_func)#
Helper function to apply a mapping function to all flat group nodes.
Will only apply the mapping function to nodes of type DataGroup and only to those that have exclusively DataLeaf children.
- Parameters:
map_func (Callable[[Iterable[DataType]], ResType | None]) – Function mapping the data in the flat groups to a new result type
- Returns:
A new subtree structure, which will hold leaves with ResType data underneath each mapped group.
- Return type:
Self | TreeNode[Any, ResType]
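The idea of mapping only "flat" groups can be sketched on nested dictionaries. In this illustration a flat group is a non-empty `dict` without nested `dict` values, the mapped result is stored under a hypothetical leaf name `"result"`, and the handling of a `None` result is not modelled; none of these details are taken from shnitsel.

```python
from collections.abc import Callable, Iterable
from typing import Any


def is_flat_group(node: Any) -> bool:
    # A "flat" group here is a non-empty dict with no nested dicts as values.
    return isinstance(node, dict) and bool(node) and not any(
        isinstance(child, dict) for child in node.values()
    )


def map_flat_groups(tree: dict[str, Any], map_func: Callable[[Iterable[Any]], Any]) -> dict[str, Any]:
    if is_flat_group(tree):
        # Replace the leaves by a single leaf holding the mapped result.
        return {"result": map_func(tree.values())}
    # Otherwise recurse into nested groups and leave other values untouched.
    return {
        name: map_flat_groups(child, map_func) if isinstance(child, dict) else child
        for name, child in tree.items()
    }


tree = {"compound_a": {"group_1": {"t0": 1, "t1": 2}, "group_2": {"t0": 10}}}
summed = map_flat_groups(tree, sum)
```

Here `compound_a` is not flat, so only `group_1` and `group_2` are passed to `sum`.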
- group_data_by_metadata()#
Helper function to allow for grouping of data within the tree by the metadata extracted from Trajectories.
Should only be called on trees where DataType=Trajectory or DataType=Frames or subtypes thereof. Will fail due to an attribute error or yield an empty tree otherwise.
- Returns:
A tree in which leaves are grouped by their metadata, so that only leaves with the same metadata share a group.
- Return type:
Self
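Grouping by metadata amounts to building a hashable key from each leaf's metadata and collecting leaves that share it. The sketch below uses plain dictionaries; the `"meta"` and `"data"` fields and the metadata keys `program` and `dt` are hypothetical placeholders, not the metadata shnitsel actually extracts.

```python
from collections import defaultdict
from typing import Any

# Hypothetical leaf records: "meta" stands in for trajectory metadata.
leaves = {
    "traj_0": {"meta": {"program": "A", "dt": 0.5}, "data": [1, 2]},
    "traj_1": {"meta": {"program": "B", "dt": 0.5}, "data": [3]},
    "traj_2": {"meta": {"program": "A", "dt": 0.5}, "data": [4, 5]},
}


def group_by_metadata(leaves: dict[str, dict[str, Any]]) -> dict[tuple, dict[str, Any]]:
    groups: dict[tuple, dict[str, Any]] = defaultdict(dict)
    for name, leaf in leaves.items():
        # Sorted items give a hashable key that ignores dict ordering.
        key = tuple(sorted(leaf["meta"].items()))
        groups[key][name] = leaf
    return dict(groups)


grouped = group_by_metadata(leaves)
```

Leaves without the expected metadata would raise a `KeyError` in this sketch, which loosely parallels the attribute error mentioned above for unsupported data types.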
- property as_stacked: shnitsel.data.dataset_containers.multi_stacked.MultiSeriesStacked | DataType#
- Return type:
shnitsel.data.dataset_containers.multi_stacked.MultiSeriesStacked | DataType
- to_stacked(only_direct_children=False)#
Stack the trajectories in a subtree into a multi-trajectory dataset.
The resulting dataset has a new frame dimension along which we can iterate through all individual frames of all trajectories.
- Parameters:
only_direct_children (bool, optional) – Whether to only gather trajectories from direct children of this subtree.
- Returns:
MultiSeriesStacked – The resulting multi-trajectory dataset stacked along a frame dimension
DataType – If it is an xarray.DataArray tree that we are concatenating.
- Return type:
shnitsel.data.dataset_containers.multi_stacked.MultiSeriesStacked | DataType
- property as_layered: shnitsel.data.dataset_containers.multi_layered.MultiSeriesLayered#
- to_layered(only_direct_children=False)#
Layer the trajectories in a subtree into a multi-trajectory dataset.
The resulting dataset has a new trajectory dimension along which we can iterate through all individual frames of all trajectories.
- Parameters:
only_direct_children (bool, optional) – Whether to only gather trajectories from direct children of this subtree.
- Returns:
The resulting multi-trajectory dataset layered along a trajectory dimension
- Return type:
shnitsel.data.dataset_containers.multi_layered.MultiSeriesLayered
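The difference between the stacked and the layered layout can be illustrated with plain Python lists. This is only a conceptual sketch: the real containers are dataset-backed, and padding shorter trajectories with `None` in the layered form is an assumption made for the illustration.

```python
trajectories = {"traj_0": [0.1, 0.2, 0.3], "traj_1": [1.1, 1.2]}

# "Stacked": one long frame axis; each frame remembers its origin.
stacked = [
    (name, step, value)
    for name, frames in trajectories.items()
    for step, value in enumerate(frames)
]

# "Layered": one row per trajectory; shorter rows are padded (assumption).
n_steps = max(len(frames) for frames in trajectories.values())
layered = {
    name: frames + [None] * (n_steps - len(frames))
    for name, frames in trajectories.items()
}
```

The stacked form suits operations over all frames at once, while the layered form keeps the trajectories aligned along a shared time axis.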
- abstractmethod sel(indexers=None, method=None, tolerance=None, drop=False, **indexers_kwargs)#
Returns a new dataset with each array indexed by tick labels along the specified dimension(s).
In contrast to Dataset.isel, indexers for this method should use labels instead of integers.
Under the hood, this method is powered by using pandas’s powerful Index objects. This makes label based indexing essentially just as fast as using integer indexing.
It also means this method uses pandas’s (well documented) logic for indexing. This means you can use string shortcuts for datetime indexes (e.g., ‘2000-01’ to select all values in January 2000). It also means that slices are treated as inclusive of both the start and stop values, unlike normal Python indexing.
- Parameters:
indexers (dict, optional) – A dict with keys matching dimensions and values given by scalars, slices or arrays of tick labels. For dimensions with multi-index, the indexer may also be a dict-like object with keys matching index level names. If DataArrays are passed as indexers, xarray-style indexing will be carried out. See Indexing and selecting data for the details. One of indexers or indexers_kwargs must be provided.
method ({None, "nearest", "pad", "ffill", "backfill", "bfill"}, optional) –
Method to use for inexact matches:
None (default): only exact matches
pad / ffill: propagate last valid index value forward
backfill / bfill: propagate next valid index value backward
nearest: use nearest valid index value
tolerance (optional) – Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.
drop (bool, optional) – If drop=True, drop coordinates variables in indexers instead of making them scalar.
**indexers_kwargs ({dim: indexer, ...}, optional) – The keyword arguments form of indexers. One of indexers or indexers_kwargs must be provided.
- Returns:
obj – A new Dataset with the same contents as this dataset, except each variable and dimension is indexed by the appropriate indexers. If indexer DataArrays have coordinates that do not conflict with this object, then these coordinates will be attached. In general, each array’s data will be a view of the array’s data in this dataset, unless vectorized indexing was triggered by using an array indexer, in which case the data will be a copy.
- Return type:
Dataset
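The `method` and `tolerance` options can be emulated in pure Python on a sorted list of labels. The function `locate` below is a hypothetical illustration of the matching rules listed above, not the pandas or xarray implementation; in particular, its behaviour for exact ties under `"nearest"` may differ from pandas.

```python
from bisect import bisect_left, bisect_right


def locate(index, target, method=None, tolerance=None):
    """Position of `target` in the sorted list `index`; raises KeyError on no match."""
    if method is None:
        pos = bisect_left(index, target)
        if pos == len(index) or index[pos] != target:
            raise KeyError(target)
        return pos
    if method in ("pad", "ffill"):
        pos = bisect_right(index, target) - 1  # last label <= target
    elif method in ("backfill", "bfill"):
        pos = bisect_left(index, target)  # first label >= target
    elif method == "nearest":
        right = bisect_left(index, target)
        candidates = [p for p in (right - 1, right) if 0 <= p < len(index)]
        pos = min(candidates, key=lambda p: abs(index[p] - target)) if candidates else -1
    else:
        raise ValueError(f"unknown method: {method!r}")
    if not 0 <= pos < len(index):
        raise KeyError(target)
    # The tolerance check mirrors abs(index[indexer] - target) <= tolerance.
    if tolerance is not None and abs(index[pos] - target) > tolerance:
        raise KeyError(target)
    return pos


times = [0.0, 1.0, 2.0]
```

With `times` as the index, `locate(times, 1.4, "pad")` resolves to the label `1.0`, whereas `locate(times, 1.4, "bfill")` resolves to `2.0`.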
See also
Dataset.isel, DataArray.sel
- xarray-tutorial:intermediate/indexing/indexing
Tutorial material on indexing with Xarray objects
- xarray-tutorial:fundamentals/02.1_indexing_Basic
Tutorial material on basics of indexing
- abstractmethod isel(indexers=None, drop=False, missing_dims='raise', **indexers_kwargs)#
Returns a new tree indexed along dimensions compound, group or trajectory and with data in leaves of the tree indexed along the remaining specified dimension(s) if the leaves support .isel() operations.
Internally, it filters data with their own .isel() functions and performs some additional filtering specific to the tree structure
- Parameters:
indexers (dict, optional) – A dict with keys matching dimensions and values given by integers, slice objects or arrays. An indexer can be an integer, slice, array-like or DataArray. If DataArrays are passed as indexers, xarray-style indexing will be carried out. See Indexing and selecting data for the details. One of indexers or indexers_kwargs must be provided.
drop (bool, default: False) – If drop=True, drop coordinates variables indexed by integers instead of making them scalar.
missing_dims ({"raise", "warn", "ignore"}, default: "raise") – What to do if dimensions that should be selected from are not present in the Dataset:
"raise": raise an exception
"warn": raise a warning, and ignore the missing dimensions
"ignore": ignore the missing dimensions
**indexers_kwargs ({dim: indexer, ...}, optional) – The keyword arguments form of indexers. One of indexers or indexers_kwargs must be provided.
- Returns:
obj – A new tree with the same contents as this tree, except each data entry is indexed by the appropriate indexers and subtrees are filtered by the choices in tree-specific dimensions. The logic for selection on the leaf data entries is specific to the type of data in the leaf.
- Return type:
TreeNode[ChildType, DataType]
Examples
# TODO: FIXME: Provide better tree selection example.
>>> import xarray as xr
>>> dataset = xr.Dataset(
...     {
...         "math_scores": (
...             ["student", "test"],
...             [[90, 85, 92], [78, 80, 85], [95, 92, 98]],
...         ),
...         "english_scores": (
...             ["student", "test"],
...             [[88, 90, 92], [75, 82, 79], [93, 96, 91]],
...         ),
...     },
...     coords={
...         "student": ["Alice", "Bob", "Charlie"],
...         "test": ["Test 1", "Test 2", "Test 3"],
...     },
... )
# A specific element from the dataset is selected
>>> dataset.isel(student=1, test=0)
<xarray.Dataset> Size: 68B
Dimensions:         ()
Coordinates:
    student         <U7 28B 'Bob'
    test            <U6 24B 'Test 1'
Data variables:
    math_scores     int64 8B 78
    english_scores  int64 8B 75
# Indexing with a slice using isel
>>> slice_of_data = dataset.isel(student=slice(0, 2), test=slice(0, 2))
>>> slice_of_data
<xarray.Dataset> Size: 168B
Dimensions:         (student: 2, test: 2)
Coordinates:
  * student         (student) <U7 56B 'Alice' 'Bob'
  * test            (test) <U6 48B 'Test 1' 'Test 2'
Data variables:
    math_scores     (student, test) int64 32B 90 85 78 80
    english_scores  (student, test) int64 32B 88 90 75 82
# Indexing using a sequence of keys.
See also
Dataset.isel, TreeNode.sel
- __str__()#
A basic representation of this node.
Only contains rudimentary information about this node. Use repr() for a more extensive representation.
- Returns:
A string representation with minimal information.
- Return type:
- __repr__()#
A simple representation of the data and structure of this subtree.
- Returns:
A string representation with more extensive information than that returned by __str__()
- Return type:
- xarray_datatree_to_shnitsel_tree(node, dtype=None)#
Helper function to invert the operation of tree_to_xarray_datatree and deserialize the shnitsel tree/ShnitselDB from a stored xarray DataTree.
- Parameters:
node (xr.DataTree) – The root node of an xarray subtree. This subtree is converted recursively.
dtype (type[DataType] | TypeForm[DataType], optional) – Optional argument to specify the desired target type of data in the shnitsel tree structure.
- Returns:
The converted type or None if the tree could not be converted.
- Return type:
ShnitselDBRoot | CompoundGroup | DataGroup | DataLeaf | None
- tree_to_xarray_datatree(node)#
Helper function to convert a ShnitselDB tree format to xarray.DataTree format so that we can use the xarray functions to write a netcdf file.
Will recursively convert the tree from the current node, starting from the leaves upwards. If the type of the node is not supported, or the data type in the leaves cannot be stored via the xarray functions, the conversion will fail.
- Parameters:
node (TreeNode[Any, DataType]) – The root node of a subtree to be converted to a xr.DataTree structure.
- Returns:
Either the converted tree or None if this subtree is not supported.
- Return type:
xr.DataTree | None
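The round trip between a nested tree and a serializable form can be sketched with path-keyed dictionaries, which is also the input format accepted by `xarray.DataTree.from_dict`. The helpers `flatten` and `unflatten` below are hypothetical and operate on plain nested dictionaries; the actual conversion in shnitsel may work differently.

```python
from typing import Any


def flatten(tree: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Map every leaf of a nested dict to a '/'-separated path."""
    flat: dict[str, Any] = {}
    for name, child in tree.items():
        path = f"{prefix}/{name}"
        if isinstance(child, dict):
            flat.update(flatten(child, path))
        else:
            flat[path] = child
    return flat


def unflatten(flat: dict[str, Any]) -> dict[str, Any]:
    """Invert `flatten` by rebuilding the nesting from the paths."""
    tree: dict[str, Any] = {}
    for path, value in flat.items():
        *parents, leaf = path.strip("/").split("/")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


tree = {"compound_a": {"group_1": {"traj_0": 1, "traj_1": 2}}, "compound_b": {"traj_0": 3}}
flat = flatten(tree)
restored = unflatten(flat)
```

In this sketch empty groups are lost during flattening and names containing `/` are not supported, so a real conversion would need to handle both cases explicitly.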
- complete_shnitsel_tree#