shnitsel.data.tree.node ======================= .. py:module:: shnitsel.data.tree.node Attributes ---------- .. autoapisummary:: shnitsel.data.tree.node.ChildType shnitsel.data.tree.node.DataType shnitsel.data.tree.node.NewDataType shnitsel.data.tree.node.NewChildType shnitsel.data.tree.node.ResType shnitsel.data.tree.node.KeyType shnitsel.data.tree.node.T shnitsel.data.tree.node._class_cache Classes ------- .. autoapisummary:: shnitsel.data.tree.node.TreeNode Functions --------- .. autoapisummary:: shnitsel.data.tree.node._trajectory_key_func Module Contents --------------- .. py:data:: ChildType .. py:data:: DataType .. py:data:: NewDataType .. py:data:: NewChildType .. py:data:: ResType .. py:data:: KeyType .. py:data:: T .. py:data:: _class_cache .. py:class:: TreeNode(*, name, data = None, children = None, attrs = None, level_name = None, dtype = None, **kwargs) Bases: :py:obj:`Generic`\ [\ :py:obj:`ChildType`\ , :py:obj:`DataType`\ ], :py:obj:`abc.ABC` Base class to model a tree structure of arbitrary data type to keep trajectory data with hierarchical structure in. Has two type parameters to allow for explicit type checks: - `ChildType`: Which node types are allowed to be registered as children of this node. - `DataType`: What kind of data is expected within this tree if the data is not None. .. py:method:: _get_extended_class_name(datatypes) :classmethod: .. py:method:: _create_extended_node_class(datatypes) :classmethod: Create a new version of the class with added methods for the datatypes. .. py:method:: __class_getitem__(args) :classmethod: .. py:attribute:: _name :type: str | None .. py:attribute:: _dtype :type: type[DataType] | types.UnionType | None .. py:attribute:: _data :type: DataType | None .. py:attribute:: _children :type: Mapping[Hashable, ChildType] .. py:attribute:: _attrs :type: Mapping[str, Any] .. py:attribute:: _parent :type: Self | None .. py:attribute:: _level_name :type: str | None .. 
py:method:: _dtype_guess_from_children(children) :staticmethod: .. py:method:: construct_copy(children: Mapping[Hashable, ChildType] | None = None, dtype: None = None, data: DataType | None = None, **kwargs) -> Self construct_copy(children: Mapping[Hashable, NewChildType] | None = None, dtype: type[ResType] | types.UnionType | None = None, data: None = None, **kwargs) -> TreeNode[NewChildType, ResType] construct_copy(children: None = None, dtype: type[ResType] | types.UnionType | None = None, data: ResType | None = None, **kwargs) -> TreeNode[Any, ResType] Every class inheriting from TreeNode should implement this method to create a copy of that subtree with appropriate typing, or simply a plain copy of the subtree if no updates are requested. Changing the typing by changing child types, by setting an explicit `dtype`, or by providing a new `data` entry should be supported by the base class. :param data: The new data to be set in the copy of this node, by default None, which should populate it with the node's current data :type data: ResType | None, optional :param children: A new set of children to replace the old mapping of children can be provided with this parameter. The data type can also be changed with appropriate typing here. :type children: Mapping[Hashable, NewChildType], optional :param dtype: An explicit argument to set the `dtype` property of the new subtree, by default None. :type dtype: type[ResType] | UnionType | None, optional :returns: A new subtree whose root duplicates this node's metadata, with properties updated as provided. :rtype: Self | TreeNode[TreeNode[Any, ResType]|None, ResType] .. py:property:: path :type: str .. py:method:: __len__() Returns the `size` of this node, i.e. how many children it has. Be aware that this means that it will return 0 for leaf nodes that may hold data. :returns: The number of children of this node :rtype: int .. py:method:: __contains__(value) ..
py:method:: __getitem__(key) .. py:method:: __setitem__(key, value) .. py:property:: is_leaf :type: bool .. py:property:: has_data :type: bool .. py:property:: dtype :type: type[DataType] | types.UnionType | None .. py:property:: data :type: DataType .. py:property:: children :type: Mapping[Hashable, ChildType] .. py:property:: root :type: TreeNode[Any, DataType] .. py:property:: attrs :type: Mapping[str, Any] .. py:property:: name :type: str .. py:method:: map_subtree(func) Helper function with a telling name that applies a function to the root node of the current subtree. Simply calls `func(self)`. :param func: The function to apply to this node :type func: Callable[[Self], ResType] :returns: The result of `func(self)`. :rtype: ResType .. py:method:: group_children_by(key_func, group_leaves_only = False) :abstractmethod: Method to group nodes within this current subtree by keys as retrieved via `key_func`. Can be used to group data within this tree by metadata, e.g. to separate trajectory data with different simulation settings into distinct groups. Adds new groups into the tree structure. :param key_func: Key function that should map any tree node that is not excluded (e.g. by setting `group_leaves_only`) to a key value. The key should be a dataclass and should be equal for two nodes if and only if those nodes should eventually end up in the same group. :type key_func: Callable[[TreeNode], KeyType] :param group_leaves_only: Flag to control whether grouping should only be applied to `DataLeaf` nodes, by default False :type group_leaves_only: bool, optional :returns: The current node after its subtree has been grouped. If no keys could be retrieved, the result may be `None`. :rtype: Self | None ..
py:method:: map_data(func: Callable[Ellipsis, ResType | None] | Callable[Ellipsis, ResType], *args, keep_empty_branches: typing_extensions.Literal[True] = True, dtype: type[ResType], **kwargs) -> TreeNode[Any,ResType] map_data(func: Callable[Ellipsis, ResType | None] | Callable[Ellipsis, ResType], *args, keep_empty_branches: typing_extensions.Literal[False], dtype: type[ResType], **kwargs) -> TreeNode[Any,ResType]|None map_data(func: Callable[Ellipsis, ResType | None] | Callable[Ellipsis, ResType], *args, keep_empty_branches: typing_extensions.Literal[False], dtype: None = None, **kwargs) -> TreeNode[Any,ResType]|None map_data(func: Callable[Ellipsis, ResType | None] | Callable[Ellipsis, ResType], *args, keep_empty_branches: typing_extensions.Literal[True] = True, dtype: None = None, **kwargs) -> TreeNode[Any,ResType] map_data(func: Callable, *args, keep_empty_branches: typing_extensions.Literal[False], dtype: None = None, **kwargs) -> TreeNode|None map_data(func: Callable, *args, keep_empty_branches: typing_extensions.Literal[True] = True, dtype: None = None, **kwargs) -> TreeNode Helper function to apply a mapping function to all data in the leaves of this tree. The function `func` is applied to all `DataLeaf` instances with `data` within them. If `keep_empty_branches=False` is set, branches without any data in them or without any further children will be truncated. :param func: The mapping function to apply to data in this subtree.
:type func: Callable[[DataType], ResType | None] :param keep_empty_branches: Flag to control whether branches/subtrees without any data in them should be kept. If False, such branches are truncated. By default True, to keep the same structure :type keep_empty_branches: bool, optional :param dtype: Optional parameter to explicitly specify the `dtype` for the resulting tree, by default None :type dtype: type[ResType] | None, optional :param \*args: Positional arguments to pass to the call to `func` :param \*\*kwargs: Keyword-style arguments to pass to the call to `func` :returns: The resulting node after the subtree has been mapped or None if truncation is active and the subtree has no data after mapping. :rtype: TreeNode[Any,ResType]|None .. py:method:: map_filtered_nodes(filter_func, map_func, dtype = None) Map nodes using `map_func()` if the filter function `filter_func` picks them as relevant. If the node is not picked by `filter_func`, a copy will be created with its children being recursively mapped according to the same rule. If a node is mapped, the mapping function `map_func` must take care of potential mapping over children. :param filter_func: Filter function to apply to nodes in the current subtree of any kind. Must return `True` for all nodes to which `map_func` should be applied. :type filter_func: Callable[[TreeNode[Any, DataType]], bool] :param map_func: Mapping function that transforms a selected node of a certain datatype to a consistent new data type `ResType`. :type map_func: Callable[[TreeNode[Any, DataType]], TreeNode[Any, ResType]|None] :param dtype: Optional parameter to explicitly specify the `dtype` for the resulting tree, by default None. :type dtype: type[ResType] | None, optional :returns: * *TreeNode[Any, ResType]* -- A new subtree with the data type changed and select subtrees mapped. * *None* -- If the node was filtered and the map function returned None ..
py:method:: filter_nodes(filter_func, recurse = True, keep_empty_branches = False) Function to filter the nodes in this tree and create a new tree consisting of the nodes that are accepted or are ancestors of at least one accepted node. If `keep_empty_branches=False`, all branches in which there are no accepted nodes will be truncated. If `filter_func` does not return `True`, the entire subtree starting at this node will be dropped. :param filter_func: A filter function that should return `True` for nodes that should be kept within the tree and `False` for nodes that should be removed together with their entire subtree. :type filter_func: Callable[..., bool] :param recurse: Whether to recurse the filtering into the children of kept nodes, by default True :type recurse: bool, optional :param keep_empty_branches: Whether to keep branches with only empty lists of children and no data instead of truncating them, by default False :type keep_empty_branches: bool, optional :returns: Either a copy of the current subtree if it is kept or None if the subtree is omitted :rtype: Self | None .. py:method:: add_child(child_name, child) Add a new child node with a preferred name in the mapping of children. If the child name is already in use, will attempt to find a collision-free alternative name. :param child_name: The preferred name under which the child should be registered. To avoid overriding, a different name will be chosen if the name is in use. :type child_name: str | None :param child: Object to register as the child-subtree :type child: ChildType :raises OverflowError: If the attempts to find a new collision-free name have exceeded 1000. :returns: The new instance of a subtree :rtype: Self ..
py:method:: assign_children(new_children) Helper function to assign new children to this node without changing the child or data type of the tree. Unlike calling `construct_copy()` directly, this will retain already existing children under this node if `new_children` does not overwrite all keys in this node. :param new_children: The mapping of *additional* children to be appended to this node's list of children. :type new_children: Mapping[Hashable, ChildType] :returns: A copy of this node but with potentially more or different child nodes. :rtype: Self .. py:method:: is_level(target_level) Check whether we are at a certain level in the ShnitselDB structure. :param target_level: Desired level(s) to check for and accept as the target level. :type target_level: str | Iterable[str] :returns: True if the current node is of the required level or one of the required levels :rtype: bool .. py:method:: collect_data(with_path: typing_extensions.Literal[True]) -> Iterator[tuple[str, DataType]] collect_data(with_path: typing_extensions.Literal[False] = False) -> Iterator[DataType] Function to retrieve all data entries in the tree underneath this node. Helpful for aggregating across all entries in a subtree without the need for full hierarchical information. :param with_path: Flag to obtain an iterator over the pairs of paths and data instead. :type with_path: bool, default=False :Yields: * *Iterator[DataType]* -- An iterator over all the data entries in this subtree. * *Iterator[tuple[str, DataType]]* -- An iterator over all the data entries in this subtree paired with their paths in the tree. .. py:method:: apply_data_attributes(properties) :param properties: The attributes to set with their respective values. :type properties: dict :returns: The subtree after the update :rtype: Self | TreeNode[Any, DataType] .. py:method:: map_flat_group_data(map_func) Helper function to apply a mapping function to all flat group nodes.
Will only apply the mapping function to nodes of type `DataGroup`, and only to those that have exclusively `DataLeaf` children. :param map_func: Function mapping the data in the flat groups to a new result type :type map_func: Callable[[Iterable[DataType]], ResType | None] :returns: A new subtree structure, which will hold leaves with ResType data underneath each mapped group. :rtype: Self | TreeNode[Any, ResType] .. py:method:: group_data_by_metadata() Helper function to allow for grouping of data within the tree by the metadata extracted from Trajectories. Should only be called on trees where `DataType=Trajectory` or `DataType=Frames` or subtypes thereof. Will fail due to an attribute error or yield an empty tree otherwise. :returns: A tree where leaves are grouped to have similar metadata and only leaves with the same metadata are within the same group. :rtype: Self .. py:property:: as_stacked :type: shnitsel.data.dataset_containers.multi_stacked.MultiSeriesStacked | DataType .. py:method:: to_stacked(only_direct_children = False) Stack the trajectories in a subtree into a multi-trajectory dataset. The resulting dataset has a new `frame` dimension along which we can iterate through all individual frames of all trajectories. :param only_direct_children: Whether to only gather trajectories from direct children of this subtree. :type only_direct_children: bool, optional :returns: * *MultiSeriesStacked* -- The resulting multi-trajectory dataset stacked along a `frame` dimension * *DataType* -- If it is an xarray.DataArray tree that we are concatenating. .. py:property:: as_layered :type: shnitsel.data.dataset_containers.multi_layered.MultiSeriesLayered .. py:method:: to_layered(only_direct_children = False) Layer the trajectories in a subtree into a multi-trajectory dataset. The resulting dataset has a new `trajectory` dimension along which we can iterate through all individual frames of all trajectories.
:param only_direct_children: Whether to only gather trajectories from direct children of this subtree. :type only_direct_children: bool, optional :returns: The resulting multi-trajectory dataset layered along a `trajectory` dimension :rtype: MultiSeriesLayered .. py:method:: sel(indexers = None, method = None, tolerance = None, drop = False, **indexers_kwargs) :abstractmethod: Returns a new dataset with each array indexed by tick labels along the specified dimension(s). In contrast to `Dataset.isel`, indexers for this method should use labels instead of integers. Under the hood, this method is powered by using pandas's powerful Index objects. This makes label based indexing essentially just as fast as using integer indexing. It also means this method uses pandas's (well documented) logic for indexing. This means you can use string shortcuts for datetime indexes (e.g., '2000-01' to select all values in January 2000). It also means that slices are treated as inclusive of both the start and stop values, unlike normal Python indexing. :param indexers: A dict with keys matching dimensions and values given by scalars, slices or arrays of tick labels. For dimensions with multi-index, the indexer may also be a dict-like object with keys matching index level names. If DataArrays are passed as indexers, xarray-style indexing will be carried out. See :ref:`indexing` for the details. One of indexers or indexers_kwargs must be provided. :type indexers: dict, optional :param method: Method to use for inexact matches: * None (default): only exact matches * pad / ffill: propagate last valid index value forward * backfill / bfill: propagate next valid index value backward * nearest: use nearest valid index value :type method: {None, "nearest", "pad", "ffill", "backfill", "bfill"}, optional :param tolerance: Maximum distance between original and new labels for inexact matches. 
The values of the index at the matching locations must satisfy the equation ``abs(index[indexer] - target) <= tolerance``. :type tolerance: optional :param drop: If ``drop=True``, drop coordinates variables in `indexers` instead of making them scalar. :type drop: bool, optional :param \*\*indexers_kwargs: The keyword arguments form of ``indexers``. One of indexers or indexers_kwargs must be provided. :type \*\*indexers_kwargs: {dim: indexer, ...}, optional :returns: **obj** -- A new Dataset with the same contents as this dataset, except each variable and dimension is indexed by the appropriate indexers. If indexer DataArrays have coordinates that do not conflict with this object, then these coordinates will be attached. In general, each array's data will be a view of the array's data in this dataset, unless vectorized indexing was triggered by using an array indexer, in which case the data will be a copy. :rtype: Dataset .. seealso:: :func:`Dataset.isel` :func:`DataArray.sel` :doc:`xarray-tutorial:intermediate/indexing/indexing` Tutorial material on indexing with Xarray objects :doc:`xarray-tutorial:fundamentals/02.1_indexing_Basic` Tutorial material on basics of indexing .. py:method:: isel(indexers = None, drop = False, missing_dims = 'raise', **indexers_kwargs) :abstractmethod: Returns a new tree indexed along dimensions `compound`, `group` or `trajectory` and with data in leaves of the tree indexed along the remaining specified dimension(s) if the leaves support `.isel()` operations. Internally, it filters data with their own `.isel()` functions and performs some additional filtering specific to the tree structure. :param indexers: A dict with keys matching dimensions and values given by integers, slice objects or arrays. An indexer can be an integer, slice, array-like or DataArray. If DataArrays are passed as indexers, xarray-style indexing will be carried out. See :ref:`indexing` for the details. One of indexers or indexers_kwargs must be provided.
:type indexers: dict, optional :param drop: If ``drop=True``, drop coordinates variables indexed by integers instead of making them scalar. :type drop: bool, default: False :param missing_dims: What to do if dimensions that should be selected from are not present in the Dataset: - "raise": raise an exception - "warn": raise a warning, and ignore the missing dimensions - "ignore": ignore the missing dimensions :type missing_dims: {"raise", "warn", "ignore"}, default: "raise" :param \*\*indexers_kwargs: The keyword arguments form of ``indexers``. One of indexers or indexers_kwargs must be provided. :type \*\*indexers_kwargs: {dim: indexer, ...}, optional :returns: **obj** -- A new tree with the same contents as this tree, except each data entry is indexed by the appropriate indexers and subtrees are filtered by the choices in tree-specific dimensions. The logic for selection on the leaf data entries is specific to the type of data in the leaf. :rtype: TreeNode[ChildType, DataType] .. rubric:: Examples # TODO: FIXME: Provide better tree selection example. >>> dataset = xr.Dataset( ... { ... "math_scores": ( ... ["student", "test"], ... [[90, 85, 92], [78, 80, 85], [95, 92, 98]], ... ), ... "english_scores": ( ... ["student", "test"], ... [[88, 90, 92], [75, 82, 79], [93, 96, 91]], ... ), ... }, ... coords={ ... "student": ["Alice", "Bob", "Charlie"], ... "test": ["Test 1", "Test 2", "Test 3"], ... }, ... ) # A specific element from the dataset is selected >>> dataset.isel(student=1, test=0) # A slice of the dataset is selected >>> slice_of_data = dataset.isel(student=slice(0, 2), test=slice(0, 2)) >>> slice_of_data .. seealso:: :func:`TreeNode.sel` .. py:method:: __str__() A basic representation of this node. Only contains rudimentary information about this node. Use `repr()` for a more extensive representation. :returns: A string representation with minimal information. :rtype: str ..
py:method:: __repr__() A simple representation of the data and structure of this subtree. :returns: A string representation with more extensive information than that returned by `__str__()` :rtype: str .. py:method:: _repr_html_() Obtain an HTML representation of this subtree. Currently generates a tabular representation of the subtree. :returns: An HTML string representing the data in this subtree. :rtype: str .. py:function:: _trajectory_key_func(node) Helper function to extract trajectory metadata of leaf nodes for trees with appropriate data types. If applied to other nodes, it may yield a `None` key or just their `name` attribute as a `str`. :param node: The node to extract the `TrajectoryGroupingMetadata` metadata from. See `Trajectory.get_grouping_metadata()` for creation of the metadata instance. :type node: TreeNode :returns: The key to use for the grouping of this node. :rtype: None | str | TrajectoryGroupingMetadata
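The effect of `keep_empty_branches` in `TreeNode.map_data()` and the path-paired output of `TreeNode.collect_data(with_path=True)` can be illustrated with a small stand-alone sketch. The `MiniNode` class, its fields, the `keep_positive` helper and the `/`-separated path format are all hypothetical stand-ins written for this illustration. They are not part of shnitsel and do not reproduce how `TreeNode` is implemented. They only mimic the documented behaviour that a mapping function may return `None` and that branches without data can then either be kept or truncated.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Iterator


@dataclass
class MiniNode:
    """Hypothetical stand-in for a tree node; not part of shnitsel."""

    name: str
    data: Any = None
    children: dict[str, MiniNode] = field(default_factory=dict)

    def map_data(
        self, func: Callable[[Any], Any], keep_empty_branches: bool = True
    ) -> MiniNode | None:
        # Map the data held by this node (if any); func may return None.
        new_data = func(self.data) if self.data is not None else None
        new_children = {}
        for key, child in self.children.items():
            mapped = child.map_data(func, keep_empty_branches)
            if mapped is not None:
                new_children[key] = mapped
        # With truncation active, drop nodes holding neither data nor children.
        if not keep_empty_branches and new_data is None and not new_children:
            return None
        return MiniNode(self.name, new_data, new_children)

    def collect_data(self, path: str = "") -> Iterator[tuple[str, Any]]:
        # Yield (path, data) pairs for every data entry in this subtree.
        here = f"{path}/{self.name}"
        if self.data is not None:
            yield here, self.data
        for child in self.children.values():
            yield from child.collect_data(here)


tree = MiniNode(
    "root",
    children={
        "a": MiniNode("a", children={"t1": MiniNode("t1", data=3)}),
        "b": MiniNode("b", children={"t2": MiniNode("t2", data=-1)}),
    },
)


def keep_positive(x: int) -> int | None:
    # Returning None marks the entry as empty after mapping.
    return x * 2 if x > 0 else None


kept = tree.map_data(keep_positive, keep_empty_branches=True)
pruned = tree.map_data(keep_positive, keep_empty_branches=False)
```

In this sketch `kept` retains the now-empty branch `b`, while `pruned` contains only branch `a`. The original `tree` is left unchanged because every call builds new nodes.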