shnitsel.analyze.pca#
Attributes#
Classes#
PCAResult | Class to hold the results of a PCA analysis.
Functions#
pca_and_hops | Get PCA projected data and a mask to provide information on which of the data points represent hopping points.
pca | Function to perform a PCA decomposition on the data of various origins and formats.
pca_direct | Wrapper function to directly apply the PCA decomposition to the values in a dataarray.
Module Contents#
- OriginType#
- ResultType#
- DataType#
- class PCAResult(pca_inputs, pca_dimension, pca_pipeline, pca_object, pca_projected_inputs)#
Bases: Generic[OriginType, ResultType]
Class to hold the results of a PCA analysis.
Also retains input data as well as corresponding results of the PCA decomposition. Input and output types are parametrized to allow for tree structures to be accurately represented.
Provides accessors for all result metadata, as well as the method project_array(other_da) to project another array of appropriate shape, with dimension pca_mapped_dimension, onto the PCA principal components.
- Parameters:
OriginType – The type of the original input data. Should be either xr.DataArray for simple types (a feature array was provided) or, for tree types, a flat DataGroup with xr.DataArrays in its leaves.
ResultType – Same structure as OriginType, but holding the PCA-projected input data. Either an xr.DataArray or a DataGroup, as for OriginType.
pca_inputs (OriginType)
pca_dimension (Hashable)
pca_pipeline (sklearn.pipeline.Pipeline)
pca_object (sklearn.decomposition.PCA)
pca_projected_inputs (ResultType)
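To illustrate the idea of keeping the fit around so that other arrays can be projected later, here is a minimal NumPy sketch. `MiniPCAResult` is a hypothetical stand-in, not the library's class: it works on plain arrays instead of xr.DataArray, uses an SVD instead of the sklearn pipeline, and always mean-centers, which may differ from what PCAResult does internally.

```python
import numpy as np


class MiniPCAResult:
    """Hypothetical, simplified stand-in for a PCA result container."""

    def __init__(self, inputs: np.ndarray, n_components: int = 2):
        self.inputs = inputs
        # Feature-wise mean, removed before projecting (assumption of this sketch).
        self.mean = inputs.mean(axis=0)
        # Rows of vt are the principal axes, ordered by explained variance.
        _, _, vt = np.linalg.svd(inputs - self.mean, full_matrices=False)
        self.principal_components = vt[:n_components]
        self.projected_inputs = self.project_array(inputs)

    def project_array(self, other: np.ndarray) -> np.ndarray:
        """Project rows of `other` onto the fitted principal components."""
        return (other - self.mean) @ self.principal_components.T


rng = np.random.default_rng(0)
train = rng.normal(size=(50, 4))  # 50 samples, 4 features
new = rng.normal(size=(10, 4))    # other data with the same feature axis

result = MiniPCAResult(train, n_components=2)
projected_new = result.project_array(new)
print(result.projected_inputs.shape, projected_new.shape)
```

The second array only needs to share the feature axis with the fitted inputs; the number of samples is free.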
- _pca_inputs: OriginType#
- _pca_pipeline: sklearn.pipeline.Pipeline#
- _pca_dimension: Hashable#
- _pca_components: xarray.DataArray#
- _pca_object: sklearn.decomposition.PCA#
- _pca_inputs_projected: ResultType#
- property inputs: OriginType#
- Return type:
OriginType
- property fitted_pca_object: sklearn.decomposition.PCA#
- Return type:
sklearn.decomposition.PCA
- property pca_mapped_dimension: Hashable#
- Return type:
Hashable
- property pca_pipeline: sklearn.pipeline.Pipeline#
- Return type:
sklearn.pipeline.Pipeline
- property principal_components: xarray.DataArray#
- Return type:
- property loadings: xarray.DataArray#
- Return type:
- property projected_inputs: ResultType#
- Return type:
ResultType
- property results: ResultType#
- Return type:
ResultType
- get_most_significant_loadings(top_n_per=5, top_n_total=5)#
Function to retrieve the most significant loadings in the PCA result for each individual component and in total.
You can configure the number of loadings returned per component (top_n_per) and in total (top_n_total).
- Parameters:
- Returns:
First the mapping of each PC to the array holding the data of all their most significant loadings. Second the overall most significant loadings across all components.
- Return type:
tuple[Mapping[Hashable, xr.DataArray], xr.DataArray]
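The ranking criterion is not stated above. As a rough sketch, assuming "most significant" means largest absolute loading, the per-component and overall selection could look like this. The loadings matrix, the feature names and the `top_loadings` helper are all hypothetical.

```python
import numpy as np

# Hypothetical loadings: rows are principal components, columns are features.
loadings = np.array(
    [
        [0.9, -0.1, 0.3, 0.05],
        [0.2, -0.8, 0.1, 0.6],
    ]
)
feature_names = ["d(1,2)", "d(1,3)", "d(2,3)", "d(2,4)"]


def top_loadings(loadings, names, top_n_per=2, top_n_total=3):
    """Rank features by absolute loading, per component and overall."""
    per_component = {}
    for pc_index, row in enumerate(loadings):
        order = np.argsort(-np.abs(row))[:top_n_per]
        per_component[f"PC{pc_index + 1}"] = [(names[i], float(row[i])) for i in order]
    # Overall ranking: largest absolute loading of each feature on any component.
    strongest = np.abs(loadings).max(axis=0)
    total = [names[i] for i in np.argsort(-strongest)[:top_n_total]]
    return per_component, total


per_pc, overall = top_loadings(loadings, feature_names)
print(per_pc)
print(overall)
```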
- explain_loadings(top_n_per=5, top_n_total=5)#
Generate a textual explanation of the top influential loadings in the PCA result.
Tries to put the results of get_most_significant_loadings() into a textual form.
- Parameters:
- Returns:
A text describing the results of the principal components analysis.
- Return type:
- project_array(other_da)#
- Parameters:
other_da (xarray.DataArray)
- Return type:
- static get_extra_coords_for_loadings(data, dim)#
- Parameters:
data (xarray.DataArray)
dim (Hashable)
- Return type:
Mapping[Hashable, xarray.DataArray]
- pca_and_hops(frames: shnitsel.data.tree.node.TreeNode[Any, shnitsel.data.dataset_containers.shared.ShnitselDataset | xarray.Dataset], structure_selection: shnitsel.filtering.structure_selection.StructureSelection | shnitsel.filtering.structure_selection.StructureSelectionDescriptor | None = None, center_mean: bool = False, n_components: int = 2) → shnitsel.data.tree.node.TreeNode[Any, tuple[PCAResult, xarray.DataArray]]#
- pca_and_hops(frames: shnitsel.data.dataset_containers.shared.ShnitselDataset | xarray.Dataset, structure_selection: shnitsel.filtering.structure_selection.StructureSelection | shnitsel.filtering.structure_selection.StructureSelectionDescriptor | None = None, center_mean: bool = False, n_components: int = 2) → tuple[PCAResult, xarray.DataArray]
Get PCA projected data and a mask to provide information on which of the data points represent hopping points.
- Parameters:
frames (xr.Dataset | ShnitselDataset | TreeNode[Any, ShnitselDataset | xr.Dataset]) – A Dataset (or tree of those) containing ‘atXYZ’ and ‘astate’ variables
structure_selection (StructureSelection | StructureSelectionDescriptor, optional) – An optional selection of features to calculate and base the PCA fitting on. If not provided, will calculate a PCA for full pairwise distances.
center_mean (bool, optional) – If True, center the data on its mean before the PCA, by default False.
n_components (int, optional) – The number of principal components to return, by default 2
- Returns:
A tuple of the following two parts:
- pca_res: The result object of the call to pca(), holding all results of the PCA analysis (see documentation of pca()).
- hopping_point_masks: The mask of the hopping point events. Can be used to extract only the hopping point PCA results from the projected input result in pca_res.
- Return type:
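How such a mask can be used is sketched below with plain NumPy arrays. The definition of a hopping point used here (the active state differs from the previous frame) is an assumption for illustration, and the `astate` and `projected` arrays are made up; the library may derive its mask differently.

```python
import numpy as np

# Hypothetical active-state sequence of one trajectory, one entry per frame.
astate = np.array([2, 2, 2, 1, 1, 0, 0, 0])

# Assumed definition: a frame is a hopping point if its active state differs
# from the previous frame. The first frame has no predecessor, so it is False.
is_hop = np.zeros(astate.shape, dtype=bool)
is_hop[1:] = astate[1:] != astate[:-1]

# Hypothetical 2D PCA projection with one row per frame.
projected = np.arange(16, dtype=float).reshape(8, 2)

# Boolean indexing keeps only the projected points at hopping events.
hop_points = projected[is_hop]
print(np.flatnonzero(is_hop))
print(hop_points)
```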
- pca(data: shnitsel.data.tree.node.TreeNode[Any, shnitsel.data.dataset_containers.shared.ShnitselDataset | xarray.Dataset], structure_selection: shnitsel.filtering.structure_selection.StructureSelection | shnitsel.filtering.structure_selection.StructureSelectionDescriptor | None = None, dim: None = None, n_components: int = 2, center_mean: bool = False) → shnitsel.data.tree.node.TreeNode[Any, PCAResult[shnitsel.data.tree.data_group.DataGroup[xarray.DataArray], shnitsel.data.tree.data_group.DataGroup[xarray.DataArray]]]#
- pca(data: shnitsel.data.dataset_containers.shared.ShnitselDataset | xarray.Dataset | xarray.DataArray, structure_selection: shnitsel.filtering.structure_selection.StructureSelection | shnitsel.filtering.structure_selection.StructureSelectionDescriptor | None = None, dim: None = None, n_components: int = 2, center_mean: bool = False) → PCAResult
- pca(data: shnitsel.data.dataset_containers.shared.ShnitselDataset | xarray.Dataset | shnitsel.data.tree.node.TreeNode[Any, shnitsel.data.dataset_containers.shared.ShnitselDataset | xarray.Dataset], structure_selection: shnitsel.filtering.structure_selection.StructureSelection | shnitsel.filtering.structure_selection.StructureSelectionDescriptor | None = None, dim: None = None, n_components: int = 2, center_mean: bool = False) → PCAResult | shnitsel.data.tree.node.TreeNode[Any, PCAResult[shnitsel.data.tree.data_group.DataGroup[xarray.DataArray], shnitsel.data.tree.data_group.DataGroup[xarray.DataArray]]]
- pca(data: xarray.DataArray, structure_selection: None = None, dim: Hashable | None = None, n_components: int = 2, center_mean: bool = False) → PCAResult[xarray.DataArray, xarray.DataArray]
Function to perform a PCA decomposition on the data of various origins and formats.
Accepts full trajectory data, either in the Frames, Trajectory or hierarchical ShnitselDB formats or as a raw xr.Dataset. Alternatively, a feature xr.DataArray can be passed directly together with dim.
- Parameters:
da (xr.DataArray) – A DataArray with at least one dimension whose name matches dim. The dtype should be integer or floating, with no nan or inf entries.
structure_selection (StructureSelection | StructureSelectionDescriptor, optional) – Optional selection of geometric features to include in the PCA. If not provided, will fall back to pairwise distances.
dim – The name of the array-dimension to reduce (i.e. the axis along which different features lie)
n_components (int, optional) – The number of principal components to return, by default 2
center_mean (bool, optional) – Flag to center data before being passed to the PCA if set to True, by default False.
- Returns:
PCAResult[xr.DataArray, xr.DataArray] – The full information obtained by fitting the PCA. Contains the inputs of the PCA, the principal components, the mapped values for the inputs, and the full pipeline to apply the PCA transformation again to other data.
The mapped inputs are a DataArray with the same dimensions as da, except for the dimension indicated by dim, which is replaced by a dimension PC of size n_components.
result.principal_components holds the fitted principal components.
result.projected_inputs provides the PCA projection result when applied to the inputs.
ShnitselDB[PCAResult[DataGroup[xr.DataArray], DataGroup[xr.DataArray]]] – The hierarchical structure of PCA results, where each flat group is used for a PCA analysis.
Examples
--------
>>> pca_results1 = pca(data1)
>>> pca_results1.projected_inputs  # See the projected inputs
>>> pca_results2 = pca_results1.project_array(data2)
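The fallback to pairwise distances can be pictured with the following NumPy sketch. It is not the library's feature pipeline: the helper names are hypothetical, the positions are random placeholders for 'atXYZ', and how center_mean is applied inside the real sklearn pipeline is an assumption here.

```python
import numpy as np


def pairwise_distance_features(positions: np.ndarray) -> np.ndarray:
    """Turn positions of shape (frame, atom, 3) into (frame, n_pairs) distances."""
    n_atoms = positions.shape[1]
    i, j = np.triu_indices(n_atoms, k=1)  # each unordered atom pair once
    return np.linalg.norm(positions[:, i] - positions[:, j], axis=-1)


def pca_project(features: np.ndarray, n_components: int = 2, center_mean: bool = False):
    """Plain SVD-based projection; centering is optional, mirroring the flag above."""
    if center_mean:
        features = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    components = vt[:n_components]
    return features @ components.T, components


rng = np.random.default_rng(1)
positions = rng.normal(size=(20, 4, 3))  # 20 frames, 4 atoms

features = pairwise_distance_features(positions)  # 4 atoms give 6 pairs
projected, components = pca_project(features, n_components=2, center_mean=True)
print(features.shape, projected.shape, components.shape)
```

Pairwise distances are invariant under rotation and translation of the whole molecule, which is one reason they are a common default feature set for geometry-based PCA.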
- pca_direct(data, dim, n_components=2)#
Wrapper function to directly apply the PCA decomposition to the values in a dataarray.
Unlike pca(), the features for the PCA are not derived from the data parameter; the values in data are used as features directly.
- Parameters:
data (xr.DataArray) – A DataArray with at least a dimension with a name matching dim
dim (Hashable) – The name of the array-dimension to reduce (i.e. the axis along which different features lie)
n_components (int, optional) – The number of principal components to return, by default 2
- Returns:
PCAResult – The full information obtained by fitting the PCA. Contains the inputs of the PCA, the principal components, the mapped values for the inputs, and the full pipeline to apply the PCA transformation again to other data.
The mapped inputs are a DataArray with the same dimensions as data, except for the dimension indicated by dim, which is replaced by a dimension PC of size n_components.
Examples
--------
>>> pca_results1 = pca_direct(data1, 'features')
>>> pca_results1.projected_inputs  # See the projected inputs
>>> pca_results2 = pca_results1.project_array(data2)
- Return type:
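The statement that the dimension named by dim is replaced by a dimension PC of size n_components can be mimicked on a plain NumPy array, using an axis index in place of a dimension name. `reduce_axis_with_pca` is a hypothetical helper for illustration only; it always mean-centers and does not use the sklearn pipeline.

```python
import numpy as np


def reduce_axis_with_pca(values: np.ndarray, axis: int, n_components: int = 2) -> np.ndarray:
    """Replace `axis` (the feature axis) by an axis of length n_components."""
    # Move the feature axis last and flatten all other axes into samples.
    moved = np.moveaxis(values, axis, -1)
    samples = moved.reshape(-1, moved.shape[-1])
    centered = samples - samples.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    # Restore the original layout with the reduced axis in the same position.
    restored = projected.reshape(*moved.shape[:-1], n_components)
    return np.moveaxis(restored, -1, axis)


rng = np.random.default_rng(2)
data = rng.normal(size=(5, 7, 30))  # e.g. (trajectory, features, time)
reduced = reduce_axis_with_pca(data, axis=1, n_components=2)
print(reduced.shape)  # (5, 2, 30)
```

All other axes keep their size and order; only the feature axis changes length.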
- principal_component_analysis#
- PCA#