shnitsel.analyze.pca

Attributes

principal_component_analysis

PCA

Functions

pca_and_hops(frames, mean)

Get PCA points and info on which of them represent hops

pairwise_dists_pca(atXYZ[, mean, return_pca_object])

PCA-reduced pairwise interatomic distances

pca(da, dim[, n_components, return_pca_object])

xarray-oriented wrapper around scikit-learn's PCA

Module Contents

pca_and_hops(frames, mean)

Get PCA points and info on which of them represent hops

Parameters:
  • frames (xarray.Dataset) – A Dataset containing ‘atXYZ’ and ‘astate’ variables

  • mean (bool) – If True, mean-center the data before PCA

Returns:

  • pca_res – The PCA-reduced pairwise interatomic distances

  • hops_pca_coords – pca_res filtered by hops, to facilitate marking hops when plotting

Return type:

tuple[xarray.DataArray, xarray.DataArray]
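The hop marking described above can be illustrated with a small NumPy sketch. It assumes, without confirmation from shnitsel, that a hop is a frame whose active state (astate) differs from the previous frame's, and that all frames belong to a single trajectory. The helper hop_mask and the toy arrays are hypothetical and only show how a boolean mask selects the PCA points to highlight.

```python
import numpy as np


def hop_mask(astate: np.ndarray) -> np.ndarray:
    """Hypothetical hop criterion: True at frames whose active state differs
    from the previous frame's (assumes frames from one trajectory, in time order)."""
    astate = np.asarray(astate)
    mask = np.zeros(astate.shape, dtype=bool)
    mask[1:] = astate[1:] != astate[:-1]
    return mask


# Toy data: active state per time step and 2D PCA coordinates per frame
astate = np.array([2, 2, 2, 1, 1, 1])
pca_res = np.array([[0.0, 0.1], [0.2, 0.1], [0.4, 0.3],
                    [0.5, 0.6], [0.7, 0.6], [0.9, 0.8]])

hops = hop_mask(astate)
hops_pca_coords = pca_res[hops]  # points to mark when plotting

print(hops.tolist())             # [False, False, False, True, False, False]
print(hops_pca_coords.tolist())  # [[0.5, 0.6]]
```

With several trajectories stacked together, the comparison would have to be done per trajectory so that a trajectory boundary is not mistaken for a hop.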

pairwise_dists_pca(atXYZ, mean=False, return_pca_object=False, **kwargs)

PCA-reduced pairwise interatomic distances

Parameters:
  • atXYZ (shnitsel.core.typedefs.AtXYZ) – A DataArray containing the atomic positions; must have a dimension called ‘atom’

  • mean (bool) – If True, mean-center the data before PCA, by default False

Returns:

  • A DataArray with the same dimensions as atXYZ, except for the ‘atom’ dimension, which is replaced by a dimension ‘PC’ containing the principal components (by default 2)

Return type:

xarray.DataArray
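As a rough illustration of what "PCA-reduced pairwise interatomic distances" means, the following NumPy-only sketch turns coordinates of shape (frames, atoms, 3) into one distance per unique atom pair and projects those features onto their two leading principal axes via an SVD of the centred data. The function names are hypothetical and this is not shnitsel's implementation, which works on xarray objects and delegates to scikit-learn; scikit-learn's PCA yields the same projection up to the sign of each component.

```python
import numpy as np


def pairwise_distances(xyz: np.ndarray) -> np.ndarray:
    """(n_frames, n_atoms, 3) -> (n_frames, n_pairs) distances for atom pairs i < j."""
    i, j = np.triu_indices(xyz.shape[1], k=1)
    return np.linalg.norm(xyz[:, i, :] - xyz[:, j, :], axis=-1)


def pca_scores(features: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project mean-centred features onto their leading principal axes using SVD."""
    centred = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


rng = np.random.default_rng(0)
xyz = rng.normal(size=(10, 4, 3))  # 10 frames, 4 atoms, Cartesian coordinates
dists = pairwise_distances(xyz)    # 4 * 3 / 2 = 6 atom pairs per frame
scores = pca_scores(dists)         # one column per principal component

print(dists.shape, scores.shape)   # (10, 6) (10, 2)
```

Using distances rather than raw coordinates makes the features invariant to rigid translations and rotations of the molecule, which is the usual motivation for this choice of descriptor.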

pca(da, dim, n_components=2, return_pca_object=False)

xarray-oriented wrapper around scikit-learn’s PCA

Parameters:
  • da (xarray.DataArray) – A DataArray with at least a dimension with a name matching dim

  • dim (str) – The name of the array-dimension to reduce (i.e. the axis along which different features lie)

  • n_components (int, optional) – The number of principal components to return, by default 2

  • return_pca_object (bool, optional) – Whether to return the scikit-learn PCA object as well as the transformed data, by default False

Returns:

  • pca_res – A DataArray with the same dimensions as da, except for the dimension indicated by dim, which is replaced by a dimension PC of size n_components. If DataArray accessors are active, the following members will be added to the accessor of the result:

    • pca_res.st.loadings: The PCA loadings as a DataArray

    • pca_res.st.pca_object: The scikit-learn pipeline used for PCA, including the MinMaxScaler

    • pca_res.st.use_to_transform(other_da: xr.DataArray): A function that transforms its argument (other data) using the pipeline fitted to the current data.

    (NB. The above assumes that the accessor name used is st, the default)

  • [pca_object] – The trained PCA object produced by scikit-learn, if return_pca_object=True

Examples:

>>> pca_results1 = data1.st.pca('features')
>>> pca_results1.st.loadings  # See the loadings
>>> pca_results2 = pca_results1.st.use_to_transform(data2)

Return type:

tuple[xarray.DataArray, sklearn.decomposition.PCA] | xarray.DataArray
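The accessor members above describe a pipeline that is fitted once (MinMaxScaler followed by PCA) and then reused on other data. A minimal NumPy-only sketch of that idea follows; the class MinMaxPCA is hypothetical and stands in for what, in scikit-learn itself, would be make_pipeline(MinMaxScaler(), PCA(n_components=2)). Rows of the samples-by-features array play the role of the remaining dimensions, and its last axis plays the role of dim. Component signs may differ from scikit-learn's output.

```python
import numpy as np


class MinMaxPCA:
    """Hypothetical NumPy stand-in for a MinMaxScaler -> PCA pipeline."""

    def __init__(self, n_components: int = 2):
        self.n_components = n_components

    def fit(self, x: np.ndarray) -> "MinMaxPCA":
        # Feature-wise min-max scaling to [0, 1], as in MinMaxScaler's default range
        self.min_ = x.min(axis=0)
        span = x.max(axis=0) - self.min_
        self.scale_ = np.where(span == 0, 1.0, span)  # constant features: avoid division by zero
        scaled = (x - self.min_) / self.scale_
        # PCA on the scaled data: centre, then take the leading right singular vectors
        self.mean_ = scaled.mean(axis=0)
        _, _, vt = np.linalg.svd(scaled - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]  # loadings, one row per PC
        return self

    def transform(self, x: np.ndarray) -> np.ndarray:
        # Reuse the scaling and axes learned in fit(), as use_to_transform does
        scaled = (x - self.min_) / self.scale_
        return (scaled - self.mean_) @ self.components_.T


rng = np.random.default_rng(1)
data1 = rng.normal(size=(50, 5))  # 50 samples x 5 features
data2 = rng.normal(size=(20, 5))

model = MinMaxPCA(n_components=2).fit(data1)
res1 = model.transform(data1)   # analogue of pca_res
res2 = model.transform(data2)   # analogue of pca_res.st.use_to_transform(data2)
loadings = model.components_    # analogue of pca_res.st.loadings

print(res1.shape, res2.shape, loadings.shape)  # (50, 2) (20, 2) (2, 5)
```

Fitting on one dataset and only transforming the other keeps both results in the same coordinate system, which is what makes them directly comparable in a single plot.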

principal_component_analysis
PCA