shnitsel.vis.plot.pca_biplot

Attributes

plot_clusters2

plot_clusters3

Functions

plot_noodleplot(noodle[, hops_mask, fig, ax, c, ...])

Create a noodle plot, i.e. a line or scatter plot of PCA-decomposed data.

get_loadings(frames_or_pca_result[, center_mean])

Get the loadings for the PCA of pairwise distances for the positional data in frames.

plot_loadings(ax, loadings)

Plot all loadings as arrows.

cluster_general(decider, n)

Cluster indices iteratively according to a provided function.

cluster_loadings(loadings[, cutoff])

Cluster loadings iteratively based on proximity on the principal component manifold.

plot_clusters(loadings, clusters[, ax, labels])

Plot clusters of PCA loadings

_get_clusters_coords(loadings, descriptor_clusters)

_separate_angles(points[, min_angle])

Group points based on their polar angles, and work out scale factors for placing labels without overlaps.

_filter_cluster_coords(coords, n)

plot_clusters_insets(ax, loadings, clusters, mol[, ...])

Plot selected clusters of the loadings of a pairwise distance PCA, and interpretations of those loadings as inset structures.

_get_axs(clusters, labels)

plot_clusters_grid(loadings, clusters[, ax, labels, ...])

Plot selected clusters of the loadings of a pairwise distance PCA, and interpretations of those loadings as a grid of structures.

circbins(angles[, num_bins])

Bin angular data by clustering unit-circle projections

plot_bin_edges(angles, radii, bins, edges, picks, ax, ...)

Illustrate how angles have been binned.

pick_clusters(…)

Calculate pairwise-distance PCA, cluster the loadings and pick a representative subset of the clusters.

_binning_with_min_entries(num_bins, angles, radii[, ...])

Module Contents

plot_noodleplot(noodle, hops_mask=None, fig=None, ax=None, c=None, colorbar_label=None, cmap=None, cnorm=None, cscale=None, noodle_kws=None, hops_kws=None, rasterized=True)

Create a noodle plot, i.e. a line or scatter plot of PCA-decomposed data.

Parameters:
  • noodle (xr.DataArray | TreeNode[Any, xr.DataArray]) – PCA decomposed data.

  • hops_mask (xr.DataArray | TreeNode[Any, xr.DataArray], optional) – DataArray holding hopping-point information of the trajectories. Defaults to None.

  • fig (Figure | SubFigure | None, optional) – Figure to plot the graph into. Defaults to None.

  • ax (Axes, optional) – The axes to plot into. Will be generated from fig if not provided. Defaults to None.

  • c (xr.DataArray | TreeNode[Any, xr.DataArray], optional) – The data to use for assigning the color to each individual data point. Defaults to None.

  • colorbar_label (str | None, optional) – Label to plot next to the colorbar. If not provided, it is taken from the long_name or name attribute of the data, falling back to t/fs.

  • cmap (str | Colormap | None, optional) – Colormap for plotting the datapoints. Defaults to None.

  • cnorm (Normalize | None, optional) – Normalization method to map data to the colormap. Defaults to None.

  • cscale (optional) – The colorbar scale mapping used to create the colorbar gradient. Defaults to None.

  • noodle_kws (dict, optional) – Keyword arguments for the noodle/PCA plot. Defaults to None.

  • hops_kws (dict, optional) – Keyword arguments for plotting the hopping points. Defaults to None.

  • rasterized (bool, optional) – Flag to control whether the plot will be rasterized. Defaults to True.

Returns:

The matplotlib.axes.Axes object into which the plot was drawn.

Return type:

Axes

get_loadings(frames_or_pca_result, center_mean=False)

Get the loadings for the PCA of pairwise distances for the positional data in frames.

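Parameters:
  • frames_or_pca_result – The frames whose positional data should be used to calculate a pairwise-distance PCA, or the result of a previously executed PCA.

  • center_mean (bool, optional) – Flag to apply mean centering before the analysis, by default False.

Returns: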

A DataArray of loadings with dimensions ‘PC’ (principal component) and ‘descriptor’ (atom combination, one for each pair of atoms).

Return type:

xr.DataArray
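To make "loadings for the PCA of pairwise distances" concrete, here is a minimal numpy sketch. It assumes plain arrays of shape (frames, atoms, 3) instead of shnitsel's xarray containers, one descriptor per atom pair in itertools.combinations order, and takes the principal axes (rows of Vᵀ from an SVD of the centered descriptor matrix) as the loadings; the library may order, scale or center differently. The name pairwise_distance_loadings is hypothetical.

```python
import itertools

import numpy as np


def pairwise_distance_loadings(positions, n_components=2):
    """Hypothetical sketch: PCA loadings of pairwise-distance descriptors.

    positions: array of shape (n_frames, n_atoms, 3).
    Returns (loadings, pairs): loadings has shape (n_components, n_pairs),
    one row per principal component and one column per atom pair.
    """
    positions = np.asarray(positions, dtype=float)
    n_atoms = positions.shape[1]
    # One descriptor per unordered atom pair (i < j); this ordering is an assumption
    pairs = list(itertools.combinations(range(n_atoms), 2))
    i_idx = [i for i, _ in pairs]
    j_idx = [j for _, j in pairs]
    # Distances for every frame and pair: shape (n_frames, n_pairs)
    descriptors = np.linalg.norm(
        positions[:, i_idx, :] - positions[:, j_idx, :], axis=-1
    )
    # Center each descriptor column, then read the principal axes off the SVD
    centered = descriptors - descriptors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components], pairs
```

Each column of the result corresponds to one atom pair, which is the role the 'descriptor' dimension plays in the returned DataArray.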

plot_loadings(ax, loadings)

Plot all loadings as arrows.

Parameters:
  • ax (Axes) – The matplotlib.axes.Axes object onto which to plot the loadings.

  • loadings (xr.DataArray) – A DataArray of PCA loadings including a ‘descriptor’ dimension; as produced by shnitsel.vis.plot.pca_biplot.get_loadings().

cluster_general(decider, n)

Cluster indices iteratively according to a provided function.

Parameters:
  • decider (Callable[[int, int], bool]) – A function to decide whether two points can potentially share a cluster.

  • n (int) – The number of indices to cluster.

Returns:

A list of clusters, where each cluster is represented as a list of indices.

Return type:

list[list[int]]

cluster_loadings(loadings, cutoff=0.05)

Cluster loadings iteratively based on proximity on the principal component manifold.

Parameters:
  • loadings (xr.DataArray) – A DataArray of PCA loadings; as produced by shnitsel.vis.plot.pca_biplot.get_loadings().

  • cutoff (float, optional) – An upper bound on the distance between a point in a cluster and another point within which the latter will still be assigned to the same cluster; by default 0.05

Returns:

A list of clusters, where each cluster is represented as a list of indices corresponding to loadings.

Return type:

list[list[int]]
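The two clustering functions above can be sketched in plain Python. This is one plausible greedy scheme, not necessarily the one shnitsel uses: indices are visited in order, each unassigned index seeds a new cluster, and a later index joins only if the decider accepts it against every current member. For the loadings case, the sketch assumes loadings are given as a list of (PC1, PC2) tuples rather than a DataArray and that "proximity" means Euclidean distance in the PC1/PC2 plane. The names greedy_cluster and cluster_points are hypothetical.

```python
import math
from typing import Callable


def greedy_cluster(decider: Callable[[int, int], bool], n: int) -> list[list[int]]:
    """Hypothetical sketch: assign each of n indices to a cluster, in order."""
    assigned = [False] * n
    clusters: list[list[int]] = []
    for i in range(n):
        if assigned[i]:
            continue
        # Start a new cluster seeded by the first unassigned index
        cluster = [i]
        assigned[i] = True
        for j in range(i + 1, n):
            # Join only if the decider accepts j against every current member
            if not assigned[j] and all(decider(k, j) for k in cluster):
                cluster.append(j)
                assigned[j] = True
        clusters.append(cluster)
    return clusters


def cluster_points(
    points: list[tuple[float, float]], cutoff: float = 0.05
) -> list[list[int]]:
    """Hypothetical sketch: cluster 2D loading vectors closer than cutoff."""
    return greedy_cluster(
        lambda a, b: math.dist(points[a], points[b]) < cutoff, len(points)
    )
```

Because a candidate must be compatible with all existing members, this behaves like complete linkage; checking against any single member instead would give a chaining, single-linkage variant.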

plot_clusters(loadings, clusters, ax=None, labels=None)

Plot clusters of PCA loadings

Parameters:
  • loadings (xarray.DataArray) – A DataArray of PCA loadings including a ‘descriptor’ dimension; as produced by shnitsel.vis.plot.pca_biplot.get_loadings().

  • clusters (list[list[int]]) – A list of clusters, where each cluster is represented as a list of indices corresponding to loadings; as produced by shnitsel.vis.plot.pca_biplot.cluster_loadings().

  • ax (matplotlib.axes.Axes | None) – The matplotlib.axes.Axes object onto which to plot (If not provided, one will be created.)

  • labels (list[str] | None) – Labels for the loadings; if not provided, loadings will be labelled according to indices of the atoms to which they relate.

_get_clusters_coords(loadings, descriptor_clusters)
_separate_angles(points, min_angle=10)

Group points based on their polar angles, and work out scale factors by which to place labels along the ray from origin to point when annotating points, so as to avoid overlaps between labels.

Parameters:
  • points (NDArray) – An array of shape (npoints, 2)

  • min_angle (float, optional) – The minimum difference in argument (angle from the positive x-axis, in degrees) between two points below which they are considered part of the same cluster; by default 10

Returns:

A dictionary mapping indices (corresponding to points) to scale factors used to move each label outward from its loading.

Return type:

dict[int, float]
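The grouping-and-scaling idea described above can be sketched as follows. This is a hypothetical illustration, not shnitsel's implementation: points are sorted by polar angle, a point joins the current group when its angle is within min_angle of the previous point's angle (wrap-around at 0/360 degrees is ignored for simplicity), and within each group labels are pushed progressively further out by a hypothetical step parameter, in order of increasing radius.

```python
import math


def separate_angles(points, min_angle=10.0, step=0.15):
    """Hypothetical sketch: map point index -> label scale factor.

    points: sequence of (x, y) pairs. `step` is an assumed parameter
    controlling how far successive labels in a group are pushed outward.
    """
    # (angle in degrees from the positive x-axis, radius, original index)
    polar = sorted(
        (math.degrees(math.atan2(y, x)) % 360, math.hypot(x, y), idx)
        for idx, (x, y) in enumerate(points)
    )
    groups = []
    for angle, radius, idx in polar:
        # Chain onto the current group if close in angle to the previous point
        if groups and angle - groups[-1][-1][0] < min_angle:
            groups[-1].append((angle, radius, idx))
        else:
            groups.append([(angle, radius, idx)])
    factors = {}
    for group in groups:
        # Innermost point keeps factor 1.0; outer points are staggered outward
        for rank, (_, _, idx) in enumerate(sorted(group, key=lambda t: t[1])):
            factors[idx] = 1.0 + rank * step
    return factors
```

Staggering by rank rather than by a fixed offset keeps isolated points at their natural position while separating only the labels that would collide.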

_filter_cluster_coords(coords, n)
plot_clusters_insets(ax, loadings, clusters, mol, min_angle=10, inset_scale=1, show_at_most=None)

Plot selected clusters of the loadings of a pairwise distance PCA, and interpretations of those loadings, as highlighted molecular structures inset upon the loadings plot.

Parameters:
  • ax (Axes) – The matplotlib.axes.Axes object onto which to plot the loadings.

  • loadings (xr.DataArray) – A DataArray of PCA loadings including a ‘descriptor’ dimension; as produced by shnitsel.vis.plot.pca_biplot.get_loadings().

  • clusters (list[list[int]]) – A list of clusters, where each cluster is represented as a list of indices corresponding to loadings; as produced by shnitsel.vis.plot.pca_biplot.cluster_loadings().

  • mol (Mol) – An RDKit Mol object to be used for structure display.

  • min_angle (float, optional) – Where multiple clusters of loadings lie in similar directions from the origin, they are grouped together and only the member with the greatest radius is annotated with a highlighted structure. This is the angle in degrees used for that grouping; by default 10.

  • inset_scale (float, optional) – A factor by which to scale the size of the inset highlighted structures; by default 1.

  • show_at_most (int, optional) – Maximum number of clusters to show; if there are more clusters than this, those with the smallest radius are excluded so that only this many remain.
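The selection governed by min_angle and show_at_most can be sketched as follows. This assumes, hypothetically, that each cluster is represented by the mean of its members' 2D loadings, supplied as (PC1, PC2) tuples. Clusters are visited in angular order, only the largest-radius member of each angular group is kept, and then at most show_at_most of the survivors with the largest radii are returned. select_clusters is a hypothetical helper, not part of shnitsel.

```python
import math


def select_clusters(loadings, clusters, min_angle=10.0, show_at_most=None):
    """Hypothetical sketch: indices of the clusters to annotate.

    loadings: list of (PC1, PC2) tuples, one per descriptor.
    clusters: list of lists of indices into loadings.
    """
    centers = []
    for members in clusters:
        # Assumed representative point: mean of the members' loadings
        x = sum(loadings[i][0] for i in members) / len(members)
        y = sum(loadings[i][1] for i in members) / len(members)
        centers.append((math.degrees(math.atan2(y, x)) % 360, math.hypot(x, y)))
    order = sorted(range(len(clusters)), key=lambda c: centers[c][0])
    kept = []
    prev_angle = None
    for c in order:
        angle, radius = centers[c]
        if prev_angle is not None and angle - prev_angle < min_angle:
            # Same angular group: keep whichever cluster lies further out
            if radius > centers[kept[-1]][1]:
                kept[-1] = c
        else:
            kept.append(c)
        prev_angle = angle
    # Drop the smallest-radius clusters if more than show_at_most remain
    kept.sort(key=lambda c: centers[c][1], reverse=True)
    if show_at_most is not None:
        kept = kept[:show_at_most]
    return sorted(kept)
```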

plot_clusters2
_get_axs(clusters, labels)
plot_clusters_grid(loadings, clusters, ax=None, labels=None, axs=None, mol=None)

Plot selected clusters of the loadings of a pairwise distance PCA, and interpretations of those loadings:

  • On the left, a large plot of selected clusters of loadings, indicated as arrows.

  • On the right, a grid of structures corresponding to the clusters of loadings; the atom pairs involved in each cluster are represented by colour-coding the atoms of the structures.

Parameters:
  • loadings (xr.DataArray) – A DataArray of PCA loadings including a ‘descriptor’ dimension; as produced by shnitsel.vis.plot.pca_biplot.get_loadings().

  • clusters (list[list[int]]) – A list of clusters, where each cluster is represented as a list of indices corresponding to loadings; as produced by shnitsel.vis.plot.pca_biplot.cluster_loadings().

  • ax (Axes, optional) – The matplotlib.axes.Axes object onto which to plot the loadings (If not provided, one will be created.)

  • labels (list[str], optional) – Labels for the loadings; if not provided, loadings will be labelled according to indices of the atoms to which they relate.

  • axs (dict[str, Axes], optional) – A dictionary mapping from plot labels to matplotlib.axes.Axes objects (If not provided, they will be created.)

  • mol (Mol, optional) – An RDKit Mol object to be used for structure display.
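To colour-code a structure for a cluster, each descriptor index in that cluster has to be translated back into its atom pair. A minimal sketch, assuming (hypothetically; shnitsel's ordering may differ) that descriptor k is the k-th pair produced by itertools.combinations(range(n_atoms), 2). The resulting atom lists are the kind of input a structure-drawing routine with atom highlighting would take. atoms_to_highlight is a hypothetical helper.

```python
import itertools


def atoms_to_highlight(clusters, n_atoms):
    """Hypothetical sketch: for each cluster of descriptor indices, the atoms involved.

    Assumes descriptor k corresponds to the k-th pair (i, j), i < j, from
    itertools.combinations(range(n_atoms), 2).
    """
    pairs = list(itertools.combinations(range(n_atoms), 2))
    result = []
    for members in clusters:
        atoms = set()
        for k in members:
            # Each descriptor contributes both atoms of its pair
            atoms.update(pairs[k])
        result.append(sorted(atoms))
    return result
```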

plot_clusters3
circbins(angles, num_bins=4)

Bin angular data by clustering unit-circle projections

Parameters:
  • angles (np.ndarray) – Angles in degrees

  • num_bins (int, optional) – Number of bins to return, by default 4

Returns:

  • bins (Sequence[np.ndarray]) – For each bin, an np.ndarray of the indices of the angles belonging to it.

  • edges (list[tuple[float, float]]) – A pair of boundary angles for each bin, in the same order as bins.

Return type:

tuple[Sequence[numpy.ndarray], list[tuple[float, float]]]
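"Clustering unit-circle projections" means mapping each angle θ to (cos θ, sin θ) and clustering those points, so that 350° and 10° end up close together. The sketch below uses a plain-Python k-means with deterministic, evenly spaced initial centroids; k-means and that initialization are assumptions for illustration, not necessarily what circbins uses. Edges are taken as the extreme signed offsets of the members from the bin's mean direction, which lets a bin straddle 0/360 degrees. circbins_sketch is a hypothetical name, and it returns plain lists instead of np.ndarray objects.

```python
import math


def circbins_sketch(angles, num_bins=4, max_iter=100):
    """Hypothetical sketch: bin angles (degrees) by k-means on unit-circle points.

    Returns (bins, edges): bins[k] lists the indices in bin k, and edges[k] is
    a (start, end) pair of angles in degrees, or None if bin k is empty.
    """
    points = [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in angles]
    # Deterministic start: centroids evenly spaced around the circle (an assumption)
    centroids = [
        (math.cos(2 * math.pi * k / num_bins), math.sin(2 * math.pi * k / num_bins))
        for k in range(num_bins)
    ]
    labels = []
    for _ in range(max_iter):
        # Assign every point to its nearest centroid
        labels = [
            min(range(num_bins),
                key=lambda k: (x - centroids[k][0]) ** 2 + (y - centroids[k][1]) ** 2)
            for x, y in points
        ]
        updated = []
        for k in range(num_bins):
            members = [points[i] for i, lab in enumerate(labels) if lab == k]
            if members:
                updated.append((sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members)))
            else:
                updated.append(centroids[k])  # keep an empty bin's centroid
        if updated == centroids:
            break
        centroids = updated
    bins = [[i for i, lab in enumerate(labels) if lab == k] for k in range(num_bins)]
    edges = []
    for k, members in enumerate(bins):
        if not members:
            edges.append(None)
            continue
        center = math.degrees(math.atan2(centroids[k][1], centroids[k][0]))
        # Signed offsets from the bin's mean direction, in [-180, 180)
        offsets = [(angles[i] - center + 180) % 360 - 180 for i in members]
        edges.append(((center + min(offsets)) % 360, (center + max(offsets)) % 360))
    return bins, edges
```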

plot_bin_edges(angles, radii, bins, edges, picks, ax, labels)

Illustrate how angles have been binned.

Parameters:
  • angles (NDArray) – A 1D array of angles in degrees.

  • radii (NDArray) – A 1D array of radii, with order corresponding to angles.

  • bins (list[Iterable[int]]) – A list of bins, each represented as a collection of indices.

  • edges (list[tuple[float, float]]) – A pair of edges (angles in degrees) for each bin in bins.

  • picks (list[int]) – A list of indices indicating which cluster has been chosen from each bin.

  • ax (Axes) – A matplotlib Axes object onto which to plot; this should be set up with a polar projection.

  • labels (list[str]) – One label for each entry in picks.

pick_clusters(frames: shnitsel.data.dataset_containers.Frames | xarray.Dataset | shnitsel.analyze.pca.PCAResult, num_bins: int, center_mean: bool = False) → dict
pick_clusters(frames: shnitsel.data.tree.node.TreeNode[Any, shnitsel.analyze.pca.PCAResult], num_bins: int, center_mean: bool = False) → shnitsel.data.tree.node.TreeNode[Any, dict]

Calculate pairwise-distance PCA, cluster the loadings and pick a representative subset of the clusters.

Parameters:
  • frames (Frames | xr.Dataset | PCAResult) – An xarray.Dataset with an ‘atXYZ’ variable having an ‘atom’ dimension on which to calculate a pairwise-distance PCA, or the result of a previously executed PCA.

  • num_bins (int) – The number of bins to use when binning clusters of loadings according to the angle they make with the x-axis on the projection manifold.

  • center_mean (bool, optional) – Flag to apply mean centering before the analysis, by default False

Returns:

  • dict

    A dictionary with the following key-value pairs:

    • loadings: the loadings of the PCA

    • clusters: a list of clusters, where each cluster is represented as a list of indices corresponding to loadings; as produced by shnitsel.vis.plot.pca_biplot.cluster_loadings().

    • picks: the cluster chosen from each bin of clusters

    • angles: the angular argument (rotation from the positive x-axis) of each cluster center

    • center: the circular mean of the angles of all picked clusters

    • radii: the distance of each cluster from the origin

    • bins: indices of the angles belonging to each bin

    • edges: a tuple giving a pair of boundary angles for each bin; the order of the bins corresponds to the order used in bins

  • TreeNode[Any, dict] – If a tree is provided as input, a tree containing one such dictionary per input leaf is returned.
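The picks and center entries can be illustrated with a short sketch. The rule "pick the cluster with the greatest radius from each bin" is an assumption borrowed from the min_angle behavior of plot_clusters_insets, not a documented property of pick_clusters. The circular mean is the standard atan2 of the summed sines and cosines, which handles wrap-around: the mean of 350° and 10° is about 0°, whereas the arithmetic mean would give 180°. Both function names are hypothetical.

```python
import math


def pick_largest_per_bin(bins, radii):
    """Hypothetical sketch: from each non-empty bin, pick the cluster with the greatest radius."""
    return [max(members, key=lambda i: radii[i]) for members in bins if members]


def circular_mean_deg(angles):
    """Circular mean of angles in degrees, returned between -180 and 180."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))
```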

_binning_with_min_entries(num_bins, angles, radii, min_entries=4, max_attempts=10, return_bins_edges=False)
Parameters:
  • num_bins (int)

  • angles (numpy.typing.NDArray)

  • radii (numpy.typing.NDArray)

  • min_entries (int)

  • max_attempts (int)

  • return_bins_edges (bool)

Return type:

Sequence[int] | tuple[Sequence[int], Sequence[numpy.typing.NDArray], list[tuple[float, float]]]