shnitsel.vis.plot.pca_biplot#
Attributes#
Functions#
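plot_noodleplot(noodle, hops_mask, fig, ax, c, ...) | Create a noodle plot, i.e. a line or scatter plot of PCA-decomposed data.
get_loadings(frames_or_pca_result, center_mean) | Get the loadings for the PCA of pairwise distances for the positional data in frames.
plot_loadings(ax, loadings) | Plot all loadings as arrows.
cluster_general(decider, n) | Cluster indices iteratively according to a provided function.
cluster_loadings(loadings, cutoff) | Cluster loadings iteratively based on proximity on the principal component manifold.
plot_clusters(loadings, clusters, ax, labels) | Plot clusters of PCA loadings.
_get_clusters_coords(loadings, descriptor_clusters) |
_separate_angles(points, min_angle) | Group points based on their polar angles, and work out scale factors by which to place labels when annotating points.
_filter_cluster_coords(coords, n) |
plot_clusters_insets(ax, loadings, clusters, mol, ...) | Plot selected clusters of the loadings of a pairwise distance PCA, and interpretations of those loadings, as highlighted molecular structures inset upon the loadings plot.
_get_axs(clusters, labels) |
plot_clusters_grid(loadings, clusters, ax, labels, ...) | Plot selected clusters of the loadings of a pairwise distance PCA, and interpretations of those loadings.
circbins(angles, num_bins) | Bin angular data by clustering unit-circle projections.
plot_bin_edges(angles, radii, bins, edges, picks, ax, labels) | Illustrate how angles have been binned.
pick_clusters(frames, num_bins, center_mean) | Calculate pairwise-distance PCA, cluster the loadings and pick a representative subset of the clusters.
_binning_with_min_entries(num_bins, angles, radii, ...) |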
Module Contents#
- plot_noodleplot(noodle, hops_mask=None, fig=None, ax=None, c=None, colorbar_label=None, cmap=None, cnorm=None, cscale=None, noodle_kws=None, hops_kws=None, rasterized=True)#
Create a noodle plot, i.e. a line or scatter plot of PCA-decomposed data.
- Parameters:
noodle (xr.DataArray | TreeNode[Any, xr.DataArray]) – PCA decomposed data.
hops_mask (xr.DataArray | TreeNode[Any, xr.DataArray], optional) – DataArray holding hopping-point information of the trajectories. Defaults to None.
fig (Figure | SubFigure | None, optional) – Figure to plot the graph into. Defaults to None.
ax (Axes, optional) – The axes to plot into. Will be generated from fig if not provided. Defaults to None.
c (xr.DataArray | TreeNode[Any, xr.DataArray], optional) – The data to use for assigning the color to each individual data point. Defaults to None.
colorbar_label (str | None, optional) – Label to plot next to the colorbar. If not provided, it will either be taken from the long_name attribute or the name attribute of the data, or default to t/fs.
cmap (str | Colormap | None, optional) – Colormap for plotting the datapoints. Defaults to None.
cnorm (Normalize | None, optional) – Normalization method to map data to the colormap. Defaults to None.
cscale (_type_, optional) – The colorbar scale mapping that is used for creating the colorbar gradient. Defaults to None.
noodle_kws (dict, optional) – Keyword arguments for the noodle/PCA plot. Defaults to None.
hops_kws (dict, optional) – Keyword arguments for plotting the hopping points. Defaults to None.
rasterized (bool, optional) – Flag to control whether the plot will be rasterized. Defaults to True.
- Returns:
The matplotlib.axes.Axes object after plotting onto it.
- Return type:
Axes
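The fallback order described for colorbar_label can be sketched as follows. This is only an illustration of the documented behaviour, not the library's code: the helper name resolve_colorbar_label is hypothetical, and a SimpleNamespace stands in for an xr.DataArray with name and attrs.

```python
from types import SimpleNamespace


def resolve_colorbar_label(colorbar_label, c):
    """Hypothetical helper: pick a colorbar label using the documented fallbacks."""
    if colorbar_label is not None:
        return colorbar_label  # an explicit label always wins
    if c is not None:
        long_name = getattr(c, "attrs", {}).get("long_name")
        if long_name:
            return long_name  # first fallback: the long_name attribute
        name = getattr(c, "name", None)
        if name:
            return name  # second fallback: the name of the data
    return "t/fs"  # documented default


# Stand-in for a DataArray carrying colour data (assumption for illustration).
energy = SimpleNamespace(name="energy", attrs={"long_name": "Energy / eV"})
label_from_attrs = resolve_colorbar_label(None, energy)
label_default = resolve_colorbar_label(None, None)
```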
- get_loadings(frames_or_pca_result, center_mean=False)#
Get the loadings for the PCA of pairwise distances for the positional data in frames.
- Parameters:
frames_or_pca_result (xarray.Dataset | shnitsel.data.dataset_containers.Frames | shnitsel.data.dataset_containers.Trajectory | shnitsel.analyze.pca.PCAResult) – A Dataset with an ‘atXYZ’ data_var, which should have ‘atom’ and ‘direction’ dimensions. The signature also accepts a PCAResult.
center_mean (bool, optional) – Whether centering of the mean should be applied, by default False
- Returns:
A DataArray of loadings with dimensions ‘PC’ (principal component) and ‘descriptor’ (atom combination, one for each pair of atoms).
- Return type:
xr.DataArray
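As a rough illustration of what loadings of a pairwise-distance PCA are, the sketch below works on a plain NumPy array of shape (n_frames, n_atoms, 3) instead of a Dataset. The function name pairwise_distance_loadings is hypothetical, and the treatment of center_mean (subtracting the per-descriptor mean before the SVD) is an assumption, not necessarily what get_loadings does.

```python
from itertools import combinations

import numpy as np


def pairwise_distance_loadings(at_xyz, n_components=2, center_mean=False):
    """Hypothetical sketch: PCA loadings of all interatomic distances."""
    n_atoms = at_xyz.shape[1]
    pairs = list(combinations(range(n_atoms), 2))  # one descriptor per atom pair
    i, j = np.array(pairs).T
    # Distance matrix of shape (n_frames, n_pairs).
    dists = np.linalg.norm(at_xyz[:, i, :] - at_xyz[:, j, :], axis=-1)
    if center_mean:
        dists = dists - dists.mean(axis=0)  # assumption: centre each descriptor
    # Rows of vt are the principal axes in descriptor space, i.e. the loadings.
    _, _, vt = np.linalg.svd(dists, full_matrices=False)
    return vt[:n_components], pairs


rng = np.random.default_rng(0)
xyz = rng.normal(size=(20, 4, 3))  # 20 synthetic frames of a 4-atom system
loadings, pairs = pairwise_distance_loadings(xyz, center_mean=True)
```

The result has one row per principal component and one column per atom pair, mirroring the ‘PC’ and ‘descriptor’ dimensions described above.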
- plot_loadings(ax, loadings)#
Plot all loadings as arrows.
- Parameters:
ax (Axes) – The matplotlib.axes.Axes object onto which to plot the loadings.
loadings (xr.DataArray) – A DataArray of PCA loadings including a ‘descriptor’ dimension; as produced by shnitsel.vis.plot.pca_biplot.get_loadings().
- cluster_general(decider, n)#
Cluster indices iteratively according to a provided function.
- cluster_loadings(loadings, cutoff=0.05)#
Cluster loadings iteratively based on proximity on the principal component manifold
- Parameters:
loadings (xr.DataArray) – A DataArray of loadings
cutoff (float, optional) – An upper bound on the distance between a point in a cluster and other points, within which they will still be assigned to the same cluster, by default 0.05
- Returns:
A list of clusters, where each cluster is represented as a list of indices corresponding to loadings.
- Return type:
list[list[int]]
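The relationship between cluster_general and cluster_loadings can be pictured with the sketch below: a generic routine grows clusters of indices using a pairwise decider function, and a thin wrapper supplies a distance cutoff as the decider. Both function names are hypothetical, and the chaining (single-linkage) behaviour shown here is an assumption about how "iteratively" is meant.

```python
import math


def cluster_general_sketch(decider, n):
    """Grow clusters over indices 0..n-1; decider(i, j) says whether i and j belong together."""
    unassigned = list(range(n))
    clusters = []
    while unassigned:
        seed = unassigned.pop(0)
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            # Pull in every unassigned index linked to the current member.
            linked = [k for k in unassigned if decider(current, k)]
            for k in linked:
                unassigned.remove(k)
            cluster.extend(linked)
            frontier.extend(linked)
        clusters.append(sorted(cluster))
    return clusters


def cluster_loadings_sketch(points, cutoff=0.05):
    """Cluster 2D points (PC1, PC2) that lie within `cutoff` of a cluster member."""
    def close(i, j):
        return math.dist(points[i], points[j]) <= cutoff

    return cluster_general_sketch(close, len(points))


points = [(0.0, 0.0), (0.03, 0.0), (0.06, 0.0), (1.0, 1.0)]
clusters = cluster_loadings_sketch(points, cutoff=0.05)
```

Note that points 0 and 2 end up in the same cluster although they are 0.06 apart, because point 1 links them.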
- plot_clusters(loadings, clusters, ax=None, labels=None)#
Plot clusters of PCA loadings
- Parameters:
loadings (xarray.DataArray) – A DataArray of PCA loadings including a ‘descriptor’ dimension; as produced by shnitsel.vis.plot.pca_biplot.get_loadings().
clusters (list[list[int]]) – A list of clusters, where each cluster is represented as a list of indices corresponding to loadings; as produced by shnitsel.vis.plot.pca_biplot.get_clusters().
ax (matplotlib.axes.Axes | None) – The matplotlib.axes.Axes object onto which to plot. (If not provided, one will be created.)
labels (list[str] | None) – Labels for the loadings; if not provided, loadings will be labelled according to indices of the atoms to which they relate.
- _get_clusters_coords(loadings, descriptor_clusters)#
- _separate_angles(points, min_angle=10)#
Group points based on their polar angles, and work out scale factors by which to place labels along the ray from origin to point when annotating points, intending to avoid overlaps between labels.
- Parameters:
points (NDArray) – An array of shape (npoints, 2)
min_angle (float, optional) – The minimal difference in argument (angle from positive x-axis, in degrees), of two points, below which they will be considered part of the same cluster; by default 10
- Returns:
A dictionary mapping from indices (corresponding to points) to scale factors used to extrude the label away from the loading.
- Return type:
dict
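One way to picture the grouping by polar angle is the sketch below. It is not the library's implementation: the function name, the step parameter and the rule of ranking group members by radius are all assumptions made for illustration.

```python
import math


def separate_angles_sketch(points, min_angle=10.0, step=0.3):
    """Group 2D points by polar angle and assign label scale factors per group."""
    angles = [math.degrees(math.atan2(y, x)) % 360.0 for x, y in points]
    order = sorted(range(len(points)), key=lambda i: angles[i])

    # Consecutive points closer than min_angle (in degrees) share a group.
    groups = []
    for idx in order:
        if groups and angles[idx] - angles[groups[-1][-1]] < min_angle:
            groups[-1].append(idx)
        else:
            groups.append([idx])
    # Merge the last and first group if they meet across the 360/0 boundary.
    if len(groups) > 1 and angles[groups[0][0]] + 360.0 - angles[groups[-1][-1]] < min_angle:
        groups[0] = groups.pop() + groups[0]

    # Assumption: within a group, push labels further out in order of radius.
    scales = {}
    for group in groups:
        by_radius = sorted(group, key=lambda i: math.hypot(*points[i]))
        for rank, idx in enumerate(by_radius):
            scales[idx] = 1.0 + step * rank
    return scales


points = [(1.0, 0.0), (1.0, 0.05), (0.0, 1.0), (-1.0, 0.0)]
scales = separate_angles_sketch(points)
```

Here the first two points differ by less than 10 degrees, so they share a group and receive different scale factors; the other two keep the base factor.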
- _filter_cluster_coords(coords, n)#
- plot_clusters_insets(ax, loadings, clusters, mol, min_angle=10, inset_scale=1, show_at_most=None)#
Plot selected clusters of the loadings of a pairwise distance PCA, and interpretations of those loadings, as highlighted molecular structures inset upon the loadings plot.
- Parameters:
ax (Axes) – The matplotlib.axes.Axes object onto which to plot the loadings.
loadings (xr.DataArray) – A DataArray of PCA loadings including a ‘descriptor’ dimension; as produced by shnitsel.vis.plot.pca_biplot.get_loadings().
clusters (list[list[int]]) – A list of clusters, where each cluster is represented as a list of indices corresponding to loadings; as produced by shnitsel.vis.plot.pca_biplot.get_clusters().
mol (Mol) – An RDKit Mol object to be used for structure display.
min_angle (float, optional) – Where multiple clusters of loadings lie in similar directions from the origin, they will be grouped together and only their member with the greatest radius will be annotated with a highlighted structure. This is the angle in degrees for the grouping behavior, by default 10.
inset_scale (float, optional) – A factor by which to scale the size of the inset highlighted structures.
show_at_most (int, optional) – Maximal number of clusters to show; if the number of clusters is greater than this value, the clusters with smallest radius will be excluded so that only this many remain.
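The effect of show_at_most can be sketched as a simple radius-based filter. The helper name keep_largest_radius and the dictionary layout (cluster index mapped to the 2D coordinates of the cluster) are assumptions for illustration only.

```python
import math


def keep_largest_radius(cluster_coords, show_at_most=None):
    """Return the indices of the clusters that remain after dropping the smallest radii."""
    radii = {i: math.hypot(x, y) for i, (x, y) in cluster_coords.items()}
    ranked = sorted(radii, key=radii.get, reverse=True)  # largest radius first
    if show_at_most is not None:
        ranked = ranked[:show_at_most]
    return sorted(ranked)


coords = {0: (0.1, 0.0), 1: (0.5, 0.5), 2: (0.0, -0.3)}
kept = keep_largest_radius(coords, show_at_most=2)
```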
- plot_clusters2#
- _get_axs(clusters, labels)#
- plot_clusters_grid(loadings, clusters, ax=None, labels=None, axs=None, mol=None)#
Plot selected clusters of the loadings of a pairwise distance PCA, and interpretations of those loadings:
- On the left, a large plot of selected clusters of loadings indicated as arrows.
- On the right, a grid of structures corresponding to clusters of loadings; the pairs involved in the cluster are represented by colour-coding the atoms of the structures.
- Parameters:
loadings (xr.DataArray) – A DataArray of PCA loadings including a ‘descriptor’ dimension; as produced by shnitsel.vis.plot.pca_biplot.get_loadings().
clusters (list[list[int]]) – A list of clusters, where each cluster is represented as a list of indices corresponding to loadings; as produced by shnitsel.vis.plot.pca_biplot.get_clusters().
ax (Axes, optional) – The matplotlib.axes.Axes object onto which to plot the loadings. (If not provided, one will be created.)
labels (list[str], optional) – Labels for the loadings; if not provided, loadings will be labelled according to indices of the atoms to which they relate.
axs (dict[str, Axes], optional) – A dictionary mapping from plot labels to matplotlib.axes.Axes objects. (If not provided, one will be created.)
mol (Mol, optional) – An RDKit Mol object to be used for structure display.
- plot_clusters3#
- circbins(angles, num_bins=4)#
Bin angular data by clustering unit-circle projections
- Parameters:
angles (np.ndarray) – Angles in degrees
num_bins (int, optional) – Number of bins to return, by default 4
- Returns:
bins (Sequence[np.ndarray]) – Indices of the angles belonging to each bin, one np.ndarray per bin
edges (list[tuple[float, float]]) – A pair of boundary angles for each bin; the order of the bins corresponds to the order used in bins
- Return type:
tuple[Sequence[np.ndarray], list[tuple[float, float]]]
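Clustering unit-circle projections avoids the discontinuity at 0/360 degrees: angles are mapped to (cos, sin) points and clustered there. The sketch below uses a small hand-written k-means with evenly spaced starting centres; the function name, the initialisation and the choice of k-means are assumptions, and it returns only the bins, not the edges.

```python
import numpy as np


def circbins_sketch(angles_deg, num_bins=4, n_iter=50):
    """Bin angles (degrees) by k-means on their unit-circle projections."""
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    pts = np.column_stack([np.cos(theta), np.sin(theta)])

    # Assumption: start from centres spaced evenly around the circle.
    start = np.linspace(0.0, 2.0 * np.pi, num_bins, endpoint=False)
    centers = np.column_stack([np.cos(start), np.sin(start)])

    for _ in range(n_iter):
        dist = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)  # nearest centre for every angle
        new_centers = centers.copy()
        for k in range(num_bins):
            members = pts[labels == k]
            if len(members):
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    return [np.flatnonzero(labels == k) for k in range(num_bins)]


angles = [350, 5, 10, 85, 95, 175, 185, 265, 275]
bins = circbins_sketch(angles, num_bins=4)
```

In this example 350, 5 and 10 degrees fall into the same bin, which a plain histogram over [0, 360) would split.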
- plot_bin_edges(angles, radii, bins, edges, picks, ax, labels)#
Illustrate how angles have been binned.
- Parameters:
angles (NDArray) – A 1D array of angles in degrees.
radii (NDArray) – A 1D array of radii, with order corresponding to angles.
bins (list[Iterable[int]]) – Lists of bins, each bin represented as a list of indices.
edges (list[tuple[float, float]]) – A pair of edges (angles in degrees) for each bin in bins.
picks (list[int]) – A list of indices indicating which cluster has been chosen from each bin.
ax (Axes) – A matplotlib Axes object onto which to plot; this should be set up with polar projection.
- pick_clusters(frames: shnitsel.data.dataset_containers.Frames | xarray.Dataset | shnitsel.analyze.pca.PCAResult, num_bins: int, center_mean: bool = False) -> dict#
- pick_clusters(frames: shnitsel.data.tree.node.TreeNode[Any, shnitsel.analyze.pca.PCAResult], num_bins: int, center_mean: bool = False) -> shnitsel.data.tree.node.TreeNode[Any, dict]
Calculate pairwise-distance PCA, cluster the loadings and pick a representative subset of the clusters.
- Parameters:
frames (Frames | xr.Dataset | PCAResult) – An xarray.Dataset with an ‘atXYZ’ variable having an ‘atom’ dimension to calculate a pairwise-distance PCA on, or the result of a previously executed PCA.
num_bins (int) – The number of bins to use when binning clusters of loadings according to the angle they make to the x-axis on the projection manifold
center_mean (bool, optional) – Flag to apply mean centering before the analysis, by default False
- Returns:
dict –
A dictionary with the following key-value pairs:
- loadings: the loadings of the PCA
- clusters: a list of clusters, where each cluster is represented as a list of indices corresponding to loadings; as produced by shnitsel.vis.plot.pca_biplot.get_clusters().
- picks: the cluster chosen from each bin of clusters
- angles: the angular argument (rotation from the positive x-axis) of each cluster center
- center: the circular mean of the angle of all picked clusters
- radii: the distance of each cluster from the origin
- bins: indices of angles belonging to each bin
- edges: tuple giving a pair of boundary angles for each bin; the order of the bins corresponds to the order used in bins
TreeNode[Any, dict] – If provided with a tree as input, the dictionary is returned per input leaf, again as a tree.
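The angles, radii and center entries are ordinary polar quantities. A minimal sketch of how they could be computed from 2D cluster centers is given below; the function names are hypothetical and the code is independent of the library.

```python
import math


def polar_summary(cluster_centers):
    """Return (angles in degrees within [0, 360), radii) for 2D cluster centers."""
    angles = [math.degrees(math.atan2(y, x)) % 360.0 for x, y in cluster_centers]
    radii = [math.hypot(x, y) for x, y in cluster_centers]
    return angles, radii


def circular_mean_deg(angles_deg):
    """Circular mean: average the unit vectors, then take the resulting angle."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


angles, radii = polar_summary([(1.0, 0.0), (0.0, 2.0), (-3.0, 0.0)])
center = circular_mean_deg([80.0, 100.0])
wrapped = circular_mean_deg([350.0, 10.0])
```

The arithmetic mean of 350 and 10 degrees would be 180, whereas the circular mean lies at the 0/360 boundary, which is why a circular mean is the appropriate summary for directions.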
- _binning_with_min_entries(num_bins, angles, radii, min_entries=4, max_attempts=10, return_bins_edges=False)#