shnitsel.vis.plot.kde#

Functions#

_fit_kdes(pca_data, geo_property, geo_kde_ranges)

Fit a set of KDEs to the pca_data, after it has been split into subsets based on the values of geo_property.

_eval_kdes(kernels, xx, yy)

Evaluate all fitted gaussian kernel density estimators on a mesh-grid

_get_xx_yy(pca_data[, num_steps, extension])

Get appropriately over-sized mesh-grids for x and y coordinates

_fit_and_eval_kdes(pca_data, geo_property, geo_kde_ranges)

Fit one KDE per range in geo_kde_ranges, using the points whose geo_property value lies within that range.

_plot_kdes(xx, yy, Zs[, colors, contour_levels, ...])

Plot contours of kernel density estimates

biplot_kde(frames, *ids[, pca_data, state_selection, ...])

Generates a biplot that visualizes PCA projections and kernel density estimates (KDE) of a geometric property of specified atoms.

plot_cdf_for_kde(z, contour_level[, ax])

Plot the cumulative density for a KDE, to show what proportion of points are contained by contours at a given density level.

Module Contents#

_fit_kdes(pca_data, geo_property, geo_kde_ranges)#

Fit a set of KDEs to the pca_data, after it has been split into subsets based on the values of geo_property.

The parameter geo_kde_ranges specifies the subsets of the values of geo_property that should be filtered into the same subset. Returns one KDE for each such subset.

Parameters:
  • pca_data (xr.DataArray) – The pca data for which KDEs should be fitted on the various ranges.

  • geo_property (xr.DataArray) – The geometric property that the data should be clustered/filtered by.

  • geo_kde_ranges (Sequence[tuple[float, float]]) – The sequence of (distinct) ranges of values of the geometric property that the pca_data should be divided by.

Returns:

The sequence of fitted KDEs (kernels) for each range of geo_kde_ranges.

Return type:

Sequence[stats.gaussian_kde]

Raises:

ValueError – If any range in geo_kde_ranges contains no points, i.e. no value of geo_property falls within it.
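The split-and-fit step described above can be sketched as follows. This is a minimal illustration, not shnitsel's implementation: it assumes two-component PCA data in a plain NumPy array instead of an xr.DataArray, the helper name fit_kdes_sketch is hypothetical, and the choice of a half-open interval for each range is an assumption.

```python
import numpy as np
from scipy import stats


def fit_kdes_sketch(pca_xy, geo_values, ranges):
    """Fit one gaussian_kde per (low, high) range of geo_values.

    pca_xy has shape (n_points, 2); geo_values has shape (n_points,).
    Hypothetical stand-in for _fit_kdes; the half-open bounds are an assumption.
    """
    kernels = []
    for low, high in ranges:
        mask = (geo_values >= low) & (geo_values < high)
        if not mask.any():
            raise ValueError(f"No points with property in range ({low}, {high})")
        # gaussian_kde expects data of shape (n_dims, n_points)
        kernels.append(stats.gaussian_kde(pca_xy[mask].T))
    return kernels


rng = np.random.default_rng(0)
pca_xy = rng.normal(size=(200, 2))
geo_values = rng.uniform(0.0, 180.0, size=200)
kernels = fit_kdes_sketch(pca_xy, geo_values, [(0.0, 90.0), (90.0, 180.0)])
```

Each returned kernel only sees the points of its own range, so the densities of different ranges can be compared on a shared grid.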

_eval_kdes(kernels, xx, yy)#

Evaluate all fitted gaussian kernel density estimators on a mesh-grid and return the results.

Parameters:
  • kernels (Sequence[stats.gaussian_kde]) – The fitted KDEs (kernels) to evaluate on the mesh grid.

  • xx (np.ndarray) – The x coordinates of the mesh grid.

  • yy (np.ndarray) – The y coordinates of the mesh grid.

Returns:

The sequence of evaluated approximate probability densities at the positions described by xx and yy for each and every individual KDE provided in kernels.

Return type:

Sequence[np.ndarray]
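A minimal sketch of evaluating fitted kernels on a mesh grid, assuming two-dimensional scipy.stats.gaussian_kde objects. The helper name eval_kdes_sketch is hypothetical and the internals of _eval_kdes may differ.

```python
import numpy as np
from scipy import stats


def eval_kdes_sketch(kernels, xx, yy):
    """Evaluate each kernel on the grid (xx, yy) and reshape to the grid's shape."""
    # Stack grid coordinates into shape (2, n_grid_points), as gaussian_kde expects
    positions = np.vstack([xx.ravel(), yy.ravel()])
    return [kernel(positions).reshape(xx.shape) for kernel in kernels]


rng = np.random.default_rng(1)
kernel = stats.gaussian_kde(rng.normal(size=(2, 100)))
xx, yy = np.meshgrid(np.linspace(-3, 3, 40), np.linspace(-3, 3, 30))
Zs = eval_kdes_sketch([kernel], xx, yy)
```

Every array in the result has the same shape as xx and yy, which is the layout that matplotlib's contour functions expect.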

_get_xx_yy(pca_data, num_steps=500, extension=0.1)#

Get appropriately over-sized mesh grids for the x and y coordinates. The grid extends beyond the minimum and maximum in each direction by extension times the distance of that bound from the mean, with num_steps steps between the lower and upper bound.

Statistical properties will be derived from pca_data.

Parameters:
  • pca_data (xr.DataArray) – The transformed pca data to get the supporting mesh grid for.

  • num_steps (int, optional) – Number of intermediate steps to generate in the grid. Defaults to 500.

  • extension (float, optional) – Excess overhang beyond minima and maxima in x and y direction relative to their distance from the mean. Defaults to 0.1.

Returns:

First the numpy array holding the x positions of the mesh grid, then the array holding the y positions of the mesh grid.

Return type:

tuple[np.ndarray, np.ndarray]
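One reading of the overhang rule, sketched with plain NumPy: each bound is pushed outwards by extension times its distance from the mean. The helper name get_xx_yy_sketch is hypothetical, the input is assumed to be an array of shape (n_points, 2), and treating num_steps as the number of grid points per axis is an assumption.

```python
import numpy as np


def get_xx_yy_sketch(pca_xy, num_steps=500, extension=0.1):
    """Build mesh grids covering pca_xy (shape (n_points, 2)) with some overhang."""
    mean = pca_xy.mean(axis=0)
    lo = pca_xy.min(axis=0)
    hi = pca_xy.max(axis=0)
    # Push each bound outwards by `extension` times its distance from the mean
    lower = lo - extension * (mean - lo)
    upper = hi + extension * (hi - mean)
    xs = np.linspace(lower[0], upper[0], num_steps)
    ys = np.linspace(lower[1], upper[1], num_steps)
    return np.meshgrid(xs, ys)


pca_xy = np.array([[0.0, 0.0], [2.0, 4.0], [4.0, 2.0]])
xx, yy = get_xx_yy_sketch(pca_xy, num_steps=50)
```

The overhang keeps the outermost contour lines from being cut off at the edge of the grid.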

_fit_and_eval_kdes(pca_data, geo_property, geo_kde_ranges, num_steps=500, extension=0.1)#

Fit one KDE per range in geo_kde_ranges, using the points whose geo_property value lies within that range. Then return a mesh grid and the evaluation of these kernel estimators on that mesh grid.

Parameters:
  • pca_data (xr.DataArray) – The transformed pca data to get the supporting mesh grid for and extract the KDEs from.

  • geo_property (xr.DataArray) – The geometric property that the data should be clustered/filtered by.

  • geo_kde_ranges (Sequence[tuple[float, float]]) – The sequence of (distinct) ranges of values of the geometric property that the pca_data should be divided by.

  • num_steps (int, optional) – Number of intermediate steps to generate in the grid. Defaults to 500.

  • extension (float, optional) – Excess overhang beyond minima and maxima in x and y direction relative to their distance from the mean. Defaults to 0.1.

Returns:

  • First the numpy array holding the x positions of the mesh grid.

  • Then the array holding the y positions of the mesh grid.

  • Last the Sequence of KDE evaluations on the mesh grid, one for each filter range.

Return type:

tuple[numpy.ndarray, numpy.ndarray, Sequence[numpy.ndarray]]

_plot_kdes(xx, yy, Zs, colors=None, contour_levels=None, contour_fill=True, fig=None, ax=None)#

Plot contours of kernel density estimates

Parameters:
  • xx (np.ndarray) – An array of x values

  • yy (np.ndarray) – An array of y values (must have the same shape as xx)

  • Zs (Sequence[np.ndarray]) – A list of arrays of z values (each array must have the same shape as xx and yy)

  • colors (Iterable, optional) – A set of colours accepted by matplotlib (e.g. a colormap) of at least the same length as Zs

  • contour_levels (int | list[float], optional) – Determines the number and positions of the contour lines / regions. (Passed to matplotlib.pyplot.contour)

  • contour_fill (bool, optional) – Whether to fill in the outlined contours (i.e. whether to use matplotlib.pyplot.contour or matplotlib.pyplot.contourf).

  • fig (Figure | SubFigure, optional) – A matplotlib Figure object into which to draw (if not provided, a new one will be created)

  • ax (Axes, optional) – A matplotlib Axes object into which to draw (if not provided, a new one will be created)
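A minimal sketch of how the contour_fill switch between filled contours and contour lines could look. It uses only the documented matplotlib methods Axes.contour and Axes.contourf; the helper name plot_kdes_sketch, the fixed alpha value and the synthetic densities are assumptions for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_kdes_sketch(xx, yy, Zs, colors, contour_levels=None, contour_fill=True, ax=None):
    """Draw one contour set per density array in Zs, each in its own colour."""
    if ax is None:
        _, ax = plt.subplots()
    # Pick filled contours or plain contour lines
    draw = ax.contourf if contour_fill else ax.contour
    for Z, color in zip(Zs, colors):
        draw(xx, yy, Z, levels=contour_levels, colors=[color], alpha=0.5)
    return ax


xx, yy = np.meshgrid(np.linspace(-2, 2, 50), np.linspace(-2, 2, 50))
# Two synthetic, unnormalised bumps standing in for evaluated KDEs
Zs = [np.exp(-((xx - 0.5) ** 2 + yy**2)), np.exp(-((xx + 0.5) ** 2 + yy**2))]
ax = plot_kdes_sketch(xx, yy, Zs, colors=["tab:green", "tab:purple"], contour_levels=[0.08, 1.0])
```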

biplot_kde(frames, *ids, pca_data=None, state_selection=None, structure_selection=None, mol=None, geo_kde_ranges=None, scatter_color_property='time', geo_feature=None, geo_cmap='PRGn', time_cmap='cividis', contour_levels=None, contour_colors=None, contour_fill=True, num_bins=4, fig=None, center_mean=False)#

Generates a biplot that visualizes PCA projections and kernel density estimates (KDE) of a property (distance, angle, dihedral angle) describing the geometry of specified atoms. The property is chosen based on the number of atoms specified:

  • 2 atoms => distance

  • 3 atoms => angle

  • 4 atoms => dihedral angle
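The mapping from atom count to property can be illustrated with a small NumPy sketch for a single frame. The helper name geo_property_sketch is hypothetical, and the choice of degrees and of the dihedral sign convention are assumptions; shnitsel's own geometry routines may differ.

```python
import numpy as np


def geo_property_sketch(positions, *ids):
    """Return a distance, angle or dihedral (angles in degrees) for 2, 3 or 4 atom indices.

    positions has shape (n_atoms, 3) and describes a single frame.
    """
    p = positions[list(ids)]
    if len(ids) == 2:
        return float(np.linalg.norm(p[1] - p[0]))
    if len(ids) == 3:
        # Angle at the middle atom between the two bonds
        u, v = p[0] - p[1], p[2] - p[1]
        cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    if len(ids) == 4:
        # Signed angle between the planes spanned by consecutive bond vectors
        b0, b1, b2 = p[1] - p[0], p[2] - p[1], p[3] - p[2]
        n1, n2 = np.cross(b0, b1), np.cross(b1, b2)
        m1 = np.cross(n1, b1 / np.linalg.norm(b1))
        return float(np.degrees(np.arctan2(np.dot(m1, n2), np.dot(n1, n2))))
    raise ValueError("Expected 2, 3 or 4 atom indices")


positions = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
distance = geo_property_sketch(positions, 0, 1)
angle = geo_property_sketch(positions, 0, 1, 2)
dihedral = geo_property_sketch(positions, 0, 1, 2, 3)
```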

Parameters:
  • frames (xarray.Dataset | shnitsel.data.dataset_containers.shared.ShnitselDataset | shnitsel.data.tree.node.TreeNode[Any, shnitsel.data.dataset_containers.shared.ShnitselDataset | xarray.Dataset]) – A dataset containing trajectory frames with atomic coordinates. This needs to correspond to the data that was the input to pca_data if that parameter is provided.

  • *ids (int) – Indices for atoms to be used in geo_feature if geo_feature is not set. Note that pyramidalization angles cannot reliably be provided in this format.

  • pca_data (PCAResult, optional) – A PCA result to use for the analysis. If not provided, will perform PCA analysis based on structure_selection or a generic pairwise distance PCA on frames. Accordingly, if provided, the parameter frames needs to correspond to the input from which this PCA result was obtained.

  • structure_selection (StructureSelection | StructureSelectionDescriptor, optional) – An optional selection of features/structure to use for the PCA analysis.

  • geo_kde_ranges (Sequence[tuple[float, float]], optional) – A Sequence of tuples representing ranges. A KDE is plotted for each range, indicating the distribution of points for which the value of the geometry feature falls in that range. Default values are chosen depending on the type of feature that should be analyzed.

  • contour_levels (int | list[float], optional) – Contour levels for the KDE plot. Either the number of contour levels as an int or the list of floating point values at which the contour lines should be drawn. Defaults to [0.08, 1]. This parameter is passed to matplotlib.axes.Axes.contour.

  • scatter_color_property ({'time', 'geo'}, default='time') – Must be one of ‘time’ or ‘geo’. If ‘time’, the scatter-points will be colored based on the time coordinate; if ‘geo’, the scatter-points will be colored based on the relevant geometry feature (see above).

  • geo_cmap (str, default = 'PRGn') – The Colormap to use for the noodleplot, if scatter_color_property='geo'; this also determines contour colors unless contour_colors is set.

  • time_cmap (str, default = 'cividis') – The Colormap to use for the noodleplot, if scatter_color_property='time'.

  • contour_fill (bool, default = True) – Whether to plot filled contours (contour_fill=True, uses ax.contourf) or just contour lines (contour_fill=False, uses ax.contour).

  • contour_colors (list[str], optional) – An iterable (not a Colormap) of colours (in a format matplotlib will accept) to use for the contours. By default, the geo_cmap will be used; this defaults to ‘PRGn’.

  • num_bins ({1, 2, 3, 4}, default = 4) – Number of bins to be visualized; must be an integer between 1 and 4.

  • fig (mpl.figure.Figure, optional) – matplotlib.figure.Figure object into which the plot will be drawn; if not provided, one will be created using plt.figure(layout='constrained')

  • center_mean (bool, default = False) – Whether the PCA data should be mean-centered before analysis.

  • state_selection (shnitsel.filtering.state_selection.StateSelection | shnitsel.filtering.state_selection.StateSelectionDescriptor | None)

  • mol (rdkit.Chem.Mol | None)

  • geo_feature (shnitsel.filtering.structure_selection.BondDescriptor | shnitsel.filtering.structure_selection.AngleDescriptor | shnitsel.filtering.structure_selection.DihedralDescriptor | shnitsel.filtering.structure_selection.PyramidsDescriptor | None)

Returns:

  • Figure – The single figure of the PCA result, if the PCA result was not provided as a tree and the on-the-fly PCA did not yield a tree result.

  • Sequence[Figure] – The sequence of all figures, one for each individual PCA result if the provided or obtained PCA result was a tree structure.

Return type:

matplotlib.figure.Figure | Sequence[matplotlib.figure.Figure]

Notes

  • Computes a geometric property of the specified atoms across all frames.

  • Uses kernel density estimation (KDE) to analyze the distribution of that property (distance, angle or dihedral angle).

  • Performs PCA on trajectory pairwise distances and visualizes clustering of structural changes.

  • Produces a figure with PCA projection, cluster analysis, and KDE plots.

plot_cdf_for_kde(z, contour_level, ax=None)#

Plot the cumulative density for a KDE, to show what proportion of points are contained by contours at a given density level

Parameters:
  • z (np.ndarray) – The values from the kernel evaluated over the input space

  • contour_level (float) – The cumulative density corresponding to this level will be marked on the graph

  • ax (Axes, optional) – A matplotlib.axes.Axes object into which to plot. (If not provided, one will be created.)

Returns:

The proportion of points contained by contours placed at the density level contour_level.

Return type:

float
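A minimal sketch of the quantity behind this plot, assuming z holds KDE values on a uniformly spaced grid. The helper name mass_inside_contour_sketch is hypothetical, and using the share of summed density as a stand-in for the proportion of points is an approximation, not necessarily what plot_cdf_for_kde computes.

```python
import numpy as np


def mass_inside_contour_sketch(z, contour_level):
    """Fraction of the total density mass lying at or above contour_level."""
    z = np.asarray(z, dtype=float).ravel()
    return float(z[z >= contour_level].sum() / z.sum())


z = np.array([[0.1, 0.2], [0.3, 0.4]])
inside = mass_inside_contour_sketch(z, 0.25)
```

Lower contour levels enclose more of the density, so the returned fraction grows as contour_level decreases.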