shnitsel.clean

Submodules

Functions

filter_by_energy(frames_or_trajectory[, ...])

Filter trajectories according to energy to exclude unphysical (insane) behaviour

filter_by_length(frames_or_trajectory[, ...])

Filter trajectories according to bond length

sanity_check(trajectory_or_frames[, filter_method, ...])

Filter trajectories according to energy and bond length to exclude unphysical (insane) behaviour

Package Contents

filter_by_energy(frames_or_trajectory, filter_method='truncate', *, energy_thresholds=None, plot_thresholds=False, plot_populations=False)

Filter trajectories according to energy to exclude unphysical (insane) behaviour

Parameters:
  • frames_or_trajectory (TrajectoryOrFrames) – A Frames or Trajectory object with astate, energy, and ideally e_kin variables. If astate is not set, no filtering will be performed and no filtranda assigned.

  • filter_method (Literal['truncate', 'omit', 'annotate'] | float, optional) –

    Specifies the manner in which to remove data:

    • if ‘omit’, drop trajectories unless all frames meet criteria (shnitsel.clean.omit())

    • if ‘truncate’, cut each trajectory off just before the first frame that doesn’t meet criteria (shnitsel.clean.truncate())

    • if ‘annotate’, merely annotate the data

    • if a float number, interpret this number as a time, and cut all trajectories off at this time, discarding those which violate criteria before reaching the given limit (shnitsel.clean.transect())

    See shnitsel.clean.dispatch_filter().

  • energy_thresholds (dict[str, float] | EnergyFiltrationThresholds | None, optional) – Thresholds for the total, potential and kinetic energy of the system. Can specify thresholds for overall drift and individual time step changes, as well as for energy steps at hops. The unit should be specified as a member variable. If not provided, reasonable default values are used, as seen in the EnergyThresholds definition.

  • plot_thresholds (bool | Sequence[float]) –

    See shnitsel.vis.plot.filtration.check_thresholds().

    • If True, will plot using check_thresholds with default quantiles

    • If a Sequence, will plot using check_thresholds with specified quantiles

    • If False (the default), will not plot threshold plot

  • plot_populations (Literal['independent', 'intersections', False]) –

    See shnitsel.vis.plot.filtration.validity_populations().

    • If 'intersections', will plot populations of trajectories satisfying intersecting conditions

    • If 'independent', will plot populations of trajectories satisfying conditions taken independently

    • If False (the default), will not plot populations plot

Return type:

The sanitized xr.Dataset

Notes

The resulting object has a filtranda data_var, representing the values by which the data were filtered. If the input has a filtranda data_var, it is overwritten.
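The four filter_method behaviours described above can be illustrated on a single trajectory with plain NumPy. This is a minimal sketch, not shnitsel's implementation: the function apply_filter_method and its arguments are hypothetical, the per-frame validity mask is assumed to be already computed, and the treatment of frames lying exactly at the cut-off time is an assumption.

```python
import numpy as np


def apply_filter_method(times, ok, filter_method="truncate"):
    """Illustrative stand-in for the documented filter_method semantics.

    times: 1-D sequence of frame times for one trajectory
    ok:    1-D boolean sequence, True where a frame meets all criteria
    Returns the indices of the frames that are kept, or None if the
    whole trajectory is discarded.
    """
    times = np.asarray(times, dtype=float)
    ok = np.asarray(ok, dtype=bool)
    all_idx = np.arange(len(ok))
    # Index of the first failing frame, or len(ok) if no frame fails.
    first_bad = int(np.argmax(~ok)) if (~ok).any() else len(ok)

    if filter_method == "annotate":
        # Keep everything; the mask itself is the annotation.
        return all_idx
    if filter_method == "omit":
        # Keep the trajectory only if every frame is valid.
        return all_idx if first_bad == len(ok) else None
    if filter_method == "truncate":
        # Cut off just before the first invalid frame.
        return all_idx[:first_bad]

    # Otherwise a float: cut at this time and discard the trajectory if it
    # violates a criterion before the limit.
    # Assumption: frames at exactly t_cut are kept. Trajectories shorter
    # than t_cut get no special treatment in this sketch.
    t_cut = float(filter_method)
    within = times <= t_cut
    if not ok[within].all():
        return None
    return all_idx[within]
```

For a trajectory whose third frame is invalid, 'truncate' keeps the first two frames, 'omit' discards the trajectory, and a float cut-off keeps it only if the cut-off time falls before the invalid frame.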

filter_by_length(frames_or_trajectory, filter_method='truncate', *, geometry_thresholds=None, mol=None, plot_thresholds=False, plot_populations=False)

Filter trajectories according to bond length

Parameters:
  • frames_or_trajectory (Trajectory | Frames | xr.Dataset) – A Trajectory or Frames Dataset with an atXYZ variable (NB. this function takes an xr.Dataset as opposed to an xr.DataArray for consistency with shnitsel.clean.filter_by_energy())

  • filter_method (Literal["truncate", "omit", "annotate"] | float, optional) –

    Specifies the manner in which to remove data:

    • if ‘omit’, drop trajectories unless all frames meet criteria (shnitsel.clean.omit())

    • if ‘truncate’, cut each trajectory off just before the first frame that doesn’t meet criteria (shnitsel.clean.truncate())

    • if ‘annotate’, merely annotate the data

    • if a float number, interpret this number as a time, and cut all trajectories off at this time, discarding those which violate criteria before reaching the given limit (shnitsel.clean.transect())

    See shnitsel.clean.dispatch_filter().

  • geometry_thresholds (GeometryFiltrationThresholds, optional) –

    A mapping from SMARTS-strings to length-thresholds.

    • The SMARTS-strings describe bonds which are searched for in an RDKit Mol object obtained via shnitsel.bridges.default_mol()

    • The thresholds describe maximal tolerable bond-lengths; if there are multiple matches for a given search, the longest bond-length will be considered for each frame

    • The unit for the maximum length is provided in the member variable length_unit, which defaults to angstrom.

    • If not provided, will be initialized with thresholds for H-(C/N) bonds and one for all bonds.

  • mol (Mol, optional) – An RDKit Mol object. If not provided, it will be generated from the XYZ coordinates in the data.

  • plot_thresholds (bool | Sequence[float]) –

    See shnitsel.vis.plot.filtration.check_thresholds().

    • If True, will plot using check_thresholds with default quantiles

    • If a Sequence, will plot using check_thresholds with specified quantiles

    • If False, will not plot threshold plot

  • plot_populations (Literal["independent", "intersections", False], optional) –

    See shnitsel.vis.plot.filtration.validity_populations().

    • If 'intersections', will plot populations of trajectories satisfying intersecting conditions

    • If 'independent', will plot populations of trajectories satisfying conditions taken independently

    • If False, will not plot populations plot

Return type:

The filtered Dataset or None if the filter method results in the trajectory being rejected.

Notes

The resulting object has a filtranda data_var, representing the values by which the data were filtered. If the input has a filtranda data_var, it is overwritten. An existing ‘criterion’ dimension will be dropped from the frames_or_trajectory parameter along with all variables and coordinates tied to it.
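The bond-length criterion (the longest matching bond in each frame, compared against a maximum tolerable length) can be sketched with NumPy alone. This is an illustration under stated assumptions, not the library's code: the function names are hypothetical, the list of atom-index pairs stands in for the bonds that a SMARTS search on the RDKit Mol would return, and whether the real comparison is inclusive is not stated in this reference.

```python
import numpy as np


def longest_bond_per_frame(at_xyz, bonds):
    """Return the longest of the given bonds for every frame.

    at_xyz: array of shape (n_frames, n_atoms, 3) with atom positions
    bonds:  list of (i, j) atom-index pairs; a hypothetical stand-in for
            the matches of one SMARTS pattern
    """
    at_xyz = np.asarray(at_xyz, dtype=float)
    i, j = np.asarray(bonds).T
    # Bond vectors have shape (n_frames, n_bonds, 3); the norm over the
    # last axis gives one length per frame and bond.
    lengths = np.linalg.norm(at_xyz[:, i, :] - at_xyz[:, j, :], axis=-1)
    return lengths.max(axis=1)


def frames_within_threshold(at_xyz, bonds, max_length):
    """Boolean mask: True where the longest matched bond is tolerable.

    Assumption: a bond of exactly max_length still counts as valid.
    Positions and max_length must share one length unit.
    """
    return longest_bond_per_frame(at_xyz, bonds) <= max_length
```

A mask like this is the kind of per-frame validity information that the filter_method options then act on.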

sanity_check(trajectory_or_frames, filter_method='truncate', *, energy_thresholds=None, geometry_thresholds=None, plot_thresholds=False, plot_populations=False, mol=None, drop_empty_trajectories=False)

Filter trajectories according to energy and bond length to exclude unphysical (insane) behaviour

Parameters:
  • trajectory_or_frames (Trajectory | Frames | TreeNode[Any, Trajectory|Frames]) – A Trajectory or Frames object (or a ShnitselDB structure holding such objects) with an atXYZ variable as well as astate, energy, and ideally e_kin variables

  • filter_method (Literal["truncate", "omit", "annotate"] | float, optional) –

    Specifies the manner in which to remove data:

    • if ‘omit’, drop trajectories unless all frames meet criteria (shnitsel.clean.omit())

    • if ‘truncate’, cut each trajectory off just before the first frame that doesn’t meet criteria (shnitsel.clean.truncate())

    • if ‘annotate’, merely annotate the data

    • if a float number, interpret this number as a time, and cut all trajectories off at this time, discarding those which violate criteria before reaching the given limit (shnitsel.clean.transect())

    See shnitsel.clean.dispatch_filter().

  • energy_thresholds (EnergyFiltrationThresholds, optional) – Thresholds for the total, potential and kinetic energy of the system. Can specify thresholds for overall drift and individual time step changes, as well as for energy steps at hops. The unit should be specified as a member variable. If not provided, reasonable default values are used, as seen in the EnergyThresholds definition.

  • geometry_thresholds (GeometryFiltrationThresholds, optional) –

    A mapping from SMARTS-strings to length-thresholds.

    • The SMARTS-strings describe bonds which are searched for in an RDKit Mol object obtained via shnitsel.bridges.default_mol()

    • The thresholds describe maximal tolerable bond-lengths; if there are multiple matches for a given search, the longest bond-length will be considered for each frame

    • The unit for the maximum length is provided in the member variable length_unit, which defaults to angstrom.

    • If not provided, will be initialized with thresholds for H-(C/N) bonds and one for all bonds.

  • plot_thresholds (bool, optional) –

    See shnitsel.vis.plot.filtration.check_thresholds().

    • If True, will plot using check_thresholds with default quantiles

    • If a Sequence, will plot using check_thresholds with specified quantiles

    • If False, will not plot threshold plot

  • plot_populations (Literal['intersections', 'independent', False], optional) –

    See shnitsel.vis.plot.filtration.validity_populations().

    • If 'intersections', will plot populations of trajectories satisfying intersecting conditions

    • If 'independent', will plot populations of trajectories satisfying conditions taken independently

    • If False, will not plot populations plot

  • mol (rdkit.Chem.Mol, optional) – Optional parameter to provide a mol object to base structure analysis on, by default generated from the first frame in the trajectory or frameset.

  • drop_empty_trajectories (bool, optional) – Flag to not include trajectories for which the sanity check result was empty in the final result tree, by default False. Only used for tree-structure inputs.

Returns:

  • The sanitized trajectory, frames or tree.

  • A tree is sanitized by applying the sanitization function to all individual data points in the tree.

Return type:

shnitsel.data.tree.node.TreeNode[Any, TrajectoryOrFrames] | TrajectoryOrFrames | None

Notes

The resulting object has an energy_filtranda and a geometry_filtranda data_var, representing the values by which the data were filtered. If the input has a filtranda data_var, it is overwritten. If the input has a criterion dimension, it will be dropped.
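The tree behaviour described under Returns (the sanitization function is applied to every individual data point, optionally dropping empty results) can be pictured with nested dictionaries. This is a rough sketch only: nested dicts stand in for the ShnitselDB tree, sanitize_tree and toy_truncate are hypothetical names, and treating None or a zero-length result as "empty" is an assumption about what drop_empty_trajectories checks.

```python
def sanitize_tree(node, sanitize, drop_empty=False):
    """Apply `sanitize` to every leaf of a nested dict.

    node:       a dict of children (inner node) or any other object (leaf)
    sanitize:   function mapping one leaf to its sanitized version
    drop_empty: if True, leaves whose sanitized result is None or has
                length zero are left out of the returned tree
    """
    if not isinstance(node, dict):
        return sanitize(node)

    result = {}
    for name, child in node.items():
        cleaned = sanitize_tree(child, sanitize, drop_empty)
        is_leaf = not isinstance(child, dict)
        if drop_empty and is_leaf and (cleaned is None or len(cleaned) == 0):
            continue  # skip trajectories with an empty sanity-check result
        result[name] = cleaned
    return result


def toy_truncate(frames):
    """Toy sanitizer: keep frames up to the first negative value."""
    for k, value in enumerate(frames):
        if value < 0:
            return frames[:k]
    return frames
```

With a leaf such as [-1, 2], toy_truncate returns an empty list; that entry stays in the result tree by default and is removed only when drop_empty is set, mirroring the documented default of drop_empty_trajectories=False.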