shnitsel.clean

Submodules

Functions

energy_filtranda(frames, *[, etot_drift, etot_step, ...])

Derive energetic filtration targets from an xr.Dataset

sanity_check(frames[, cut, units, etot_drift, ...])

Filter trajectories according to energy to exclude unphysical (insane) behaviour

bond_length_filtranda(frames[, search_dict, units, mol])

Derive bond length filtration targets from an xr.Dataset

filter_by_length(frames[, cut, search_dict, units, ...])

Filter trajectories according to bond length

omit(ds)

truncate(ds)

transect(ds, cutoff)

cum_max_quantiles(obj[, quantiles])

true_upto(mask, dim)

cum_mask_from_dataset(ds)

cum_mask_from_filtranda(filtranda)

Package Contents

energy_filtranda(frames, *, etot_drift=None, etot_step=None, epot_step=None, ekin_step=None, hop_epot_step=None, units='eV')

Derive energetic filtration targets from an xr.Dataset

Parameters:
  • frames – An xr.Dataset with astate, energy, and ideally e_kin variables

  • etot_drift (float | None) – Threshold for drift of total energy over an entire trajectory, by default 0.2 eV

  • etot_step (float | None) – Threshold for difference in total energy from one frame to the next, ignoring hops, by default 0.1 eV

  • epot_step (float | None) – Threshold for difference in potential energy from one frame to the next, ignoring hops, by default 0.7 eV

  • ekin_step (float | None) – Threshold for difference in kinetic energy from one frame to the next, ignoring hops, by default 0.7 eV

  • hop_epot_step (float | None) – Threshold for difference in potential energy across hops, by default 1.0 eV

  • units – Units in which custom thresholds are given, and to which defaults and data will be converted, by default ‘eV’

Returns:

  • An xr.DataArray of filtration targets stacked along the criterion dimension. The criteria comprise epot_step and hop_epot_step, plus etot_drift, etot_step and ekin_step if the input contains an e_kin variable.
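
The energy criteria above can be illustrated for a single trajectory with plain NumPy. This is a minimal sketch, not shnitsel's implementation, and several details are assumptions: the function name energy_filtranda_sketch is hypothetical, epot is taken to be the energy of the active state in each frame, a hop is taken to be any frame whose astate differs from the previous frame, and drift is taken to mean the running maximum of |E_tot(t) - E_tot(0)|.

```python
import numpy as np


def energy_filtranda_sketch(epot, ekin, astate):
    """Hypothetical per-frame energy filtranda for ONE trajectory (not shnitsel's API)."""
    epot = np.asarray(epot, dtype=float)
    ekin = np.asarray(ekin, dtype=float)
    astate = np.asarray(astate)
    etot = epot + ekin

    # Assumption: a hop is any frame whose active state differs from the previous one.
    hop = np.zeros(len(astate), dtype=bool)
    hop[1:] = astate[1:] != astate[:-1]

    def step(x):
        # Absolute frame-to-frame change; the first frame has no predecessor, so 0.
        d = np.zeros_like(x)
        d[1:] = np.abs(np.diff(x))
        return d

    return {
        # Assumption: drift = running maximum deviation from the first frame.
        "etot_drift": np.maximum.accumulate(np.abs(etot - etot[0])),
        # Step criteria ignore hop frames (set to 0 there).
        "etot_step": np.where(hop, 0.0, step(etot)),
        "epot_step": np.where(hop, 0.0, step(epot)),
        "ekin_step": np.where(hop, 0.0, step(ekin)),
        # Across hops, only the potential-energy jump is checked.
        "hop_epot_step": np.where(hop, step(epot), 0.0),
    }


# Example: a hop from state 2 to state 1 at the third frame, total energy conserved.
f = energy_filtranda_sketch(
    epot=[0.0, 0.05, 0.9, 0.95],
    ekin=[1.0, 0.95, 0.1, 0.05],
    astate=[2, 2, 1, 1],
)
```

With these example values, the 0.85 eV jump in potential energy appears only in hop_epot_step (below the 1.0 eV default), while the frame-to-frame step criteria see at most 0.05 eV.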

sanity_check(frames, cut='truncate', *, units='eV', etot_drift=np.nan, etot_step=np.nan, epot_step=np.nan, ekin_step=np.nan, hop_epot_step=np.nan, plot_thresholds=False, plot_populations=False)

Filter trajectories according to energy to exclude unphysical (insane) behaviour

Parameters:
  • frames – An xr.Dataset with astate, energy, and ideally e_kin variables

  • cut (Literal['truncate', 'omit', False] | numbers.Number) –

    Specifies the manner in which to remove data;

    • if ‘omit’, drop trajectories unless all frames meet criteria (shnitsel.clean.omit())

    • if ‘truncate’, cut each trajectory off just before the first frame that doesn’t meet criteria (shnitsel.clean.truncate())

    • if a number, interpret this number as a time and cut all trajectories off at this time, discarding those that violate criteria before reaching it (shnitsel.clean.transect())

    • if False, merely annotate the data;

    see shnitsel.clean.dispatch_cut().

  • units – Units in which custom thresholds are given, and to which defaults and data will be converted, by default ‘eV’

  • etot_drift (float) – Threshold for drift of total energy over an entire trajectory, by default 0.2 eV

  • etot_step (float) – Threshold for difference in total energy from one frame to the next, ignoring hops, by default 0.1 eV

  • epot_step (float) – Threshold for difference in potential energy from one frame to the next, ignoring hops, by default 0.7 eV

  • ekin_step (float) – Threshold for difference in kinetic energy from one frame to the next, ignoring hops, by default 0.7 eV

  • hop_epot_step (float) – Threshold for difference in potential energy across hops, by default 1.0 eV

  • plot_thresholds (bool | Sequence[float]) –

    See shnitsel.vis.plot.filtration.check_thresholds().

    • If True, will plot using check_thresholds with default quantiles

    • If a Sequence, will plot using check_thresholds with the specified quantiles

    • If False, will not produce the threshold plot

  • plot_populations (bool | Literal['independent', 'intersections']) –

    See shnitsel.vis.plot.filtration.validity_populations().

    • If True or 'intersections', will plot populations of trajectories satisfying intersecting conditions

    • If 'independent', will plot populations of trajectories satisfying each condition taken independently

    • If False, will not produce the populations plot
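
The signature gives np.nan as the default for every threshold, while the parameter descriptions quote defaults in eV, and custom thresholds are interpreted in units while defaults are converted to units. A plausible reading, sketched below as an assumption rather than shnitsel's actual code, is that NaN means "use the documented default". The helper names resolve_thresholds and frame_is_valid, the unit table, and the choice of "at or below the threshold" as passing are all hypothetical.

```python
import math

# Documented defaults, in eV.
DEFAULTS_EV = {
    "etot_drift": 0.2,
    "etot_step": 0.1,
    "epot_step": 0.7,
    "ekin_step": 0.7,
    "hop_epot_step": 1.0,
}

# Hypothetical unit table: value of 1 eV expressed in each unit (1 Hartree = 27.211386245988 eV).
PER_EV = {"eV": 1.0, "hartree": 1 / 27.211386245988}


def resolve_thresholds(units="eV", **custom):
    """Hypothetical helper: NaN (or missing) -> documented default converted to `units`."""
    out = {}
    for name, default_ev in DEFAULTS_EV.items():
        value = custom.get(name, math.nan)
        # Custom values are already in `units`; only the eV defaults need converting.
        out[name] = default_ev * PER_EV[units] if math.isnan(value) else value
    return out


def frame_is_valid(filtranda, thresholds):
    """Assumption: a frame passes only if every criterion is at or below its threshold."""
    return all(filtranda[name] <= thresholds[name] for name in filtranda)


t_ev = resolve_thresholds()
t_hartree = resolve_thresholds(units="hartree", epot_step=0.02)
```

Here t_hartree keeps the custom epot_step of 0.02 Hartree unchanged, while the remaining defaults are converted from eV.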

Returns:

The sanitized xr.Dataset

Notes

The resulting object has a filtranda data_var, representing the values by which the data were filtered. If the input has a filtranda data_var, it is overwritten.
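
The difference between the 'independent' and 'intersections' population modes can be sketched with NumPy. This illustrates the idea under stated assumptions and is not shnitsel.vis.plot.filtration.validity_populations(): validity is given as a boolean array of shape (criterion, trajectory, time); a trajectory counts at time t only if it has met a criterion at every frame up to t; and 'intersections' is read as the criteria being intersected cumulatively in the order given. The name populations_sketch is hypothetical.

```python
import numpy as np


def populations_sketch(valid, mode="intersections"):
    """Hypothetical: count still-valid trajectories per time step.

    valid: bool array of shape (criterion, trajectory, time).
    Returns an int array of shape (criterion, time).
    """
    valid = np.asarray(valid, dtype=bool)
    # A trajectory stays valid only until its first failure (cumulative AND over time).
    valid = np.logical_and.accumulate(valid, axis=-1)
    if mode == "independent":
        # Each criterion counted on its own.
        combined = valid
    elif mode == "intersections":
        # Row k: trajectories satisfying criteria 0..k simultaneously.
        combined = np.logical_and.accumulate(valid, axis=0)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return combined.sum(axis=1)


# Two criteria, two trajectories, three time steps.
v = [
    [[True, True, True], [True, False, True]],   # criterion 0
    [[True, True, False], [True, True, True]],   # criterion 1
]
```

With this input, the 'independent' counts are [[2, 1, 1], [2, 2, 1]], whereas the second row of the 'intersections' counts drops to [2, 1, 0] because no trajectory satisfies both criteria through the last step.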

bond_length_filtranda(frames, search_dict=None, units='angstrom', mol=None)

Derive bond length filtration targets from an xr.Dataset

Parameters:
  • frames – An xr.Dataset with an atXYZ variable

  • search_dict (dict[str, numbers.Number] | None) –

    A mapping from SMARTS strings to length thresholds.

    • The SMARTS strings describe the bonds to search for in an RDKit Mol object obtained via shnitsel.bridges.default_mol()

    • The thresholds are the maximum tolerable bond lengths; if a search has multiple matches, the longest bond length in each frame is considered

  • units – Units in which custom thresholds are given, and to which defaults and data will be converted, by default ‘angstrom’

  • mol (rdkit.Chem.Mol | None)

Returns:

  • An xr.DataArray of filtration targets stacked along the criterion dimension, with one criterion per search_dict entry.
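
Once the SMARTS matches have been turned into atom-index pairs (in shnitsel this goes through an RDKit Mol; here the pairs are simply given), the rule "take the longest matching bond in each frame" reduces to a maximum over pair distances. A minimal NumPy sketch, assuming coordinates of shape (frame, atom, 3); the name max_bond_length is hypothetical.

```python
import numpy as np


def max_bond_length(atXYZ, pairs):
    """Hypothetical: longest distance among the given atom pairs, per frame.

    atXYZ: array of shape (frame, atom, 3).
    pairs: sequence of (i, j) atom indices, e.g. the matches of one SMARTS pattern.
    """
    atXYZ = np.asarray(atXYZ, dtype=float)
    idx = np.asarray(pairs, dtype=int)                 # shape (pair, 2)
    vec = atXYZ[:, idx[:, 0]] - atXYZ[:, idx[:, 1]]    # shape (frame, pair, 3)
    lengths = np.linalg.norm(vec, axis=-1)             # shape (frame, pair)
    return lengths.max(axis=-1)                        # shape (frame,)


# Two frames of a three-atom system; bonds 0-1 and 0-2 match the (hypothetical) pattern.
xyz = [
    [[0, 0, 0], [1, 0, 0], [0, 2, 0]],
    [[0, 0, 0], [3, 0, 0], [0, 2, 0]],
]
longest = max_bond_length(xyz, [(0, 1), (0, 2)])
```

Comparing longest against a threshold such as 2.5 (in the chosen length units) then gives the per-frame pass/fail values for that search_dict entry.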

filter_by_length(frames, cut='truncate', search_dict=None, units='angstrom', plot_thresholds=False, plot_populations=False, mol=None)

Filter trajectories according to bond length

Parameters:
  • frames – An xr.Dataset with an atXYZ variable (N.B.: this function takes an xr.Dataset rather than an xr.DataArray, for consistency with shnitsel.clean.sanity_check())

  • cut (Literal['truncate', 'omit', False] | numbers.Number) –

    Specifies the manner in which to remove data;

    • if ‘omit’, drop trajectories unless all frames meet criteria (shnitsel.clean.omit())

    • if ‘truncate’, cut each trajectory off just before the first frame that doesn’t meet criteria (shnitsel.clean.truncate())

    • if a number, interpret this number as a time and cut all trajectories off at this time, discarding those that violate criteria before reaching it (shnitsel.clean.transect())

    • if False, merely annotate the data;

    see shnitsel.clean.dispatch_cut().

  • search_dict (dict[str, numbers.Number] | None) –

    A mapping from SMARTS strings to length thresholds.

    • The SMARTS strings describe the bonds to search for in an RDKit Mol object obtained via shnitsel.bridges.default_mol()

    • The thresholds are the maximum tolerable bond lengths; if a search has multiple matches, the longest bond length in each frame is considered

  • plot_thresholds (bool | Sequence[float]) –

    See shnitsel.vis.plot.filtration.check_thresholds().

    • If True, will plot using check_thresholds with default quantiles

    • If a Sequence, will plot using check_thresholds with the specified quantiles

    • If False, will not produce the threshold plot

  • plot_populations (bool | Literal['independent', 'intersections']) –

    See shnitsel.vis.plot.filtration.validity_populations().

    • If True or 'intersections', will plot populations of trajectories satisfying intersecting conditions

    • If 'independent', will plot populations of trajectories satisfying each condition taken independently

    • If False, will not produce the populations plot

  • mol (rdkit.Chem.Mol | None) – An RDKit Mol object; if not provided, it will be generated from the XYZ coordinates in the data

  • units (str) – Units in which custom thresholds are given, and to which defaults and data will be converted, by default ‘angstrom’

Returns:

The filtered xr.Dataset

Notes

The resulting object has a filtranda data_var, representing the values by which the data were filtered. If the input has a filtranda data_var, it is overwritten.

omit(ds)

Parameters:

ds (shnitsel.data.trajectory_format.Trajectory)

truncate(ds)

Parameters:

ds (shnitsel.data.trajectory_format.Trajectory)

transect(ds, cutoff)

Parameters:
  • ds (shnitsel.data.trajectory_format.Trajectory | shnitsel.core.typedefs.Frames)

  • cutoff (float)
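
The three cut strategies described for sanity_check and filter_by_length (omit, truncate, transect) can be illustrated on per-trajectory lists of frame times and validity flags. This is a plain-Python sketch, not the implementations of omit, truncate and transect, which operate on xarray objects. The data layout (a dict mapping trajectory IDs to (times, valid) lists, times sorted ascending), the function names, and keeping frames exactly at the cutoff are assumptions; the source also does not say how trajectories ending before the cutoff are treated, and this sketch simply keeps them if valid.

```python
def omit_sketch(trajs):
    """Keep only trajectories in which every frame is valid."""
    return {k: (t, v) for k, (t, v) in trajs.items() if all(v)}


def truncate_sketch(trajs):
    """Cut each trajectory just before its first invalid frame."""
    out = {}
    for k, (times, valid) in trajs.items():
        # Index of the first invalid frame, or the full length if none fails.
        n = next((i for i, ok in enumerate(valid) if not ok), len(valid))
        out[k] = (times[:n], valid[:n])
    return out


def transect_sketch(trajs, cutoff):
    """Cut all trajectories at `cutoff`; drop those that fail before reaching it."""
    out = {}
    for k, (times, valid) in trajs.items():
        # Assumption: times are sorted and frames at exactly `cutoff` are kept.
        n = sum(1 for t in times if t <= cutoff)
        if all(valid[:n]):
            out[k] = (times[:n], valid[:n])
    return out


trajs = {
    "a": ([0, 1, 2, 3], [True, True, True, True]),
    "b": ([0, 1, 2, 3], [True, True, False, True]),
    "c": ([0, 1, 2, 3], [True, False, True, True]),
}
```

On this input, omitting keeps only "a", truncating shortens "b" to two frames and "c" to one, and transecting at time 1 keeps "a" and "b" but drops "c", which fails at time 1.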

cum_max_quantiles(obj, quantiles=None)
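
No description is given for cum_max_quantiles. Judging only by its name, one plausible reading is "quantiles, taken across trajectories, of each quantity's running maximum over time", which would suit threshold plots such as check_thresholds. The NumPy sketch below illustrates that reading only; the function name, the (trajectory, time) layout and the default quantiles are assumptions.

```python
import numpy as np


def cum_max_quantiles_sketch(values, quantiles=(0.5, 0.9, 0.99)):
    """Hypothetical reading of the name: quantiles (over trajectories)
    of the running maximum (over time).

    values: array of shape (trajectory, time).
    Returns an array of shape (len(quantiles), time).
    """
    # Running maximum along the time axis of each trajectory.
    running_max = np.maximum.accumulate(np.asarray(values, dtype=float), axis=1)
    # Quantiles across the trajectory axis, one row per requested quantile.
    return np.quantile(running_max, quantiles, axis=0)


q = cum_max_quantiles_sketch([[1, 3, 2], [2, 1, 4], [0, 0, 0]], quantiles=[0.5, 1.0])
```

Here the running maxima are [1, 3, 3], [2, 2, 4] and [0, 0, 0], so the median row is [1, 2, 3] and the maximum row is [2, 3, 4].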

true_upto(mask, dim)
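
true_upto is likewise undocumented. Its name suggests a mask that stays True along dim up to the first False and is False from then on, which is the behaviour that truncation needs. A NumPy sketch of that reading, using an integer axis in place of a named xarray dimension (the function name is hypothetical):

```python
import numpy as np


def true_upto_sketch(mask, axis=-1):
    """Hypothetical reading of the name: True at a position only if `mask`
    is True at every position up to and including it along `axis`."""
    # Cumulative logical AND: once a False is seen, everything after it is False.
    return np.logical_and.accumulate(np.asarray(mask, dtype=bool), axis=axis)


m1 = true_upto_sketch([True, True, False, True])
m2 = true_upto_sketch([[True, False, True], [True, True, True]], axis=1)
```

Under this reading, summing the result along the axis gives the number of leading valid frames, that is, the length a truncate-style cut would keep.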

cum_mask_from_dataset(ds)

Parameters:

ds (shnitsel.core.typedefs.Stacked | shnitsel.core.typedefs.Unstacked)

Return type:

shnitsel.core.typedefs.Unstacked

cum_mask_from_filtranda(filtranda)

Parameters:

filtranda (shnitsel.core.typedefs.Stacked | shnitsel.core.typedefs.Unstacked)

Return type:

shnitsel.core.typedefs.Unstacked