shnitsel.geo.analogs

Attributes

Classes

Functions

_find_atom_pairs(mol, atoms)

Method to find all atom pairs that constitute a bond in the molecule and return the associated bond ids as a list.

_substruct_match_to_submol(mol, substruct_match)

Build a sub-mol from a substructure match.

_substruct_match_to_mapping(mol, substruct_match)

Build a mapping of the mol ids to the ids of the substructure.

get_MCS_smarts(mols)

Helper function to get the maximum common substructure (MCS) SMARTS string for a sequence of molecular structures for further processing.

identify_analogs_mappings(mols[, smarts])

Helper function to generate a maximum common substructure match and from that extract substructure mappings for each of the provided molecules.

_list_analogs(ensembles[, smarts, vis])

Extract a common moiety from a selection of ensembles.

extract_analogs(…)

Combine ensembles for different compounds by finding the moieties they have in common.

Module Contents

DatasetOrArray
class StructureMapping(original_mol, res_mol, mapping)
Parameters:
  • original_mol (rdkit.Chem.Mol)

  • res_mol (rdkit.Chem.Mol)

  • mapping (Mapping[int, int])

_submol: rdkit.Chem.Mol
_orig_mol: rdkit.Chem.Mol
_full_mapping: Mapping[int, int]
__call__(ds_or_da: shnitsel.data.tree.node.TreeNode[Any, DatasetOrArray]) → shnitsel.data.tree.node.TreeNode[Any, DatasetOrArray]
__call__(ds_or_da: DatasetOrArray) → DatasetOrArray
apply(ds_or_da: shnitsel.data.tree.node.TreeNode[Any, DatasetOrArray]) → shnitsel.data.tree.node.TreeNode[Any, DatasetOrArray]
apply(ds_or_da: DatasetOrArray) → DatasetOrArray
_find_atom_pairs(mol, atoms)

Method to find all atom pairs that constitute a bond in the molecule and return the associated bond ids as a list.

The list of bond ids can serve as a path within a molecule, which supports extracting a submolecule that also contains the bonds at the edge of a SMARTS string structure.

Parameters:
  • mol (rc.Mol) – The molecule to find the bonds within

  • atoms (Sequence[int]) – The atoms among which to look for bonds when constructing a path of bonds.

Returns:

The list of the bond ids of all bonds within mol between the atoms.

Return type:

list[int]
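
The bond search described above can be sketched without RDKit. In this toy version a molecule is reduced to a list of bonds given as atom-index pairs, and a bond id is simply its position in that list. Both conventions, as well as the name `bond_ids_between`, are assumptions made for illustration and are not the library's implementation.

```python
from collections.abc import Sequence


def bond_ids_between(
    bonds: Sequence[tuple[int, int]], atoms: Sequence[int]
) -> list[int]:
    """Return the ids of all bonds whose two end atoms are both in `atoms`."""
    selected = set(atoms)
    return [
        bond_id
        for bond_id, (begin, end) in enumerate(bonds)
        if begin in selected and end in selected
    ]


# Toy chain 0-1-2-3 with a branch 1-4 (hypothetical data).
toy_bonds = [(0, 1), (1, 2), (2, 3), (1, 4)]
inner_bond_ids = bond_ids_between(toy_bonds, atoms=(1, 2, 4))
```

Only bonds that lie entirely inside the atom selection are returned, so bonds leading out of the selection (here 0-1 and 2-3) are left out of the path.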

_substruct_match_to_submol(mol, substruct_match)

Build a sub-mol from a substructure match.

This is used for analogs mapping.

Parameters:
  • mol (rc.Mol) – The mol on which the substructure has matched

  • substruct_match (tuple[int,...]) – The indices of the atoms that have matched.

Returns:

The mol object of the relevant substructure match.

Return type:

rc.Mol

_substruct_match_to_mapping(mol, substruct_match)

Build a mapping of the mol ids to the ids of the substructure.

This allows for selection of the atoms within a dataset based on a substructure match.

Parameters:
  • mol (rc.Mol) – The original mol on which the substructure match was obtained

  • substruct_match (tuple[int,...]) – The index list of the substructure match.

Returns:

  • rc.Mol – The matched submol object.

  • StructureMapping – The mapping of original mol atom indices to submol indices, or -1 if the atom is no longer present. Atoms that are no longer present in the final submol may either be missing from the keys of the mapping or have a negative value associated with their index. Provides an .apply() function to apply the mapping to datasets or data arrays.

Return type:

tuple[rdkit.Chem.Mol, StructureMapping]
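
To make the index convention concrete, here is a minimal sketch in plain Python. It assumes that the submol atoms are numbered in the order of the match tuple, which the description above does not confirm, and it uses a plain dict and list in place of StructureMapping and a dataset. The names `match_to_mapping` and `select_atoms` are hypothetical.

```python
def match_to_mapping(n_atoms: int, match: tuple[int, ...]) -> dict[int, int]:
    """Map each original atom index to its submol index, or -1 if dropped."""
    mapping = {orig_idx: -1 for orig_idx in range(n_atoms)}
    for sub_idx, orig_idx in enumerate(match):
        mapping[orig_idx] = sub_idx
    return mapping


def select_atoms(per_atom_values: list, mapping: dict[int, int]) -> list:
    """Keep only the mapped atoms, reordered by their submol index."""
    kept = [(sub, orig) for orig, sub in mapping.items() if sub >= 0]
    return [per_atom_values[orig] for _, orig in sorted(kept)]


# Five-atom toy molecule; atoms 3, 1 and 4 matched, in that order.
mapping = match_to_mapping(n_atoms=5, match=(3, 1, 4))
symbols = select_atoms(["C", "N", "O", "S", "H"], mapping)
```

Applying the mapping both drops unmatched atoms and reorders the rest, which is what lets per-atom data from different compounds line up afterwards.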

get_MCS_smarts(mols)

Helper function to get the maximum common substructure (MCS) SMARTS string for a sequence of molecular structures for further processing.

Parameters:

mols (Iterable[rc.Mol]) – The molecular structures to find the maximum common substructure between.

Returns:

The MCS SMARTS string.

Return type:

SMARTSstring

identify_analogs_mappings(mols, smarts='')

Helper function to generate a maximum common substructure match and from that extract substructure mappings for each of the provided molecules.

If a SMARTS string is provided, the MCS search is skipped and an attempt is made to match smarts against all provided structures instead.

Parameters:
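  • mols (Mapping[Hashable | int, rc.Mol]) – The molecular structures to use as a basis for the MCS analysis or, if smarts is provided, to match the SMARTS string against.

  • smarts (shnitsel.filtering.structure_selection.SMARTSstring) – Optional SMARTS string to match against all structures instead of running the MCS analysis. Defaults to ''.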

Returns:

First, the resulting (or provided) SMARTS string for the structure. Second, the mapping from the original keys to the resulting StructureMapping objects, which can be applied to the original data.

Return type:

tuple[SMARTSstring, Mapping[Hashable | int, StructureMapping]]

_list_analogs(ensembles, smarts='', vis=False)

Extract a common moiety from a selection of ensembles.

By default, this attempts to find the largest possible match using equivalence of any heavy atoms. H-atoms can only match other H-atoms.

Parameters:
  • ensembles (Mapping[Hashable | int, xr.DataArray]) – An Iterable of ``xr.DataArray``s, each containing the geometries of an ensemble of trajectories for a different compound; they

  • smarts (SMARTSstring, optional) – A SMARTS string indicating the moiety to cut out of each compound. In each case, the match returned by rdkit.Chem.Mol.GetSubstructMatch() (not necessarily the only possible match) will be used. If no SMARTS is provided, a maximum common submol will be extracted using rdFMCS.FindMCS.

  • vis (bool, default=False) – Whether to display a visual indication of the matches.

Return type:

An Iterable of ``xr.DataArray``s

extract_analogs(ensembles: shnitsel.data.tree.node.TreeNode[Any, DatasetOrArray], smarts: shnitsel.filtering.structure_selection.SMARTSstring = '', vis: bool = False, *, concat_kws: dict[str, Any] | None = None) → shnitsel.data.tree.node.TreeNode[Any, DatasetOrArray] | None
extract_analogs(ensembles: Mapping[Hashable | int, DatasetOrArray], smarts: shnitsel.filtering.structure_selection.SMARTSstring = '', vis: bool = False, *, concat_kws: dict[str, Any] | None = None) → Mapping[Hashable | int, DatasetOrArray] | None
extract_analogs(ensembles: Sequence[DatasetOrArray], smarts: shnitsel.filtering.structure_selection.SMARTSstring = '', vis: bool = False, *, concat_kws: dict[str, Any] | None = None) → Sequence[DatasetOrArray] | None

Combine ensembles for different compounds by finding the moieties they have in common.

Parameters:
  • ensembles (TreeNode[Any, DatasetOrArray] | Mapping[Hashable | int, DatasetOrArray] | Sequence[DatasetOrArray]) –

    Datasets, DataArrays or Shnitsel wrappers, optionally arranged in a tree structure, each containing the geometries of an ensemble of trajectories for a different compound or structure.

    • If the ensemble is provided as a tree, the result will be a tree of a mostly identical structure.

      A grouping operation may be performed beforehand to keep different structures from ending up in the same group.

    • If the input is a mapping, the keys will be preserved and the mappings will be applied to each entry.

    • If a sequence is provided, the order of inputs will be preserved and the mapping will be applied to each entry in order.

  • smarts (SMARTSstring) – A SMARTS string indicating the moiety to cut out of each compound. In each case, the match returned by rdkit.Chem.Mol.GetSubstructMatch() (not necessarily the only possible match) will be used. If no SMARTS is provided, a maximum common submol will be extracted using rdFMCS.FindMCS.

  • vis (bool, default=False) – Deprecated. Whether to display a visual indication of the match.

  • concat_kws (dict[str, Any] | None) – Deprecated. Keyword arguments for internal calls to xr.concat.

Returns:

  • TreeNode[Any, DatasetOrArray] – A tree holding the analog substructures in its leaves.

  • Mapping[Hashable | int, DatasetOrArray] | Sequence[DatasetOrArray] – Either a mapping or a sequence of xr.Dataset or xr.DataArray objects of trajectories, holding the mapped inputs from ensembles.

Raises:
  • ValueError – If the ensembles provided could not be brought into agreement.

  • AssertionError – If the tree is not of a supported format.
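
The way the container type of ensembles carries over to the result (keys preserved for a mapping, order preserved for a sequence) can be sketched in plain Python. This is only an illustration of that dispatch under stated assumptions: `apply_per_entry` and `first_two` are hypothetical names, tree inputs are left out, and plain lists of element symbols stand in for datasets.

```python
from collections.abc import Callable, Mapping, Sequence
from typing import Any


def apply_per_entry(ensembles: Any, fn: Callable[[Any], Any]) -> Any:
    """Apply `fn` to every entry, keeping keys (mapping) or order (sequence)."""
    if isinstance(ensembles, Mapping):
        return {key: fn(value) for key, value in ensembles.items()}
    if isinstance(ensembles, Sequence) and not isinstance(ensembles, (str, bytes)):
        return [fn(value) for value in ensembles]
    raise TypeError(f"unsupported container: {type(ensembles).__name__}")


def first_two(atoms: list) -> list:
    """Stand-in for cutting an ensemble down to the shared moiety."""
    return atoms[:2]


by_name = apply_per_entry({"a": ["C", "C", "O"], "b": ["C", "C", "N"]}, first_two)
in_order = apply_per_entry([["C", "C", "O"], ["C", "C", "N"]], first_two)
```

After the per-entry step, every result has the same atoms in the same order, so the entries can be compared or combined directly.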