shnitsel.geo.analogs#
Attributes#
Classes#
Functions#
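_find_atom_pairs | Method to find all atom pairs that constitute a bond in the molecule and return the associated bond ids as a list.
_substruct_match_to_submol | Build a sub-mol from a substructure match.
_substruct_match_to_mapping | Build a mapping of the mol ids to the ids of the substructure.
get_MCS_smarts | Helper function to get the maximum common substructure (MCS) SMARTS string for a sequence of molecular structures.
identify_analogs_mappings | Helper function to generate a maximum common substructure match and from that extract substructure mappings for each of the provided molecules.
_list_analogs | Extract a common moiety from a selection of ensembles.
extract_analogs | Combine ensembles for different compounds by finding the moieties they have in common.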
Module Contents#
- DatasetOrArray#
- class StructureMapping(original_mol, res_mol, mapping)#
-
- _submol: rdkit.Chem.Mol#
- _orig_mol: rdkit.Chem.Mol#
- __call__(ds_or_da: shnitsel.data.tree.node.TreeNode[Any, DatasetOrArray]) shnitsel.data.tree.node.TreeNode[Any, DatasetOrArray]#
- __call__(ds_or_da: DatasetOrArray) DatasetOrArray
- apply(ds_or_da: shnitsel.data.tree.node.TreeNode[Any, DatasetOrArray]) shnitsel.data.tree.node.TreeNode[Any, DatasetOrArray]#
- apply(ds_or_da: DatasetOrArray) DatasetOrArray
- _find_atom_pairs(mol, atoms)#
Find all atom pairs that constitute a bond in the molecule and return the associated bond ids as a list.
The list of bond ids can serve as a path within a molecule, supporting the extraction of a submolecule that also contains the bonds at the edge of a SMARTS string structure.
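The idea can be illustrated without RDKit. The sketch below is not the library's implementation: `find_bonded_pairs` and the tuple-based `bonds` list are hypothetical stand-ins for `_find_atom_pairs` and an RDKit molecule. It assumes bond `i` is described by the pair of atom indices it connects.

```python
from itertools import combinations


def find_bonded_pairs(bonds, atoms):
    """Return the ids of all bonds whose two end atoms are both in `atoms`.

    `bonds` is a hypothetical stand-in for a molecule: bonds[i] is the
    (begin_atom, end_atom) index pair of bond i.
    """
    # Look up bond ids by unordered atom pair.
    bond_id = {frozenset(pair): idx for idx, pair in enumerate(bonds)}
    found = []
    # Check every pair of selected atoms for a connecting bond.
    for a, b in combinations(sorted(set(atoms)), 2):
        idx = bond_id.get(frozenset((a, b)))
        if idx is not None:
            found.append(idx)
    return found


# Chain 0-1-2-3 with bond 0 = (0, 1), bond 1 = (1, 2), bond 2 = (2, 3).
chain = [(0, 1), (1, 2), (2, 3)]
bond_path = find_bonded_pairs(chain, [1, 2, 3])  # bonds 1 and 2
```

Atoms 1 and 3 are both selected but not bonded to each other, so only the two bonds along the chain are returned.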
- _substruct_match_to_submol(mol, substruct_match)#
Build a sub-mol from a substructure match.
This is used for analogs mapping.
- _substruct_match_to_mapping(mol, substruct_match)#
Build a mapping of the mol ids to the ids of the substructure.
This allows for selection of the atoms within a dataset based on a substructure match.
- Parameters:
- Returns:
rc.Mol – The matched submol object.
StructureMapping – The mapping of original mol atom indices to submol indices, or -1 if an atom is no longer present. Atoms absent from the final submol may either be missing from the keys of the mapping or have a negative value associated with their index. Has an .apply() function to apply the mapping to datasets or data arrays.
- Return type:
tuple[rdkit.Chem.Mol, StructureMapping]
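A minimal sketch of such an index mapping, using plain Python containers rather than the actual `StructureMapping` class. The helper names `match_to_mapping` and `select_atoms` are hypothetical. The sketch assumes the match tuple follows the RDKit convention, where entry `i` holds the original index of the atom matched to position `i` of the substructure.

```python
def match_to_mapping(n_atoms, substruct_match):
    """Map original atom indices to submol indices, with -1 for dropped atoms.

    Assumes substruct_match[i] is the original index of the atom that
    becomes atom i of the submol.
    """
    mapping = {orig: -1 for orig in range(n_atoms)}
    for sub_idx, orig_idx in enumerate(substruct_match):
        mapping[orig_idx] = sub_idx
    return mapping


def select_atoms(per_atom_values, mapping):
    """Reorder per-atom values into submol order, dropping unmatched atoms."""
    kept = sorted((sub, orig) for orig, sub in mapping.items() if sub >= 0)
    return [per_atom_values[orig] for _, orig in kept]


# Five-atom molecule; the substructure matched original atoms 3, 1 and 4.
mapping = match_to_mapping(5, (3, 1, 4))
symbols = select_atoms(["C", "N", "H", "O", "C"], mapping)
```

Here `mapping` is `{0: -1, 1: 1, 2: -1, 3: 0, 4: 2}`, and the selected symbols come out in substructure order rather than in the original atom order.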
- get_MCS_smarts(mols)#
Helper function to get the maximum common substructure (MCS) SMARTS string for a sequence of molecular structures for further processing.
- Parameters:
mols (Iterable[rc.Mol]) – The molecular structures to find the maximum common substructure between.
- Returns:
The MCS SMARTS string.
- Return type:
SMARTSstring
- identify_analogs_mappings(mols, smarts='')#
Helper function to generate a maximum common substructure match and from that extract substructure mappings for each of the provided molecules.
If a SMARTS string is provided, the MCS search is skipped and an attempt is made to match the SMARTS string against all provided structures instead.
- Parameters:
mols (Mapping[Hashable | int, rc.Mol]) – The molecular structures to use as a basis for the MCS analysis or, if a SMARTS string is provided, to match that string against.
smarts (shnitsel.filtering.structure_selection.SMARTSstring) – Optional SMARTS string; if provided, it is matched against every structure instead of running the MCS search.
- Returns:
First, the resulting (or supplied) SMARTS string for the structure; second, the mapping from the original keys to the resulting StructureMapping objects, which can be applied to the original data.
- Return type:
tuple[SMARTSstring, Mapping[Hashable | int, StructureMapping]]
- _list_analogs(ensembles, smarts='', vis=False)#
Extract a common moiety from a selection of ensembles.
By default, this attempts to find the largest possible match using equivalence of any heavy atoms. H-atoms can only match other H-atoms.
- Parameters:
ensembles (Mapping[Hashable | int, xr.DataArray]) – An Iterable of xr.DataArrays, each containing the geometries of an ensemble of trajectories for a different compound.
smarts (SMARTSstring, optional) – A SMARTS string indicating the moiety to cut out of each compound; in each case, the match returned by rdkit.Chem.Mol.GetSubstructMatch() (not necessarily the only possible match) will be used. If no SMARTS is provided, the maximum common submol will be extracted using rdFMCS.FindMCS.
vis (bool, default=False) – Whether to display a visual indication of the matches.
- Return type:
An Iterable of xr.DataArrays
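The default matching rule described above (any heavy atom may match any other heavy atom, while hydrogen only matches hydrogen) can be read as a simple predicate on atomic numbers. The function below is a hypothetical illustration of that rule, not the comparison the library actually passes to RDKit.

```python
def atoms_may_match(atomic_num_a, atomic_num_b):
    """Heavy atoms match any heavy atom; hydrogen matches only hydrogen."""
    is_h_a = atomic_num_a == 1
    is_h_b = atomic_num_b == 1
    # Both hydrogens or both heavy atoms: compatible. Mixed: not.
    return is_h_a == is_h_b


carbon_vs_oxygen = atoms_may_match(6, 8)    # heavy vs heavy
hydrogen_vs_carbon = atoms_may_match(1, 6)  # H vs heavy
```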
- extract_analogs(ensembles: shnitsel.data.tree.node.TreeNode[Any, DatasetOrArray], smarts: shnitsel.filtering.structure_selection.SMARTSstring = '', vis: bool = False, *, concat_kws: dict[str, Any] | None = None) shnitsel.data.tree.node.TreeNode[Any, DatasetOrArray] | None#
- extract_analogs(ensembles: Mapping[Hashable | int, DatasetOrArray], smarts: shnitsel.filtering.structure_selection.SMARTSstring = '', vis: bool = False, *, concat_kws: dict[str, Any] | None = None) Mapping[Hashable | int, DatasetOrArray] | None
- extract_analogs(ensembles: Sequence[DatasetOrArray], smarts: shnitsel.filtering.structure_selection.SMARTSstring = '', vis: bool = False, *, concat_kws: dict[str, Any] | None = None) Sequence[DatasetOrArray] | None
Combine ensembles for different compounds by finding the moieties they have in common.
- Parameters:
ensembles (TreeNode[Any, DatasetOrArray] | Mapping[Hashable | int, DatasetOrArray] | Sequence[DatasetOrArray]) –
Input of Datasets, DataArrays, or Shnitsel wrappers, optionally in a tree structure, each containing the geometries of an ensemble of trajectories for a different compound or structure.
- If the ensemble is provided as a tree, the result will be a tree of mostly identical structure. A grouping operation may be performed beforehand to keep different structures out of the same group.
- If the input is a mapping, the keys will be preserved and the mappings will be applied to each entry.
- If a sequence is provided, the order of inputs will be preserved and the mapping will be applied to each entry in order.
smarts (SMARTSstring) – A SMARTS string indicating the moiety to cut out of each compound; in each case, the match returned by rdkit.Chem.Mol.GetSubstructMatch() (not necessarily the only possible match) will be used. If no SMARTS is provided, the maximum common submol will be extracted using rdFMCS.FindMCS.
vis (bool, default=False) – Deprecated. Whether to display a visual indication of the match.
concat_kws (dict[str, Any] | None) – Deprecated. Keyword arguments for internal calls to xr.concat.
- Returns:
TreeNode[Any, DatasetOrArray] – A tree holding the analog substructures in its leaves.
Mapping[Hashable | int, DatasetOrArray] | Sequence[DatasetOrArray] – Either a mapping or a sequence of xr.Dataset or xr.DataArray objects of trajectories, holding the mapped inputs from ensembles.
- Raises:
ValueError – If the ensembles provided could not be brought into agreement.
AssertionError – If the tree is not of a supported format.
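The container handling described for mapping and sequence inputs (keys preserved for mappings, order preserved for sequences) can be sketched in plain Python. This is only an illustration under stated assumptions: `apply_per_entry` is a hypothetical helper, the tree case is omitted, and lists of atom symbols stand in for xarray objects.

```python
from collections.abc import Mapping, Sequence


def apply_per_entry(ensembles, transform):
    """Apply `transform` to every entry, keeping keys or order intact."""
    if isinstance(ensembles, Mapping):
        # Mapping input: same keys, transformed values.
        return {key: transform(value) for key, value in ensembles.items()}
    if isinstance(ensembles, Sequence) and not isinstance(ensembles, (str, bytes)):
        # Sequence input: same order, transformed entries.
        return [transform(value) for value in ensembles]
    raise TypeError(f"unsupported container: {type(ensembles).__name__}")


def keep_two(atoms):
    """Toy stand-in for a structure mapping: keep the first two atoms."""
    return atoms[:2]


by_name = apply_per_entry({"a": ["C", "O", "H"], "b": ["C", "N", "H"]}, keep_two)
in_order = apply_per_entry([["C", "O", "H"], ["C", "N", "H"]], keep_two)
```

In the real function each entry would receive its own mapping derived from the shared substructure; a single shared transform is used here only to keep the sketch short.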