shnitsel.geo.geomatch

Attributes

st_yellow

Functions

__match_pattern(mol, smarts)

Find all substructure matches of a SMARTS pattern in a molecule.

__get_bond_info(mol, flagged_tuples)

Extend flagged tuples of bonds, angles or dihedrals with a tuple of bond indices and a tuple of their bond types.

__level_to_geo(flag_level)

__get_color_atoms(d_flag, flag_level)

__get_color_bonds(d_flag, flag_level)

__collect_tuples(d_flag)

Select the appropriate list of tuples based on hierarchy: dihedrals -> angles -> bonds.

__get_highlight_molimg(mol, d_flag[, highlight_color, ...])

Convert a list of flagged atom tuples into bonds and atoms for highlighting.

__get_all_atoms(mol)

Return all atoms in the molecule as a dictionary with flags.

__get_atoms_by_indices(match_indices_list, d_atoms)

Set flags for atoms based on whether they are contained within substructure matches.

flag_atoms(mol[, smarts, t_idxs, draw])

Flag atoms in a molecule based on substructure patterns or atom indices.

__get_all_bonds(mol)

Return all bonds in the molecule as a dictionary with flags.

__get_bonds_by_indices(match_indices_list, d_bonds)

Set flags for bonds based on whether they are contained within substructure matches.

flag_bonds(mol[, smarts, t_idxs, draw])

Flag bonds in a molecule based on substructure patterns or atom indices.

__get_all_angles(mol)

Return all angles in the molecule as a dictionary with flags.

__get_angles_by_indices(match_list, d_angles)

Flag angles as active (1) or inactive (0) based on substructure matches.

flag_angles(mol[, smarts, t_idxs, draw])

Flag molecule angles based on SMARTS patterns and/or atom indices.

__get_all_dihedrals(mol)

Return all dihedrals in the molecule as a dictionary with flags.

__get_dihedrals_by_indices(match_list, d_dihedrals)

Flag dihedrals as active (1) if all four atoms (i, j, k, l) belong to the same SMARTS match.

flag_dihedrals(mol[, smarts, t_idxs, draw])

Flag dihedrals in a molecule based on SMARTS and/or atom indices.

flag_bats(mol[, smarts, t_idxs, draw])

Compute and flag bonds, angles, and dihedrals in a single call.

flag_bats_multiple(mol[, l_smarts, l_t_idxs, draw])

Compute and flag bonds, angles, and dihedrals in a single call for multiple structural features.

__get_img_multiple_mols(mol, d_multi_flag, l_patterns, ...)

Generate a single SVG image containing multiple copies of a molecule, each highlighted according to supplied atom/bond pattern levels.

__build_conjugated_smarts(n_double[, elems])

Build a SMARTS pattern for a linear conjugated system with n_double alternating double bonds.

__match_bla_chromophor(mol[, smarts, n_double, elems])

Detect conjugated chromophores defined either by SMARTS, number of alternating double bonds, or automatically by maximum extension.

flag_bla_chromophor(mol[, smarts, n_double, elems, ...])

Module Contents

st_yellow
__match_pattern(mol, smarts)

Find all substructure matches of a SMARTS pattern in a molecule.

Parameters:
  • mol (rdkit.Chem.rdchem.Mol) – RDKit molecule object.

  • smarts (str) – SMARTS pattern to search for.

Returns:

Each tuple contains atom indices corresponding to one match of the SMARTS pattern. Returns an empty list if no match is found.

Return type:

list of tuples

__get_bond_info(mol, flagged_tuples)

Extend each flagged tuple of bonds, angles or dihedrals with a tuple of bond indices and a tuple of the corresponding bond types as floats (1.0: SINGLE, 1.5: AROMATIC, 2.0: DOUBLE, 3.0: TRIPLE).

Parameters:
  • mol (RDKit Mol) – Molecule object.

  • flagged_tuples (list of tuples) – Each entry: (flag, atom_tuple). Example: [(1, (0,1,2,3)), (0, (1,2,3,4))]

Returns:

flagged_tuples_binfo – Each entry: (flag, atom_tuple, bond_tuple, bondtype_tuple, rdkit.Chem.rdchem.Mol of the sub-molecule/subgraph). Example: [(0, (5, 0, 1), (6, 0), (1.0, 2.0), rdkit.Chem.rdchem.Mol object)]

Return type:

list of tuples
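The record layout above can be illustrated without RDKit. The following is a minimal sketch, not the module's implementation: `bond_table` and `extend_with_bond_info` are hypothetical names, the table stands in for the bond lookup that an RDKit molecule would provide, and the fifth element of each record (the sub-molecule `Mol` object) is omitted.

```python
# Hypothetical stand-in for the bond lookup of an RDKit molecule:
# frozenset({atom_i, atom_j}) -> (bond_index, bond_type as float)
bond_table = {
    frozenset({0, 1}): (0, 2.0),  # double bond
    frozenset({0, 5}): (6, 1.0),  # single bond
}


def extend_with_bond_info(flagged_tuples, bond_table):
    """Append bond indices and bond types to each (flag, atom_tuple) entry."""
    extended = []
    for flag, atoms in flagged_tuples:
        # Consecutive atoms of a bond, angle or dihedral tuple are bonded.
        pairs = [frozenset(pair) for pair in zip(atoms, atoms[1:])]
        bond_idxs = tuple(bond_table[pair][0] for pair in pairs)
        bond_types = tuple(bond_table[pair][1] for pair in pairs)
        extended.append((flag, atoms, bond_idxs, bond_types))
    return extended


result = extend_with_bond_info([(0, (5, 0, 1))], bond_table)
print(result)  # [(0, (5, 0, 1), (6, 0), (1.0, 2.0))]
```

An angle tuple of three atoms yields two bonds, a dihedral tuple of four atoms yields three, which is why the bond and bond-type tuples are always one element shorter than the atom tuple.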

__level_to_geo(flag_level)
__get_color_atoms(d_flag, flag_level)
__get_color_bonds(d_flag, flag_level)
__collect_tuples(d_flag)

Select the appropriate list of tuples based on hierarchy: dihedrals -> angles -> bonds.

Each value is a list of tuples of the form:

(flag, atom_tuple, bond_tuple, bondtype_tuple, Mol)

__get_highlight_molimg(mol, d_flag, highlight_color=st_yellow, width=300, height=300)

Convert a list of flagged atom tuples into bonds and atoms for highlighting.

Parameters:
  • mol (RDKit Mol) – Molecule object.

  • d_flag (dictionary) – Keys: ‘atoms’, ‘bonds’, ‘angles’ or ‘dihedrals’. Values: lists of tuples, each entry of the form (flag, atom_tuple, bond_tuple, bondtype_tuple, Mol). Example: [(1, (0,1), (0), (1.0)), (0, (1,2), (1), (2.0))]

  • highlight_color (tuple, optional) – RGB tuple for highlighting bonds/atoms.

Returns:

img – RDKit molecule image with highlighted atoms and bonds.

Return type:

PIL.Image

__get_all_atoms(mol)

Return all atoms in the molecule as a dictionary with flags.

All atoms are initially flagged as active (1).

Parameters:

mol (rdkit.Chem.rdchem.Mol) – RDKit molecule object.

Returns:

Dictionary with key ‘atoms’ mapping to a list of tuples. Each tuple has the form (flag, (atom_idx, atom_symbol)), where:

  • flagint

    1 indicates active atom.

  • (atom_idx, atom_symbol)tuple

    The atom index in the molecule and its atomic symbol (e.g., (0, ‘C’), (1, ‘H’)).

Return type:

dict

__get_atoms_by_indices(match_indices_list, d_atoms)

Set flags for atoms based on whether they are contained within substructure matches.

Atoms are active (1) if they are also nodes in the matched substructures. All other atoms are flagged as inactive (0).

Parameters:
  • match_indices_list (list of tuples) – Each tuple contains atom indices (e.g. obtained from a substructure match).

  • d_atoms (dict) – Dictionary of all atoms in the molecule (output of __get_all_atoms).

Returns:

Updated atoms dictionary with flags updated based on the substructure matches.

Return type:

dict
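The flagging rule can be sketched in plain Python. This is an illustration under assumptions, not the module's code: `flag_atoms_by_matches` is a hypothetical name, and the sketch returns a new dictionary rather than modifying its input.

```python
def flag_atoms_by_matches(match_indices_list, d_atoms):
    """Return an atoms dictionary whose flags reflect the substructure matches."""
    # Union of all atom indices that occur in any match.
    matched = {idx for match in match_indices_list for idx in match}
    return {
        "atoms": [
            (1 if atom[0] in matched else 0, atom)
            for _flag, atom in d_atoms["atoms"]
        ]
    }


d_atoms = {"atoms": [(1, (0, "C")), (1, (1, "C")), (1, (2, "O")), (1, (3, "H"))]}
flagged = flag_atoms_by_matches([(0, 1), (1, 2)], d_atoms)
print(flagged["atoms"])
# [(1, (0, 'C')), (1, (1, 'C')), (1, (2, 'O')), (0, (3, 'H'))]
```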

flag_atoms(mol, smarts=None, t_idxs=(), draw=False)

Flag atoms in a molecule based on substructure patterns or atom indices.

Atoms can be flagged in four ways:
  1. No SMARTS or target indices provided: all atoms are returned as active.

  2. SMARTS provided: atoms belonging to the substructure matches are active; others are inactive.

  3. Target atom indices provided: only atoms in the provided tuple are active; others are inactive.

  4. Both SMARTS and target indices provided: only the intersection of SMARTS matches and t_idxs are considered active. Warnings are issued if there is no overlap or partial overlap.

Parameters:
  • mol (rdkit.Chem.rdchem.Mol) – RDKit molecule object.

  • smarts (str, optional) – SMARTS pattern to filter atoms. Default is None.

  • t_idxs (tuple, optional) – Tuple of atom indices to filter atoms. Default is empty tuple.

  • draw (bool, optional) – If True, returns an image highlighting the active atoms. Default is False.

Returns:

If draw=False: dictionary with key ‘atoms’ mapping to a list of tuples (flag, (atom_idx, atom_symbol)). If draw=True: tuple of (atom dictionary, highlighted molecule image).

Return type:

dict or tuple

__get_all_bonds(mol)

Return all bonds in the molecule as a dictionary with flags.

All bonds are initially flagged as active (1).

Parameters:

mol (rdkit.Chem.rdchem.Mol) – RDKit molecule object.

Returns:

Dictionary with key ‘bonds’ mapping to a list of tuples. Each tuple has the form (flag, (atom_idx1, atom_idx2)), where:

  • flagint

    1 indicates active bond.

  • (atom_idx1, atom_idx2)tuple

    Pair of atom indices defining the bond.

Return type:

dict

__get_bonds_by_indices(match_indices_list, d_bonds)

Set flags for bonds based on whether they are contained within substructure matches.

Bonds are active (1) if both atoms belong to any of the matched substructures. All other bonds are flagged as inactive (0).

Parameters:
  • match_indices_list (list of tuples) – Each tuple contains atom indices (e.g. obtained from a substructure match).

  • d_bonds (dict) – Dictionary of all bonds in the molecule (output of __get_all_bonds).

Returns:

Updated bond dictionary with the same structure as d_bonds, but flags updated based on the substructure matches.

Return type:

dict
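A minimal sketch of this rule follows. It assumes that "both atoms belong to any of the matched substructures" means both atoms occur in the same match tuple, which is how the angle and dihedral variants are described; `flag_bonds_by_matches` is a hypothetical name and not part of the module.

```python
def flag_bonds_by_matches(match_indices_list, d_bonds):
    """Return a bonds dictionary with flags updated from substructure matches."""
    match_sets = [set(match) for match in match_indices_list]
    flagged = []
    for _flag, (i, j) in d_bonds["bonds"]:
        # Assumption: a bond is active if one match contains both of its atoms.
        active = any(i in s and j in s for s in match_sets)
        flagged.append((int(active), (i, j)))
    return {"bonds": flagged}


d_bonds = {"bonds": [(1, (0, 1)), (1, (1, 2)), (1, (2, 3))]}
flagged = flag_bonds_by_matches([(0, 1, 2)], d_bonds)
print(flagged["bonds"])  # [(1, (0, 1)), (1, (1, 2)), (0, (2, 3))]
```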

flag_bonds(mol, smarts=None, t_idxs=(), draw=True)

Flag bonds in a molecule based on substructure patterns or atom indices.

Bonds can be flagged in four ways:
  1. No SMARTS or target indices provided: all bonds are returned as active.

  2. SMARTS provided: bonds belonging to the substructure matches are active; others are inactive.

  3. Target atom indices provided: bonds entirely contained in the atom index tuple are active; others are inactive.

  4. Both SMARTS and target indices provided: only the intersection of SMARTS matches and t_idxs are considered active. Warnings are issued if there is no overlap or partial overlap.

Parameters:
  • mol (rdkit.Chem.rdchem.Mol) – RDKit molecule object.

  • smarts (str, optional) – SMARTS pattern to filter bonds. Default is None.

  • t_idxs (tuple, optional) – Tuple of atom indices to filter bonds. Default is empty tuple.

Returns:

Dictionary with key ‘bonds’ mapping to a list of tuples (flag, (atom_idx1, atom_idx2)).

Return type:

dict

__get_all_angles(mol)

Return all angles in the molecule as a dictionary with flags.

Angles are defined as atom triples (i, j, k) where i–j and j–k are both bonded.

All angles are initially flagged as active (1).

Parameters:

mol (rdkit.Chem.rdchem.Mol) – RDKit molecule object.

Returns:

Dictionary with key ‘angles’ mapping to a list of tuples: (flag, (i, j, k))

Return type:

dict
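The angle definition can be illustrated with a plain bond list instead of an RDKit molecule. This is a sketch under assumptions: `all_angles` is a hypothetical helper, and the ordering of the returned triples is a choice made here, not necessarily the ordering the module produces.

```python
from itertools import combinations


def all_angles(bonds):
    """Enumerate angles (i, j, k) where i-j and j-k are bonds, all flagged active."""
    neighbors = {}
    for i, j in bonds:
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)

    angles = []
    for j in sorted(neighbors):  # j is the central atom
        for i, k in combinations(sorted(neighbors[j]), 2):
            angles.append((1, (i, j, k)))
    return {"angles": angles}


# Atom 1 is bonded to atoms 0, 2 and 3.
d_angles = all_angles([(0, 1), (1, 2), (1, 3)])
print(d_angles["angles"])
# [(1, (0, 1, 2)), (1, (0, 1, 3)), (1, (2, 1, 3))]
```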

__get_angles_by_indices(match_list, d_angles)

Flag angles as active (1) or inactive (0) based on substructure matches.

An angle (i, j, k) is active if all three indices belong to the same match tuple.

Parameters:
  • match_list (list of tuples) – Each tuple contains atom indices of a SMARTS match.

  • d_angles (dict) – Angle dictionary from __get_all_angles().

Returns:

Updated angle dictionary with active/inactive flags.

Return type:

dict

flag_angles(mol, smarts=None, t_idxs=(), draw=False)

Flag molecule angles based on SMARTS patterns and/or atom indices.

Modes of operation

  1. No SMARTS and no t_idxs: return all angles as active.

  2. SMARTS only: angles part of any SMARTS match are active.

  3. t_idxs only: only angles fully inside t_idxs are active.

  4. SMARTS + t_idxs:
    • If SMARTS fails: warn, return angles from t_idxs only.

    • If SMARTS matches but no overlap: warn, return angles from t_idxs.

    • If partial overlap: warn, return only angles in the intersection.

Parameters:
  • mol (rdkit.Chem.rdchem.Mol) – RDKit molecule object.

  • smarts (str, optional) – SMARTS pattern representing the angle substructure.

  • t_idxs (tuple, optional) – Atom indices to filter angles by.

Returns:

Dictionary with key ‘angles’ mapping to list of (flag, (i, j, k)). Active = 1, inactive = 0.

Return type:

dict
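The four modes of operation can be sketched as a small decision function. This is an illustration under assumptions, not the module's code: `select_active_angles` is a hypothetical name, SMARTS matching is replaced by a precomputed `matches` list (`None` meaning no SMARTS was given), and "overlap" is evaluated on the level of angles.

```python
import warnings


def select_active_angles(angles, matches=None, t_idxs=()):
    """Flag angles (i, j, k) according to the four modes described above."""
    idx_set = set(t_idxs)
    match_sets = [set(m) for m in (matches or [])]

    # Angles fully contained in one SMARTS match / in the index tuple.
    by_smarts = {a for a in angles if any(set(a) <= m for m in match_sets)}
    by_idxs = {a for a in angles if set(a) <= idx_set}

    if matches is None and not t_idxs:
        active = set(angles)                      # mode 1
    elif not t_idxs:
        active = by_smarts                        # mode 2
    elif matches is None:
        active = by_idxs                          # mode 3
    elif not matches:
        warnings.warn("SMARTS did not match; using t_idxs only")
        active = by_idxs
    elif not (by_smarts & by_idxs):
        warnings.warn("no overlap; using t_idxs only")
        active = by_idxs
    else:
        if by_smarts != by_idxs:
            warnings.warn("partial overlap; using the intersection")
        active = by_smarts & by_idxs              # mode 4
    return [(int(a in active), a) for a in angles]


angles = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = select_active_angles(angles, matches=[(0, 1, 2, 3)], t_idxs=(1, 2, 3, 4))
print(flagged)      # [(0, (0, 1, 2)), (1, (1, 2, 3)), (0, (2, 3, 4))]
print(len(caught))  # 1
```

Only the angle (1, 2, 3) lies inside both the match and the index tuple, so it is the single active entry and one partial-overlap warning is raised.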

__get_all_dihedrals(mol)

Return all dihedrals in the molecule as a dictionary with flags.

Dihedral quadruples are of the form (i, j, k, l) where:

i–j, j–k, and k–l are all bonds.

All dihedrals are initially flagged as active (1).

Parameters:

mol (rdkit.Chem.rdchem.Mol) – RDKit molecule object.

Returns:

Dictionary with key ‘dihedrals’ mapping to a list of: (flag, (i, j, k, l))

Return type:

dict
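The dihedral definition can be illustrated with a plain bond list. The following is a sketch under assumptions: `all_dihedrals` is a hypothetical helper, each central bond is traversed in the direction in which it is listed, and quadruples with i equal to l (three-membered rings) are skipped, which the module may or may not do.

```python
def all_dihedrals(bonds):
    """Enumerate dihedrals (i, j, k, l) along bonded chains, all flagged active."""
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    dihedrals = []
    for j, k in bonds:  # j-k is the central bond
        for i in sorted(neighbors[j] - {k}):
            for l in sorted(neighbors[k] - {j}):
                if i != l:  # assumption: skip three-membered rings
                    dihedrals.append((1, (i, j, k, l)))
    return {"dihedrals": dihedrals}


# Linear four-atom chain 0-1-2-3 has exactly one dihedral.
d_dihedrals = all_dihedrals([(0, 1), (1, 2), (2, 3)])
print(d_dihedrals["dihedrals"])  # [(1, (0, 1, 2, 3))]
```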

__get_dihedrals_by_indices(match_list, d_dihedrals)

Flag dihedrals as active (1) if all four atoms (i, j, k, l) belong to the same SMARTS match.

Parameters:
  • match_list (list of tuples) – SMARTS matches from __match_pattern.

  • d_dihedrals (dict) – Output of __get_all_dihedrals().

Returns:

Updated dihedral dictionary with flags.

Return type:

dict

flag_dihedrals(mol, smarts=None, t_idxs=(), draw=False)

Flag dihedrals in a molecule based on SMARTS and/or atom indices.

Modes

  1. No SMARTS + no t_idxs: return all dihedrals active

  2. SMARTS only: dihedrals part of SMARTS matches are active

  3. t_idxs only: dihedrals fully inside t_idxs are active

  4. SMARTS + t_idxs: the intersection is used, with the following behavior:
    • No SMARTS match: return t_idxs only.

    • No overlap: return t_idxs only.

    • Overlap: return only intersecting dihedrals.

Parameters:
  • mol (rdkit.Chem.rdchem.Mol) – Molecule under study.

  • smarts (str, optional) – SMARTS pattern.

  • t_idxs (tuple, optional) – Atom index tuple for filtering.

Returns:

{‘dihedrals’: [(flag, (i,j,k,l)), …]}

Return type:

dict

flag_bats(mol, smarts=None, t_idxs=(), draw=False)

Compute and flag bonds, angles, and dihedrals in a single call, automatically determining which interactions can be filtered based on the size of the SMARTS pattern and/or the number of atom indices supplied in t_idxs.

Rules

  • If SMARTS has:
    • 2 atoms: only bonds can be filtered

    • 3 atoms: bonds + angles can be filtered

    • >=4 atoms: bonds + angles + dihedrals can be filtered

  • If t_idxs has:
    • len=2: only bonds

    • len=3: bonds + angles

    • len>=4: bonds + angles + dihedrals

If both SMARTS and t_idxs are provided, the maximal allowed degree is the minimum of the two limits.

Parameters:
  • mol (rdkit.Chem.rdchem.Mol) – Molecule under study.

  • smarts (str, optional) – SMARTS pattern for filtering interactions.

  • t_idxs (tuple[int], optional) – Atom indices for filtering interactions.

  • draw (bool, optional) – Flag for additionally returning a flagged rdkit.Chem.rdchem.Mol object.

Returns:

Dictionary of the form {‘bonds’: […], ‘angles’: […], ‘dihedrals’: […]}. Interaction types that cannot be filtered due to SMARTS/t_idxs size are returned fully active. If draw is set, an rdkit.Chem.rdchem.Mol object of the filtered features is returned as well.

Return type:

dict
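The size rules can be sketched as a small helper. This is an illustration, not the module's code: `filterable_interactions` is a hypothetical name, and sizes below 2 are rejected here because the rules above do not cover them.

```python
def filterable_interactions(n_smarts_atoms=None, n_t_idxs=None):
    """Return the interaction types that can be filtered for the given sizes."""
    sizes = [n for n in (n_smarts_atoms, n_t_idxs) if n is not None]
    if not sizes:
        return []  # no filter supplied: everything stays fully active
    # The stricter of the two limits wins; more than 4 atoms adds nothing.
    degree = min(min(sizes), 4)
    if degree < 2:
        raise ValueError("this sketch only covers sizes >= 2")
    return ["bonds", "angles", "dihedrals"][: degree - 1]


print(filterable_interactions(n_smarts_atoms=2))              # ['bonds']
print(filterable_interactions(n_smarts_atoms=4, n_t_idxs=3))  # ['bonds', 'angles']
print(filterable_interactions(n_smarts_atoms=6, n_t_idxs=5))  # ['bonds', 'angles', 'dihedrals']
```

Interaction types missing from the returned list correspond to those that are returned fully active.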

flag_bats_multiple(mol, l_smarts=None, l_t_idxs=(), draw=False)

Compute and flag bonds, angles, and dihedrals in a single call, for multiple structural features.

Parameters:
  • mol (rdkit.Chem.rdchem.Mol) – Molecule under study.

  • l_smarts (list[str], optional) – SMARTS patterns for filtering interactions.

  • l_t_idxs (list[tuple[int]], optional) – List of atom index tuples for filtering interactions.

  • draw (bool, optional) – Flag for additionally returning a flagged rdkit.Chem.rdchem.Mol object.

Returns:

Dictionary of the form {‘bonds’: […], ‘angles’: […], ‘dihedrals’: […]}. Interaction types that cannot be filtered due to SMARTS/t_idxs size are returned fully active. If draw is set, an rdkit.Chem.rdchem.Mol object of the filtered features is returned as well.

Return type:

dict

__get_img_multiple_mols(mol, d_multi_flag, l_patterns, l_levels)

Generate a single SVG image containing multiple copies of a molecule, each highlighted according to supplied atom/bond pattern levels.

Parameters:
  • mol (rdkit.Chem.rdchem.Mol) – The RDKit molecule object. The same molecule is drawn once for each entry in l_patterns / l_levels.

  • d_multi_flag (dict) –

    Dictionary with mapping information on bonds, angles and torsions. Each entry has the form: {‘bonds’: [(0, (5, 0, 1), (6, 0), (1.0, 2.0), rdkit.Chem.rdchem.Mol object)]}

    Each tuple contains the atom indices (2nd element) and bond indices (3rd element) needed by the helper functions __get_color_atoms() and __get_color_bonds() to extract the atoms and bonds to be highlighted.

  • l_patterns (Iterable) – A list of patterns (smarts or tuples of atom indices) indexing into d_multi_flag.

  • l_levels (Iterable) – List of highlight “levels” corresponding to l_patterns. Each level is passed to the highlight extraction helpers to control the highlight color intensity or style.

Returns:

A grid image produced by rdkit.Chem.Draw.MolsToGridImage, containing all molecule renderings arranged in a single row. Each copy of the molecule is highlighted with its own atom and bond sets determined by the input patterns (smarts or atom indices).

Return type:

PIL.Image.Image or IPython.display.SVG

__build_conjugated_smarts(n_double, elems='#6,#7,#8,#15,#16')

Build a SMARTS pattern for a linear conjugated system with n_double alternating double bonds.

Example (n_double=2, elems='#6,#7'): [#6,#7]=[#6,#7]-[#6,#7]=[#6,#7]

Parameters:
  • n_double (int) – Number of C=C-like double bonds.

  • elems (str) – SMARTS atomic specification (default: C,N,O,P,S).

Returns:

SMARTS string encoding the conjugated system.

Return type:

str
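A pattern of this shape can be assembled with plain string operations. The following is a minimal sketch, assuming the atom specification is placed in square brackets and the double-bond units are joined by single bonds; `build_conjugated_smarts` here is a stand-in and may differ from the private helper in detail.

```python
def build_conjugated_smarts(n_double, elems="#6,#7,#8,#15,#16"):
    """Build a linear pattern of n_double double bonds separated by single bonds."""
    if n_double < 1:
        raise ValueError("n_double must be at least 1")
    atom = f"[{elems}]"
    unit = f"{atom}={atom}"  # one double-bond unit
    return "-".join([unit] * n_double)


smarts = build_conjugated_smarts(2, elems="#6,#7")
print(smarts)  # [#6,#7]=[#6,#7]-[#6,#7]=[#6,#7]
```

A pattern with n_double double bonds spans 2 * n_double atoms, so n_double=2 already covers four atoms.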

__match_bla_chromophor(mol, smarts=None, n_double=None, elems='#6,#7,#8,#15,#16')

Detect conjugated chromophores defined either by SMARTS, number of alternating double bonds, or automatically by maximum extension.

Decision logic

  1. If only smarts is given → use SMARTS

  2. If both smarts and n_double are given → validate consistency

  3. If only n_double is given → generate SMARTS

  4. If neither is given → search for maximum conjugated chromophore

Parameters:
  • mol (rdkit.Chem.Mol) – Molecule to search.

  • smarts (str, optional) – Explicit SMARTS pattern defining the chromophore.

  • n_double (int, optional) – Number of alternating double bonds.

  • elems (str) – Allowed atoms in conjugation.

Returns:

Dictionary of the form {“smarts”: SMARTS used, “n_double”: number of double bonds, “matches”: substructure matches}.

Return type:

dict

flag_bla_chromophor(mol, smarts=None, n_double=None, elems='#6,#7,#8,#15,#16', draw=True, width=500, height=300)
Parameters:
  • smarts (str | None)

  • n_double (int | None)

  • elems (str)

  • width (int)

  • height (int)