shnitsel.geo.geomatch
=====================

.. py:module:: shnitsel.geo.geomatch


Attributes
----------

.. autoapisummary::

   shnitsel.geo.geomatch.st_yellow


Functions
---------

.. autoapisummary::

   shnitsel.geo.geomatch.__match_pattern
   shnitsel.geo.geomatch.__get_bond_info
   shnitsel.geo.geomatch.__level_to_geo
   shnitsel.geo.geomatch.__get_color_atoms
   shnitsel.geo.geomatch.__get_color_bonds
   shnitsel.geo.geomatch.__collect_tuples
   shnitsel.geo.geomatch.__get_highlight_molimg
   shnitsel.geo.geomatch.__get_all_atoms
   shnitsel.geo.geomatch.__get_atoms_by_indices
   shnitsel.geo.geomatch.flag_atoms
   shnitsel.geo.geomatch.__get_all_bonds
   shnitsel.geo.geomatch.__get_bonds_by_indices
   shnitsel.geo.geomatch.flag_bonds
   shnitsel.geo.geomatch.__get_all_angles
   shnitsel.geo.geomatch.__get_angles_by_indices
   shnitsel.geo.geomatch.flag_angles
   shnitsel.geo.geomatch.__get_all_dihedrals
   shnitsel.geo.geomatch.__get_dihedrals_by_indices
   shnitsel.geo.geomatch.flag_dihedrals
   shnitsel.geo.geomatch.flag_bats
   shnitsel.geo.geomatch.flag_bats_multiple
   shnitsel.geo.geomatch.__get_img_multiple_mols
   shnitsel.geo.geomatch.__build_conjugated_smarts
   shnitsel.geo.geomatch.__match_bla_chromophor
   shnitsel.geo.geomatch.flag_bla_chromophor


Module Contents
---------------

.. py:data:: st_yellow

.. py:function:: __match_pattern(mol, smarts)

   Find all substructure matches of a SMARTS pattern in a molecule.

   :param mol: RDKit molecule object.
   :type mol: rdkit.Chem.rdchem.Mol
   :param smarts: SMARTS pattern to search for.
   :type smarts: str

   :returns: Each tuple contains atom indices corresponding to one match of the SMARTS pattern.
             Returns an empty list if no match is found.
   :rtype: list of tuples


.. py:function:: __get_bond_info(mol, flagged_tuples)

   Extend each flagged tuple of bonds, angles, or dihedrals with a tuple of bond indices
   and a tuple of the corresponding bond types as floats
   (1.0: SINGLE, 1.5: AROMATIC, 2.0: DOUBLE, 3.0: TRIPLE).

   :param mol: Molecule object.
   :type mol: RDKit Mol
   :param flagged_tuples: Each entry: (flag, atom_tuple)
                          Example: [(1, (0,1,2,3)), (0, (1,2,3,4))]
   :type flagged_tuples: list of tuples

   :returns: **flagged_tuples_binfo** -- Each entry: (flag, atom_tuple, bond_tuple, bondtype_tuple,
             rdkit.Chem.rdchem.Mol of the submol/subgraph)
             Example: [(0, (5, 0, 1), (6, 0), (1.0, 2.0), rdkit.Chem.rdchem.Mol object)]
   :rtype: list of tuples


.. py:function:: __level_to_geo(flag_level)

.. py:function:: __get_color_atoms(d_flag, flag_level)

.. py:function:: __get_color_bonds(d_flag, flag_level)

.. py:function:: __collect_tuples(d_flag)

   Select the appropriate list of tuples based on the hierarchy
   dihedrals -> angles -> bonds.

   Each value is a list of tuples of the form
   (flag, atom_tuple, bond_tuple, bondtype_tuple, Mol).


.. py:function:: __get_highlight_molimg(mol, d_flag, highlight_color=st_yellow, width=300, height=300)

   Convert a list of flagged atom tuples into the bonds and atoms to highlight,
   and draw the molecule with them highlighted.

   :param mol: Molecule object.
   :type mol: RDKit Mol
   :param d_flag: keys: 'atoms', 'bonds', 'angles' or 'dihedrals'
                  values: list of tuples
                  Each entry: (flag, atom_tuple, bond_tuple, bondtype_tuple, Mol)
                  Example (Mol element omitted): [(1, (0,1), (0,), (1.0,)), (0, (1,2), (1,), (2.0,))]
   :type d_flag: dictionary
   :param highlight_color: RGB tuple for highlighting bonds/atoms.
   :type highlight_color: tuple, optional

   :returns: **img** -- RDKit molecule image with highlighted atoms and bonds.
   :rtype: PIL.Image


.. py:function:: __get_all_atoms(mol)

   Return all atoms in the molecule as a dictionary with flags.

   All atoms are initially flagged as active (1).

   :param mol: RDKit molecule object.
   :type mol: rdkit.Chem.rdchem.Mol

   :returns: Dictionary with key 'atoms' mapping to a list of tuples.
             Each tuple has the form `(flag, (atom_idx, atom_symbol))`, where:

             - flag (int): 1 indicates an active atom.
             - (atom_idx, atom_symbol) (tuple): the atom index in the molecule and its
               atomic symbol (e.g., (0, 'C'), (1, 'H')).
   :rtype: dict


.. py:function:: __get_atoms_by_indices(match_indices_list, d_atoms)

   Set flags for atoms based on whether they are contained within substructure matches.

   Atoms are active (1) if they are also nodes in the matched substructures.
   All other atoms are flagged as inactive (0).

   :param match_indices_list: Each tuple contains atom indices (e.g. obtained from a substructure match).
   :type match_indices_list: list of tuples
   :param d_atoms: Dictionary of all atoms in the molecule (output of `__get_all_atoms`).
   :type d_atoms: dict

   :returns: Updated atoms dictionary with flags updated based on the substructure matches.
   :rtype: dict


.. py:function:: flag_atoms(mol, smarts = None, t_idxs = (), draw=False)

   Flag atoms in a molecule based on substructure patterns or atom indices.

   Atoms can be flagged in four ways:

   1. No SMARTS or target indices provided: all atoms are returned as active.
   2. SMARTS provided: atoms belonging to the substructure matches are active; others are inactive.
   3. Target atom indices provided: only atoms in the provided tuple are active; others are inactive.
   4. Both SMARTS and target indices provided: only the intersection of SMARTS matches and t_idxs
      is considered active. Warnings are issued if there is no overlap or only partial overlap.

   :param mol: RDKit molecule object.
   :type mol: rdkit.Chem.rdchem.Mol
   :param smarts: SMARTS pattern to filter atoms. Default is None.
   :type smarts: str, optional
   :param t_idxs: Tuple of atom indices to filter atoms. Default is empty tuple.
   :type t_idxs: tuple, optional
   :param draw: If True, returns an image highlighting the active atoms. Default is False.
   :type draw: bool, optional

   :returns: If draw=False: dictionary with key 'atoms' mapping to a list of tuples
             `(flag, (atom_idx, atom_symbol))`.
             If draw=True: tuple of (atom dictionary, highlighted molecule image).
   :rtype: dict or tuple


.. py:function:: __get_all_bonds(mol)

   Return all bonds in the molecule as a dictionary with flags.

   All bonds are initially flagged as active (1).

   :param mol: RDKit molecule object.
   :type mol: rdkit.Chem.rdchem.Mol

   :returns: Dictionary with key 'bonds' mapping to a list of tuples.
             Each tuple has the form `(flag, (atom_idx1, atom_idx2))`, where:

             - flag (int): 1 indicates an active bond.
             - (atom_idx1, atom_idx2) (tuple): pair of atom indices defining the bond.
   :rtype: dict


.. py:function:: __get_bonds_by_indices(match_indices_list, d_bonds)

   Set flags for bonds based on whether they are contained within substructure matches.

   Bonds are active (1) if both atoms belong to any of the matched substructures.
   All other bonds are flagged as inactive (0).

   :param match_indices_list: Each tuple contains atom indices (e.g. obtained from a substructure match).
   :type match_indices_list: list of tuples
   :param d_bonds: Dictionary of all bonds in the molecule (output of `__get_all_bonds`).
   :type d_bonds: dict

   :returns: Updated bond dictionary with the same structure as `d_bonds`, but flags updated
             based on the substructure matches.
   :rtype: dict


.. py:function:: flag_bonds(mol, smarts = None, t_idxs = (), draw=True)

   Flag bonds in a molecule based on substructure patterns or atom indices.

   Bonds can be flagged in four ways:

   1. No SMARTS or target indices provided: all bonds are returned as active.
   2. SMARTS provided: bonds belonging to the substructure matches are active; others are inactive.
   3. Target atom indices provided: bonds entirely contained in the atom index tuple are active;
      others are inactive.
   4. Both SMARTS and target indices provided: only the intersection of SMARTS matches and t_idxs
      is considered active. Warnings are issued if there is no overlap or only partial overlap.

   :param mol: RDKit molecule object.
   :type mol: rdkit.Chem.rdchem.Mol
   :param smarts: SMARTS pattern to filter bonds. Default is None.
   :type smarts: str, optional
   :param t_idxs: Tuple of atom indices to filter bonds. Default is empty tuple.
   :type t_idxs: tuple, optional

   :returns: Dictionary with key 'bonds' mapping to a list of tuples `(flag, (atom_idx1, atom_idx2))`.
   :rtype: dict


.. py:function:: __get_all_angles(mol)

   Return all angles in the molecule as a dictionary with flags.

   Angles are defined as atom triples (i, j, k) where i–j and j–k are both bonded.
   All angles are initially flagged as active (1).

   :param mol: RDKit molecule object.
   :type mol: rdkit.Chem.rdchem.Mol

   :returns: Dictionary with key 'angles' mapping to a list of tuples:
             (flag, (i, j, k))
   :rtype: dict


.. py:function:: __get_angles_by_indices(match_list, d_angles)

   Flag angles as active (1) or inactive (0) based on substructure matches.

   An angle (i, j, k) is active if all three indices belong to the *same*
   match tuple.

   :param match_list: Each tuple contains atom indices of a SMARTS match.
   :type match_list: list of tuples
   :param d_angles: Angle dictionary from __get_all_angles().
   :type d_angles: dict

   :returns: Updated angle dictionary with active/inactive flags.
   :rtype: dict


.. py:function:: flag_angles(mol, smarts = None, t_idxs = (), draw=False)

   Flag molecule angles based on SMARTS patterns and/or atom indices.

   Modes of operation
   ------------------
   1) No SMARTS and no t_idxs: return all angles as active.
   2) SMARTS only: angles that are part of any SMARTS match are active.
   3) t_idxs only: only angles fully inside t_idxs are active.
   4) SMARTS + t_idxs:

      - If SMARTS fails to match: warn, return angles from t_idxs only.
      - If SMARTS matches but there is no overlap: warn, return angles from t_idxs.
      - If there is partial overlap: warn, return only angles in the intersection.

   :param mol: RDKit molecule object.
   :type mol: rdkit.Chem.rdchem.Mol
   :param smarts: SMARTS pattern representing the angle substructure.
   :type smarts: str, optional
   :param t_idxs: Atom indices to filter angles by.
   :type t_idxs: tuple, optional

   :returns: Dictionary with key 'angles' mapping to list of (flag, (i, j, k)).
             Active = 1, inactive = 0.
   :rtype: dict


.. py:function:: __get_all_dihedrals(mol)

   Return all dihedrals in the molecule as a dictionary with flags.

   Dihedral quadruples are of the form (i, j, k, l), where
   i–j, j–k, and k–l are all bonds.

   All dihedrals are initially flagged as active (1).

   :param mol: RDKit molecule object.
   :type mol: rdkit.Chem.rdchem.Mol

   :returns: Dictionary with key 'dihedrals' mapping to a list of:
             (flag, (i, j, k, l))
   :rtype: dict


.. py:function:: __get_dihedrals_by_indices(match_list, d_dihedrals)

   Flag dihedrals as active (1) if all four atoms (i, j, k, l)
   belong to the same SMARTS match.

   :param match_list: SMARTS matches from __match_pattern.
   :type match_list: list of tuples
   :param d_dihedrals: Output of __get_all_dihedrals().
   :type d_dihedrals: dict

   :returns: Updated dihedral dictionary with flags.
   :rtype: dict


.. py:function:: flag_dihedrals(mol, smarts = None, t_idxs = (), draw=False)

   Flag dihedrals in a molecule based on SMARTS and/or atom indices.

   Modes
   -----
   1) No SMARTS + no t_idxs: return all dihedrals as active.
   2) SMARTS only: dihedrals that are part of SMARTS matches are active.
   3) t_idxs only: dihedrals fully inside t_idxs are active.
   4) SMARTS + t_idxs: the result depends on their intersection:

      - No SMARTS match: return dihedrals from t_idxs only.
      - No overlap: return dihedrals from t_idxs only.
      - Overlap: return only the intersecting dihedrals.

   :param mol: Molecule under study.
   :type mol: RDKit Mol
   :param smarts: SMARTS pattern.
   :type smarts: str, optional
   :param t_idxs: Atom index tuple for filtering.
   :type t_idxs: tuple, optional

   :returns: {'dihedrals': [(flag, (i,j,k,l)), ...]}
   :rtype: dict


.. py:function:: flag_bats(mol, smarts = None, t_idxs = (), draw=False)

   Compute and flag bonds, angles, and dihedrals in a single call,
   automatically determining which interactions can be filtered based
   on the size of the SMARTS pattern and/or the number of atom indices
   supplied in t_idxs.

   Rules
   -----
   - If SMARTS has:

     - 2 atoms: only bonds can be filtered
     - 3 atoms: bonds + angles can be filtered
     - >=4 atoms: bonds + angles + dihedrals can be filtered

   - If t_idxs has:

     - len=2: only bonds
     - len=3: bonds + angles
     - len>=4: bonds + angles + dihedrals

   If both SMARTS and t_idxs are provided, the *maximal allowed degree*
   is the minimum of the two limits.

   :param mol: Molecule under study.
   :type mol: rdkit.Chem.rdchem.Mol
   :param smarts: SMARTS pattern for filtering interactions.
   :type smarts: str, optional
   :param t_idxs: Atom indices for filtering interactions.
   :type t_idxs: tuple[int], optional
   :param draw: Flag for also returning the flagged rdkit.Chem.rdchem.Mol object.
   :type draw: bool

   :returns: * *dict* -- { 'bonds': [...], 'angles': [...], 'dihedrals': [...] }.
               Interaction types that cannot be filtered due to SMARTS/t_idxs size
               are returned fully active.
             * **if draw** (*rdkit.Chem.rdchem.Mol object of filtered features*)


.. py:function:: flag_bats_multiple(mol, l_smarts = None, l_t_idxs = (), draw=False)

   Compute and flag bonds, angles, and dihedrals in a single call
   for multiple structural features.

   :param mol: Molecule under study.
   :type mol: rdkit.Chem.rdchem.Mol
   :param l_smarts: SMARTS patterns for filtering interactions.
   :type l_smarts: list[str], optional
   :param l_t_idxs: List of atom index tuples for filtering interactions.
   :type l_t_idxs: list[tuple[int]], optional
   :param draw: Flag for also returning the flagged rdkit.Chem.rdchem.Mol object.
   :type draw: bool

   :returns: * *dict* -- { 'bonds': [...], 'angles': [...], 'dihedrals': [...] }.
               Interaction types that cannot be filtered due to SMARTS/t_idxs size
               are returned fully active.
             * **if draw** (*rdkit.Chem.rdchem.Mol object of filtered features*)


.. py:function:: __get_img_multiple_mols(mol, d_multi_flag, l_patterns, l_levels)

   Generate a single SVG image containing multiple copies of a molecule, each
   highlighted according to supplied atom/bond pattern levels.

   :param mol: The RDKit molecule object.
               The same molecule is drawn once for each entry in `l_patterns` / `l_levels`.
   :type mol: rdkit.Chem.rdchem.Mol
   :param d_multi_flag: Dictionary with mapping information on bonds, angles, and torsions.
                        Each entry: {'bonds': [(0, (5, 0, 1), (6, 0), (1.0, 2.0), rdkit.Chem.rdchem.Mol object)]}
                        Each tuple carries the atom indices (2nd element) and bond indices (3rd element)
                        needed by the helper functions `__get_color_atoms()` and `__get_color_bonds()`
                        to extract the atoms and bonds to be highlighted.
   :type d_multi_flag: dict
   :param l_patterns: A list of patterns (SMARTS or tuples of atom indices) indexing into `d_multi_flag`.
   :type l_patterns: Iterable
   :param l_levels: List of highlight “levels” corresponding to `l_patterns`.
                    Each level is passed to the highlight extraction helpers to
                    control the highlight color intensity or style.
   :type l_levels: Iterable

   :returns: A grid image produced by `rdkit.Chem.Draw.MolsToGridImage`, containing
             all molecule renderings arranged in a single row. Each copy of the
             molecule is highlighted with its own atom and bond sets determined by the input
             patterns (SMARTS or atom indices).
   :rtype: PIL.Image.Image or IPython.display.SVG


.. py:function:: __build_conjugated_smarts(n_double, elems = '#6,#7,#8,#15,#16')

   Build a SMARTS pattern for a linear conjugated system with `n_double`
   alternating double bonds.

   Example (n_double=2, elems='#6,#7'):
   [#6,#7]=[#6,#7]-[#6,#7]=[#6,#7]

   :param n_double: Number of C=C-like double bonds.
   :type n_double: int
   :param elems: SMARTS atomic specification (default: C,N,O,P,S).
   :type elems: str

   :returns: SMARTS string encoding the conjugated system.
   :rtype: str


.. py:function:: __match_bla_chromophor(mol, smarts = None, n_double = None, elems = '#6,#7,#8,#15,#16')

   Detect conjugated chromophores defined either by SMARTS, by the number of
   alternating double bonds, or automatically by maximum extension.

   Decision logic
   --------------
   1. If `smarts` is given → use SMARTS
   2. If both `smarts` and `n_double` are given → validate consistency
   3. If only `n_double` is given → generate SMARTS
   4. If neither is given → search for maximum conjugated chromophore

   :param mol: Molecule to search.
   :type mol: rdkit.Chem.Mol
   :param smarts: Explicit SMARTS pattern defining the chromophore.
   :type smarts: str, optional
   :param n_double: Number of alternating double bonds.
   :type n_double: int, optional
   :param elems: Allowed atoms in conjugation.
   :type elems: str

   :returns: {
                 "smarts": SMARTS used,
                 "n_double": number of double bonds,
                 "matches": substructure matches
             }
   :rtype: dict


.. py:function:: flag_bla_chromophor(mol, smarts = None, n_double = None, elems = '#6,#7,#8,#15,#16', draw=True, width = 500, height = 300)
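
The SMARTS construction described for `__build_conjugated_smarts` can be illustrated with a small pure-Python sketch. It is not the shnitsel implementation: the function name `build_conjugated_smarts_sketch` and the `ValueError` for `n_double < 1` are assumptions made for illustration. It only reproduces the documented example, in which each double-bond unit is joined to the next by a single bond.

```python
def build_conjugated_smarts_sketch(n_double, elems="#6,#7,#8,#15,#16"):
    """Hypothetical sketch of a linear conjugated SMARTS builder.

    Not the shnitsel source; it mirrors only the documented example
    [#6,#7]=[#6,#7]-[#6,#7]=[#6,#7] for n_double=2, elems='#6,#7'.
    """
    if n_double < 1:
        # Assumption: at least one double bond is required.
        raise ValueError("n_double must be at least 1")
    atom = f"[{elems}]"            # one atom from the allowed element list
    unit = f"{atom}={atom}"        # a double bond between two allowed atoms
    # Consecutive double-bond units are connected by single bonds.
    return "-".join([unit] * n_double)


print(build_conjugated_smarts_sketch(2, elems="#6,#7"))
# [#6,#7]=[#6,#7]-[#6,#7]=[#6,#7]
```

A string produced this way could then be passed as the `smarts` argument of `flag_bla_chromophor`; whether the library builds its pattern in exactly this form is not confirmed by this page.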