Molecular operations

grow_mol

Replace hydrogens with fragments from the database.

grow_mol2

Convenience function for processing molecules in parallel with the multiprocessing module.

link_mols

Link two molecules by a linker from the database.

link_mols2

Convenience function for processing molecules in parallel with the multiprocessing module.

mutate_mol

Generator of new molecules by replacement of fragments in the supplied molecule with fragments from DB.

mutate_mol2

Convenience function for processing molecules in parallel with the multiprocessing module.

crem.crem.grow_mol(mol, db_name, radius=3, min_atoms=1, max_atoms=2, max_replacements=None, replace_ids=None, protected_ids=None, min_freq=0, return_rxn=False, return_rxn_freq=False, return_mol=False, ncores=1)

Replace hydrogens with fragments from the database.

Parameters
  • mol – RDKit Mol object. If hydrogens are explicit they will be replaced as well, otherwise not.

  • db_name – path to DB file with fragment replacements.

  • radius – radius of context which will be considered for replacement. Default: 3.

  • min_atoms – minimum number of atoms in the fragment which will replace H. Default: 1.

  • max_atoms – maximum number of atoms in the fragment which will replace H. Default: 2.

  • max_replacements – maximum number of replacements to make. If the number of replacements available in DB is greater than the specified value the specified number of randomly chosen replacements will be applied. Default: None.

  • replace_ids – iterable with ids of heavy atoms with replaceable Hs and/or ids of H atoms to replace. It has lower priority than protected_ids (ids present in both replace_ids and protected_ids are protected). Default: None.

  • protected_ids – iterable with hydrogen atom ids or ids of heavy atoms at which hydrogens will not be replaced. Ids of all equivalent atoms should be supplied (e.g. to protect meta-position in toluene ids of both carbons in meta-positions should be supplied). This argument has a higher priority over replace_ids. Default: None.

  • min_freq – minimum occurrence of fragments in DB for replacement. Default: 0.

  • return_rxn – whether to additionally return rxn of a transformation. Default: False.

  • return_rxn_freq – whether to additionally return the frequency of a transformation in the DB. Default: False.

  • return_mol – whether to additionally return RDKit Mol object of a generated molecule. Default: False.

  • ncores – number of cores. Default: 1.
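
The priority of protected_ids over replace_ids described above amounts to a set difference. The following is a minimal sketch of that documented rule only, not CReM's own code; the helper name effective_replace_ids is hypothetical, and the sketch ignores how hydrogens attached to heavy atoms are resolved.

```python
def effective_replace_ids(replace_ids=None, protected_ids=None):
    """Return the ids that stay replaceable once protection is applied.

    Hypothetical helper: it illustrates the documented priority rule and
    nothing else.
    """
    if replace_ids is None:
        # No explicit selection was made, so there is nothing to narrow down.
        return None
    protected = set(protected_ids or ())
    # Ids listed in both arguments are treated as protected.
    return sorted(set(replace_ids) - protected)


print(effective_replace_ids([1, 2, 5], [2, 7]))  # [1, 5]
```

Atom 2 is dropped because it appears in both arguments; atom 7 has no effect because it was never selected for replacement.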

Returns

Generator over new molecules. If no additional return options are set, it yields the SMILES of the new molecules. If any additional return value is requested, each item is a list whose first element is the SMILES, followed by the rxn string of the transformation (optional), the frequency of fragment occurrence in the DB (optional), and the RDKit Mol object (optional). Only entries with distinct SMILES are returned.
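
Because the shape of each returned item depends on which return_* options were set, it can help to label the fields explicitly. The sketch below assumes that optional fields appear only when their option is set and in the order given above; label_result is a hypothetical helper, and the rxn value in the example is a dummy string.

```python
def label_result(item, return_rxn=False, return_rxn_freq=False, return_mol=False):
    """Map one generator item to a dict keyed by field name."""
    if not (return_rxn or return_rxn_freq or return_mol):
        # Plain mode: the item is just a SMILES string.
        return {"smiles": item}
    names = ["smiles"]
    if return_rxn:
        names.append("rxn")
    if return_rxn_freq:
        names.append("freq")
    if return_mol:
        names.append("mol")
    if len(item) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(item)}")
    return dict(zip(names, item))


row = ["CCO", "rxn-placeholder", 12]  # dummy values for illustration
print(label_result(row, return_rxn=True, return_rxn_freq=True))
```

The same options passed to grow_mol should be passed to the helper so that the field names line up with the item contents.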

crem.crem.grow_mol2(*args, **kwargs)

Convenience function for processing molecules in parallel with the multiprocessing module. It calls grow_mol, which cannot be used directly with multiprocessing because it is a generator.

Parameters
  • args – positional arguments, the same as in grow_mol function

  • kwargs – keyword arguments, the same as in grow_mol function

Returns

list with output molecules
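
The reason a generator does not work with multiprocessing is that worker results are sent back by pickling, and generator objects cannot be pickled. The sketch below shows the wrapper pattern with a stand-in generator; generate_items and generate_items2 are illustrative names, not part of CReM.

```python
import pickle


def generate_items(n):
    """Stand-in for a generator function such as grow_mol."""
    for i in range(n):
        yield f"item-{i}"


def generate_items2(*args, **kwargs):
    """List-returning wrapper, following the same pattern as grow_mol2."""
    return list(generate_items(*args, **kwargs))


def is_picklable(obj):
    """Report whether obj can be pickled, as worker results must be."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


print(is_picklable(generate_items(3)))   # False: generator objects cannot be pickled
print(is_picklable(generate_items2(3)))  # True: a plain list of strings can
```

The cost of the wrapper is that all results for one molecule are held in memory at once instead of being produced lazily.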

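crem.crem.link_mols(mol1, mol2, db_name, radius=3, dist=None, min_atoms=1, max_atoms=2, max_replacements=None, replace_ids_1=None, replace_ids_2=None, protected_ids_1=None, protected_ids_2=None, min_freq=0, return_rxn=False, return_rxn_freq=False, return_mol=False, ncores=1)

Link two molecules by a linker from the database.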

Parameters
  • mol1 – the first RDKit Mol object

  • mol2 – the second RDKit Mol object

  • db_name – path to DB file with fragment replacements.

  • radius – radius of context which will be considered for replacement. Default: 3.

  • dist – topological distance between two attachment points in the fragment which will link molecules. Can be a single integer or a tuple of lower and upper bound values.

  • min_atoms – minimum number of heavy atoms in the fragment which will link molecules

  • max_atoms – maximum number of heavy atoms in the fragment which will link molecules

  • max_replacements – maximum number of replacements to make. If the number of replacements available in DB is greater than the specified value the specified number of randomly chosen replacements will be applied. Default: None.

  • replace_ids_1 – iterable with ids of heavy atoms of the first molecule with replaceable Hs and/or ids of H atoms to replace. It has lower priority than protected_ids_1 (ids present in both replace_ids_1 and protected_ids_1 are protected). Default: None.

  • replace_ids_2 – iterable with ids of heavy atoms of the second molecule with replaceable Hs and/or ids of H atoms to replace. It has lower priority than protected_ids_2 (ids present in both replace_ids_2 and protected_ids_2 are protected). Default: None.

  • protected_ids_1 – iterable with ids of heavy atoms of the first molecule at which no H replacement should be made and/or ids of protected hydrogens. This argument has a higher priority over replace_ids_1. Default: None.

  • protected_ids_2 – iterable with ids of heavy atoms of the second molecule at which no H replacement should be made and/or ids of protected hydrogens. This argument has a higher priority over replace_ids_2. Default: None.

  • min_freq – minimum occurrence of fragments in DB for replacement. Default: 0.

  • return_rxn – whether to additionally return rxn of a transformation. Default: False.

  • return_rxn_freq – whether to additionally return the frequency of a transformation in the DB. Default: False.

  • return_mol – whether to additionally return RDKit Mol object of a generated molecule. Default: False.

  • ncores – number of cores. Default: 1.

Returns

Generator over new molecules. If no additional return options are set, it yields the SMILES of the new molecules. If any additional return value is requested, each item is a list whose first element is the SMILES, followed by the rxn string of the transformation (optional), the frequency of fragment occurrence in the DB (optional), and the RDKit Mol object (optional). Only entries with distinct SMILES are returned.
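
The dist parameter accepts either a single integer or a tuple of lower and upper bounds. One way to handle both forms in calling code is to normalize them to a pair, as sketched below. The helper normalize_dist is hypothetical, and treating a single integer as an exact distance (lower equals upper) is an assumption made for illustration.

```python
def normalize_dist(dist):
    """Turn a dist value into an inclusive (lower, upper) pair.

    Assumption: a single integer means an exact topological distance.
    """
    if isinstance(dist, int):
        return (dist, dist)
    lower, upper = dist
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    return (lower, upper)


print(normalize_dist(3))       # (3, 3)
print(normalize_dist((2, 4)))  # (2, 4)
```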

crem.crem.link_mols2(*args, **kwargs)

Convenience function for processing molecules in parallel with the multiprocessing module. It calls link_mols, which cannot be used directly with multiprocessing because it is a generator.

Parameters
  • args – positional arguments, the same as in link_mols function

  • kwargs – keyword arguments, the same as in link_mols function

Returns

list with output molecules

crem.crem.mutate_mol(mol, db_name, radius=3, min_size=0, max_size=10, min_rel_size=0, max_rel_size=1, min_inc=-2, max_inc=2, max_replacements=None, replace_cycles=False, replace_ids=None, protected_ids=None, min_freq=0, return_rxn=False, return_rxn_freq=False, return_mol=False, ncores=1)

Generator of new molecules by replacement of fragments in the supplied molecule with fragments from DB.

Parameters
  • mol – RDKit Mol object. If hydrogens are explicit they will be replaced as well, otherwise not.

  • db_name – path to DB file with fragment replacements.

  • radius – radius of context which will be considered for replacement. Default: 3.

  • min_size – minimum number of heavy atoms in a fragment to replace. If 0 - hydrogens will be replaced (if they are explicit). Default: 0.

  • max_size – maximum number of heavy atoms in a fragment to replace. Default: 10.

  • min_rel_size – minimum size of a replaced fragment relative to the whole molecule (in number of heavy atoms). Default: 0.

  • max_rel_size – maximum size of a replaced fragment relative to the whole molecule (in number of heavy atoms). Default: 1.

  • min_inc – minimum change in the number of heavy atoms between the replacing fragment and the replaced one. A negative value means that the replacing fragment may be smaller than the replaced one by the specified number of heavy atoms. Default: -2.

  • max_inc – maximum change of a number of heavy atoms in replacing fragments to a number of heavy atoms in replaced one. Default: 2.

  • max_replacements – maximum number of replacements to make. If the number of replacements available in DB is greater than the specified value the specified number of randomly chosen replacements will be applied. Default: None.

  • replace_cycles – whether to look for replacements of fragments containing cycles irrespective of the fragment size. Default: False.

  • replace_ids – iterable with atom ids to replace. It has lower priority than protected_ids (ids present in both replace_ids and protected_ids are protected). Ids of hydrogen atoms (if any) connected to the specified heavy atoms will be automatically labeled as replaceable. Default: None.

  • protected_ids – iterable with atom ids which will not be mutated. If the molecule was supplied with explicit hydrogens, the ids of protected hydrogens should be supplied as well, otherwise they will be replaced. Ids of all equivalent atoms should be supplied (e.g. to protect meta-position in toluene ids of both carbons in meta-positions should be supplied). This argument has a higher priority over replace_ids. Default: None.

  • min_freq – minimum occurrence of fragments in DB for replacement. Default: 0.

  • return_rxn – whether to additionally return rxn of a transformation. Default: False.

  • return_rxn_freq – whether to additionally return the frequency of a transformation in the DB. Default: False.

  • return_mol – whether to additionally return RDKit Mol object of a generated molecule. Default: False.

  • ncores – number of cores. Default: 1.
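
The size-related parameters above (min_size, max_size, min_rel_size, max_rel_size, min_inc, max_inc) act together as a filter on which fragment pairs are considered. The sketch below restates them as a single predicate. It is an illustration of the documented meaning, not CReM's implementation; the function name and the assumption that all bounds are inclusive are mine.

```python
def fragment_passes_filters(frag_size, mol_size, new_size,
                            min_size=0, max_size=10,
                            min_rel_size=0, max_rel_size=1,
                            min_inc=-2, max_inc=2):
    """Check one replaced/replacing fragment pair against the size limits.

    frag_size: heavy atoms in the fragment being replaced
    mol_size:  heavy atoms in the whole molecule
    new_size:  heavy atoms in the candidate replacing fragment
    All bounds are assumed to be inclusive.
    """
    rel_size = frag_size / mol_size
    return (min_size <= frag_size <= max_size
            and min_rel_size <= rel_size <= max_rel_size
            and min_inc <= new_size - frag_size <= max_inc)


print(fragment_passes_filters(3, 12, 5))  # True: grows by 2 heavy atoms
print(fragment_passes_filters(3, 12, 6))  # False: grows by 3, above max_inc
```

With the defaults, a 3-atom fragment in a 12-atom molecule may be swapped for fragments of 1 to 5 heavy atoms.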

Returns

Generator over new molecules. If no additional return options are set, it yields the SMILES of the new molecules. If any additional return value is requested, each item is a list whose first element is the SMILES, followed by the rxn string of the transformation (optional), the frequency of fragment occurrence in the DB (optional), and the RDKit Mol object (optional). Only entries with distinct SMILES are returned.

Note: supply an RDKit Mol object with explicit hydrogens if H replacement is required.
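
A usage sketch for the note above, assuming RDKit and CReM are installed and that a fragment database exists at the placeholder path replacements.db. The wrapper name mutate_with_h_replacement is illustrative; imports are deferred into the function so that it can be defined without the libraries present.

```python
def mutate_with_h_replacement(smiles, db_path="replacements.db", **options):
    """Mutate a molecule with explicit hydrogens so that Hs can be replaced.

    db_path is a placeholder; point it at a real CReM fragment database.
    Extra keyword arguments are passed to mutate_mol unchanged.
    """
    from rdkit import Chem
    from crem.crem import mutate_mol

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # Make hydrogens explicit, as the note requires for H replacement.
    mol = Chem.AddHs(mol)
    # min_size=0 is the documented setting under which hydrogens are replaced.
    options.setdefault("min_size", 0)
    # mutate_mol is a generator, so collect its output into a list.
    return list(mutate_mol(mol, db_name=db_path, **options))


# Example call (needs a real database, so it is left commented out):
# new_smiles = mutate_with_h_replacement("c1ccccc1C", max_size=1)
```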

crem.crem.mutate_mol2(*args, **kwargs)

Convenience function for processing molecules in parallel with the multiprocessing module. It calls mutate_mol, which cannot be used directly with multiprocessing because it is a generator.

Parameters
  • args – positional arguments, the same as in mutate_mol function

  • kwargs – keyword arguments, the same as in mutate_mol function

Returns

list with output molecules
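
A sketch of how mutate_mol2 might be combined with a process pool, assuming RDKit and CReM are installed and a database exists at the placeholder path replacements.db. The function name mutate_many and its processes argument are illustrative. Keyword arguments are bound with functools.partial so that each worker receives only the molecule.

```python
from functools import partial
from multiprocessing import Pool


def mutate_many(smiles_list, db_path="replacements.db", processes=2, **options):
    """Run mutate_mol2 over several molecules in parallel.

    Returns one list of results per successfully parsed input molecule.
    """
    from rdkit import Chem
    from crem.crem import mutate_mol2

    # Unparsable SMILES are skipped, so results may not align with the input.
    mols = [m for m in (Chem.MolFromSmiles(s) for s in smiles_list) if m is not None]
    # Bind the shared arguments once; the pool supplies the molecule.
    worker = partial(mutate_mol2, db_name=db_path, **options)
    with Pool(processes) as pool:
        return pool.map(worker, mols)


# Example call (needs a real database, so it is left commented out):
# results = mutate_many(["c1ccccc1N", "NCC(=O)OC", "NCCCO"], max_size=1)
```

In a script, the call should sit under an `if __name__ == "__main__":` guard on platforms where multiprocessing starts workers by spawning a fresh interpreter.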