Multiprocessing
There are two independent ways to parallelize CReM.
Parallelism within one molecule: ncores
Every generation function takes an ncores argument. It parallelizes the
replacements made within a single molecule across that many processes:
from rdkit import Chem
from crem.crem import mutate_mol
m = Chem.MolFromSmiles("c1cc(OC)ccc1C")
mols = list(mutate_mol(m, db_name="fragments.db", max_size=1, ncores=8))
Parallelism across many molecules: the *2 wrappers
mutate_mol, grow_mol, link_mols, and make_cycle are generators. A generator
object cannot be pickled, so a worker process cannot send one back to the
parent, and these functions cannot be used directly with
multiprocessing.Pool. For that case CReM provides wrappers that take exactly
the same arguments and return a plain, picklable list:
| Generator | List-returning wrapper |
|---|---|
| mutate_mol | mutate_mol2 |
| grow_mol | grow_mol2 |
| link_mols | link_mols2 |
| make_cycle | make_cycle2 |
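The pickling limitation can be checked with the standard library alone. The following is a minimal sketch, not CReM code: `squares` and `squares_list` are hypothetical stand-ins for a generator function and a list-returning wrapper around it, and `is_picklable` is a helper introduced only for this illustration.

```python
import pickle


def squares(n):
    """Hypothetical generator function, standing in for a CReM generator."""
    for i in range(n):
        yield i * i


def squares_list(n):
    """Hypothetical list-returning wrapper around the generator above."""
    return list(squares(n))


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


# A generator object cannot be serialized, so a worker could not return it.
gen_ok = is_picklable(squares(4))

# A plain list of values can be serialized and sent between processes.
list_ok = is_picklable(squares_list(4))
```

Here `gen_ok` is `False` and `list_ok` is `True`, which is why a list-returning function is the form that works as a Pool task.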
Use them with multiprocessing.Pool to process several molecules at once:
from multiprocessing import Pool
from functools import partial
from rdkit import Chem
from crem.crem import mutate_mol2
input_smi = ["c1ccccc1N", "NCC(=O)OC", "NCCCO"]
input_mols = [Chem.MolFromSmiles(s) for s in input_smi]
with Pool(2) as p:
    res = list(p.imap(
        partial(mutate_mol2, db_name="fragments.db", max_size=1),
        input_mols,
    ))
# res is a list (one per input molecule) of lists of product SMILES
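Because `Pool.imap` yields results in the same order as its inputs (unlike `imap_unordered`), each result list can be paired back to its input SMILES. The following is a minimal sketch under that assumption. `products_by_input` is a hypothetical helper, not part of CReM, and the product strings are placeholders rather than real CReM output.

```python
def products_by_input(input_smi, res):
    """Map each input SMILES to its list of product SMILES.

    Assumes res is in input order, as produced by Pool.imap.
    Duplicate input SMILES would collapse into a single key.
    """
    if len(input_smi) != len(res):
        raise ValueError("expected one result list per input molecule")
    return dict(zip(input_smi, res))


# Placeholder data only: these strings are NOT real CReM output.
example_inputs = ["c1ccccc1N", "NCC(=O)OC", "NCCCO"]
example_res = [["<product-1a>", "<product-1b>"], [], ["<product-3a>"]]

by_input = products_by_input(example_inputs, example_res)

# Number of products generated per input molecule.
n_products = {smi: len(prods) for smi, prods in by_input.items()}
```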
When parallelizing across molecules this way, keep ncores=1 (the default) in
each wrapper call. Otherwise every Pool worker would try to start its own
ncores processes and oversubscribe the CPUs. Let the outer Pool own the
parallelism.
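The CPU budget behind this advice can be sketched as a small helper. This is an illustration only: `outer_pool_size` and its parameters are hypothetical names, not part of CReM, and the sketch assumes the total number of busy processes is roughly the Pool size multiplied by the ncores value used in each call.

```python
import os


def outer_pool_size(n_molecules, inner_ncores=1, total_cpus=None):
    """Choose a Pool size so that pool_size * inner_ncores stays within
    total_cpus whenever that is possible, and never exceeds the number
    of molecules to process.
    """
    if inner_ncores < 1:
        raise ValueError("inner_ncores must be at least 1")
    if total_cpus is None:
        # os.cpu_count() may return None on some platforms.
        total_cpus = os.cpu_count() or 1
    # Workers that fit in the CPU budget given the per-call process count.
    by_cpu = max(1, total_cpus // inner_ncores)
    return max(1, min(n_molecules, by_cpu))


# With ncores=1 per call, all 8 CPUs can go to the outer Pool.
size_default = outer_pool_size(100, inner_ncores=1, total_cpus=8)

# With ncores=4 per call, only 2 outer workers fit in the same budget.
size_nested = outer_pool_size(100, inner_ncores=4, total_cpus=8)
```

With the default of one process per call, the outer Pool can use the whole machine, which is the simplest configuration to reason about.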
Tip
crem.utils.enumerate_compounds already parallelizes across molecules
internally (via joblib), so you don't need to wrap it yourself — see
Iterative enumeration.