Fragment properties
You can annotate fragments with molecular properties and then use those properties to constrain which fragments are selected during generation. This is a two-part workflow: add property columns to the database, then filter on them at generation time.
Add properties with cremdb_add_prop

    cremdb_add_prop -i fragments.db -p mw logp rtb tpsa fcsp3 -c 8

| Option | Default | Description |
|---|---|---|
| -i, --input | (required) | CReM fragment database |
| -p, --properties | all five | Properties to compute (see below) |
| -c, --ncpu | 1 | Number of CPUs |
| --fetch-batch | 50000 | Rows fetched per batch |
| --write-batch | 10000 | Rows per write |
| --imap-chunk | 500 | Multiprocessing chunk size |
| -v, --verbose | off | Print progress to stderr |

Built-in properties:

| Name | Property |
|---|---|
| mw | molecular weight |
| logp | Crippen logP |
| rtb | rotatable bonds |
| tpsa | topological polar surface area |
| fcsp3 | fraction of sp³ carbons |

The command is schema-aware: in a v1 database it adds columns to the frags
table; in a v0 database it adds them to each radiusN table. It is also
incremental: only rows whose property values are still NULL are computed,
so rerunning it after adding new fragments fills in only the new rows.
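
The incremental pattern can be illustrated with plain sqlite3. The sketch below is not CReM's implementation: the toy schema (a frags table keyed on core_smi), the helper name fill_missing, the column name smi_len, and the use of len() as a stand-in descriptor are all assumptions made for the example. It only shows the idea of adding a column if it is absent and then computing values for rows that are still NULL.

```python
import sqlite3


def fill_missing(conn, table, key_col, prop, func, write_batch=10000):
    """Add column `prop` if absent, then compute it only where it is NULL.

    Hypothetical helper for illustration; returns the number of rows updated.
    """
    cols = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if prop not in cols:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {prop} REAL")
    # Only rows without a value yet are selected, so reruns stay cheap.
    pending = conn.execute(
        f"SELECT {key_col} FROM {table} WHERE {prop} IS NULL"
    ).fetchall()
    updated = 0
    for start in range(0, len(pending), write_batch):
        batch = pending[start:start + write_batch]
        conn.executemany(
            f"UPDATE {table} SET {prop} = ? WHERE {key_col} = ?",
            [(func(key), key) for (key,) in batch],
        )
        updated += len(batch)
    conn.commit()
    return updated


# Toy database: the schema is assumed, and len() stands in for a real descriptor.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frags (core_smi TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO frags VALUES (?)", [("CC",), ("CCO",)])
first_run = fill_missing(conn, "frags", "core_smi", "smi_len", len)

# A fragment added later is the only row computed on the next run.
conn.execute("INSERT INTO frags (core_smi) VALUES ('c1ccccc1')")
second_run = fill_missing(conn, "frags", "core_smi", "smi_len", len)
```

In this toy run the first call touches both original rows and the second call touches only the newly inserted one, which mirrors the rerun behavior described above.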
Filter on properties at generation time
Pass property ranges as extra keyword arguments to any generation function. A
value may be a single number (exact match) or a (low, high) tuple (inclusive
bounds):

    from rdkit import Chem
    from crem.crem import mutate_mol

    mols = list(mutate_mol(
        Chem.MolFromSmiles("c1ccccc1N"),
        db_name="fragments.db",
        set_names="chembl",
        min_freq=5,
        mw=(50, 180),      # inclusive range on the mw column
        logp=(0.0, 3.5),   # inclusive range on the logp column
    ))

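
One way to picture these filters is as a SQL WHERE clause over the property columns. The following sketch is an assumption about the general idea, not CReM's actual query builder, and the function name property_where is hypothetical. A single number becomes an equality test, and a (low, high) tuple becomes an inclusive range.

```python
def property_where(**filters):
    """Build a parameterized WHERE fragment from property filters.

    Hypothetical helper: a number means exact match, a (low, high) tuple
    means inclusive bounds (SQL BETWEEN includes both ends).
    """
    clauses, params = [], []
    for name, value in filters.items():
        # Column names cannot be bound as parameters, so validate them.
        if not name.isidentifier():
            raise ValueError(f"invalid column name: {name!r}")
        if isinstance(value, tuple):
            low, high = value
            clauses.append(f"{name} BETWEEN ? AND ?")
            params.extend([low, high])
        else:
            clauses.append(f"{name} = ?")
            params.append(value)
    return " AND ".join(clauses), params


sql, params = property_where(mw=(50, 180), rtb=2)
```

Values are passed as bound parameters instead of being formatted into the string, which keeps the query safe even when the bounds come from user input.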
Where property columns live
The keyword names must correspond to existing columns. In a v1 database
these are columns of frags (or frags_h); in a v0 database they are
columns of the radiusN tables. This is the one generation-time difference
between the two formats.
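
Since the keyword names must match existing columns, it can help to list the columns of the relevant table before generating. The sketch below uses only sqlite3 and a toy in-memory database; the helper name missing_columns and the toy schema are assumptions. For a v0 database the table name would be one of the radiusN tables instead of frags.

```python
import sqlite3


def missing_columns(conn, table, names):
    """Return the requested property names that are not columns of `table`."""
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    return [name for name in names if name not in existing]


# Toy stand-in for a v1 database with two property columns already added.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frags (core_smi TEXT, mw REAL, logp REAL)")
absent = missing_columns(conn, "frags", ["mw", "logp", "tpsa"])
```

Here absent holds the names that would still need to be added (for example with cremdb_add_prop) before they can be used as filters.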
Add custom properties (Python API)
crem.db.add_fragment_props adds the built-in properties and, optionally,
arbitrary custom columns computed by your own functions:

    from crem.db import add_fragment_props
    from rdkit import Chem
    from rdkit.Chem import Descriptors


    def num_rings(smi):
        return Chem.MolFromSmiles(smi).GetRingInfo().NumRings()


    # Built-ins only
    add_fragment_props("fragments.db")

    # Built-ins plus a custom column
    add_fragment_props(
        "fragments.db",
        properties="all",
        custom_props={"nrings": num_rings},
        ncpu=8,
    )

    # Only a custom column, targeting the H-collapsed table
    add_fragment_props(
        "fragments.db",
        custom_props={"heavy": lambda s: Descriptors.HeavyAtomCount(Chem.MolFromSmiles(s))},
        table="frags_h",
    )

Picklable functions (named functions, functools.partial) run on ncpu
workers; lambdas and closures run serially. Custom properties can target either
frags (default, keyed on core_smi) or frags_h (keyed on the H-capped
SMILES). See crem.db for the full signature.
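
Whether a function can be sent to worker processes comes down to whether it can be pickled. The check below is a generic standard-library sketch, not CReM's own dispatch logic, and the helper name is_picklable is hypothetical.

```python
import functools
import pickle


def is_picklable(func):
    """Return True if `func` can be pickled, as worker processes require."""
    try:
        pickle.dumps(func)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A partial wrapping an importable callable can be pickled by reference.
partial_ok = is_picklable(functools.partial(int, base=2))
# A lambda has no importable name, so pickling it fails.
lambda_ok = is_picklable(lambda s: len(s))
```

Running a check like this on a custom property function before a long job gives an early hint of whether it would run in parallel or serially.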