Fragment properties

You can annotate fragments with molecular properties and then use those properties to constrain which fragments are selected during generation. This is a two-part workflow: add property columns to the database, then filter on them at generation time.

Add properties with `cremdb_add_prop`

cremdb_add_prop -i fragments.db -p mw logp rtb tpsa fcsp3 -c 8

| Option | Default | Description |
|---|---|---|
| `-i`, `--input` | none (required) | CReM fragment database |
| `-p`, `--properties` | all five | Properties to compute (see below) |
| `-c`, `--ncpu` | `1` | Number of CPUs |
| `--fetch-batch` | `50000` | Rows fetched per batch |
| `--write-batch` | `10000` | Rows per write |
| `--imap-chunk` | `500` | Multiprocessing chunk size |
| `-v`, `--verbose` | off | Print progress to stderr |

Built-in properties:

| Name | Property |
|---|---|
| `mw` | molecular weight |
| `logp` | Crippen logP |
| `rtb` | rotatable bonds |
| `tpsa` | topological polar surface area |
| `fcsp3` | fraction of sp³ carbons |

The command is schema-aware: in a v1 database it adds columns to the `frags` table; in a v0 database it adds them to each `radiusN` table. It is also incremental: only rows whose property values are still `NULL` are computed, so rerunning after adding new fragments fills just the new rows.
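To see how much work a rerun would still have to do, you can count the rows whose property value is still NULL. The following is a minimal sketch using only the standard `sqlite3` module; it runs against a toy in-memory table, and the two-column layout (`core_smi`, `mw`) is an assumption made for illustration, not the full CReM schema. The helper `count_missing` is hypothetical and not part of CReM.

```python
import sqlite3


def count_missing(conn, table, column):
    """Return how many rows in `table` still have NULL in `column`."""
    # Identifiers cannot be bound as SQL parameters, so they are quoted inline.
    query = f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" IS NULL'
    return conn.execute(query).fetchone()[0]


# Toy stand-in for a fragment database (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frags (core_smi TEXT, mw REAL)")
conn.executemany(
    "INSERT INTO frags VALUES (?, ?)",
    [("[*:1]C", 15.03), ("[*:1]O", None), ("[*:1]N", None)],
)

# Two of the three rows have no mw value yet.
missing_mw = count_missing(conn, "frags", "mw")
print(missing_mw)
```

Against a real database you would open the file path instead of `":memory:"` and pass the table name that matches your schema version.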

Filter on properties at generation time

Pass property ranges as extra keyword arguments to any generation function. A value may be a single number (exact match) or a (low, high) tuple (inclusive bounds):

from rdkit import Chem
from crem.crem import mutate_mol

mols = list(mutate_mol(
    Chem.MolFromSmiles("c1ccccc1N"),
    db_name="fragments.db",
    set_names="chembl",
    min_freq=5,
    mw=(50, 180),
    logp=(0.0, 3.5),
))
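The difference between a single number and a `(low, high)` tuple can be pictured as the difference between an equality test and an inclusive range test. The sketch below expresses that reading as a parameterized SQL condition. It is an illustration of the semantics described above, not CReM's actual query-building code, and `property_clause` is a hypothetical helper.

```python
def property_clause(**props):
    """Build a parameterized SQL condition from property keyword arguments."""
    parts, params = [], []
    for name, value in props.items():
        if isinstance(value, tuple):
            # (low, high) tuple: inclusive bounds, which is what BETWEEN means in SQL.
            low, high = value
            parts.append(f'"{name}" BETWEEN ? AND ?')
            params.extend([low, high])
        else:
            # Single number: exact match.
            parts.append(f'"{name}" = ?')
            params.append(value)
    return " AND ".join(parts), params


clause, params = property_clause(mw=(50, 180), rtb=2)
print(clause)
print(params)
```

Keeping the values in a separate parameter list, rather than formatting them into the string, is the usual way to avoid quoting problems when such a condition is passed to `sqlite3`.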

Where property columns live

The keyword names must correspond to existing columns. In a v1 database these are columns of `frags` (or `frags_h`); in a v0 database they are columns of the `radiusN` tables. This is the one generation-time difference between the two formats.
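Because a keyword only works if the matching column exists, it can help to list a table's columns before generating. A minimal sketch with the standard `sqlite3` module follows; the in-memory table and its columns are assumptions for illustration, and `table_columns` is a hypothetical helper rather than a CReM function.

```python
import sqlite3


def table_columns(conn, table):
    """Return the column names of `table` in declaration order."""
    # PRAGMA table_info yields one row per column; index 1 holds the name.
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [row[1] for row in rows]


# Toy stand-in for a v1-style table (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frags (core_smi TEXT, mw REAL, logp REAL)")

columns = table_columns(conn, "frags")
print(columns)

# Keywords you intend to pass at generation time.
wanted = {"mw", "logp", "tpsa"}
unavailable = sorted(wanted - set(columns))
print(unavailable)
```

Here `tpsa` is reported as unavailable, which suggests the property has not been added to this table yet.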

Add custom properties (Python API)

`crem.db.add_fragment_props` adds the built-in properties and, optionally, arbitrary custom columns computed by your own functions:

from crem.db import add_fragment_props
from rdkit import Chem
from rdkit.Chem import Descriptors

def num_rings(smi):
    return Chem.MolFromSmiles(smi).GetRingInfo().NumRings()

# Built-ins only
add_fragment_props("fragments.db")

# Built-ins plus a custom column
add_fragment_props(
    "fragments.db",
    properties="all",
    custom_props={"nrings": num_rings},
    ncpu=8,
)

# Only a custom column, targeting the H-collapsed table
add_fragment_props(
    "fragments.db",
    custom_props={"heavy": lambda s: Descriptors.HeavyAtomCount(Chem.MolFromSmiles(s))},
    table="frags_h",
)

Picklable functions (named functions, `functools.partial`) run on `ncpu` workers; lambdas and closures run serially. Custom properties can target either `frags` (default, keyed on `core_smi`) or `frags_h` (keyed on the H-capped SMILES). See `crem.db` for the full signature.
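If you are unsure whether a custom function will be eligible for parallel execution, you can test whether Python can pickle it. The sketch below is a generic standard-library check and an assumption about how to approximate the rule above; it is not CReM's own detection logic, and `is_picklable` is a hypothetical helper.

```python
import functools
import pickle


def is_picklable(func):
    """Return True if `func` can be serialized with pickle."""
    try:
        pickle.dumps(func)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Lambdas and local closures fail because pickle stores functions
        # by importable name, and these have none.
        return False
    return True


# A partial built from an importable callable stays picklable.
round_1dp = functools.partial(round, ndigits=1)

checks = {
    "builtin": is_picklable(len),
    "partial": is_picklable(round_1dp),
    "lambda": is_picklable(lambda s: len(s)),
}
print(checks)
```

In practice, moving a lambda's body into a named module-level function is usually enough to make it picklable.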
