This repository provides the database and tools for the paper "Expanding accessible chemical space for fragment-based enumeration by orders of magnitude through optimization of the CReM framework".
It contains the data for the modified CReM (oCReM) library and the original ChEMBL fragment library, along with tools for molecular fragmentation, fragment database management, and structure generation.
- Shaojin Hu
- Qinyu Chen
- Yinhui Yi
- Paul Pilot
- James Xu
- Abir Ganguly
- Albert C. Pan
- Original ChEMBL fragment library: http://www.qsar4u.com/pages/crem.php
- Download the oCReM databases:
- ChEMBL22:
- ChEMBL36:
oCReM (optimized Chemical Reassembly) is a framework for fragment-based molecular design that includes:
- Molecular Fragmentation: Breaking down molecules into structural fragments
- Fragment Database Management: Storing and retrieving fragments in SQLite or PostgreSQL databases
- Structure Generation: Generating new molecules through fragment-based assembly
- Mutate: Replace fragments in a molecule
- Grow: Extend a molecule with new fragments
- Link: Connect multiple molecules using linker fragments
oCReM/
├── example/ # Example Jupyter notebooks
│ ├── fragmentation_to_db.ipynb # Fragmentation and database creation example
│ └── structure_generation.ipynb # Structure generation example
├── ta_gen/ # Core functionality
│ ├── bin/ # Command-line tools
│ ├── crem/ # Core CReM implementation
│ ├── db/ # Database management
│ └── utils/ # Utility functions
├── LICENSE # License file
├── README.md # This README
└── environment.yml # Conda environment configuration
First, clone the oCReM repository:
git clone https://github.com/tandemai-inc/oCReM.git
cd oCReM
Make sure you have installed Miniconda.
conda env create -f environment.yml
conda activate crem
# Set Python path to include the project root
export PYTHONPATH=$PYTHONPATH:$(pwd)
The input file (test.smi) should contain one molecule per line in SMILES format:
CCO
CC(=O)O
CCOC(=O)C
c1ccccc1
Cc1ccccc1
COc1ccccc1
Oc1ccccc1
Nc1ccccc1
NCC(=O)O
CC(=O)Oc1ccccc1C(=O)O
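The example input above can also be written from Python using only the standard library. This is a minimal sketch: the helper name `write_smiles_file` is hypothetical and not part of oCReM, and it performs no chemical validation of the SMILES strings.

```python
from pathlib import Path

# The ten example molecules listed above, one SMILES string each.
EXAMPLE_SMILES = [
    "CCO", "CC(=O)O", "CCOC(=O)C", "c1ccccc1", "Cc1ccccc1",
    "COc1ccccc1", "Oc1ccccc1", "Nc1ccccc1", "NCC(=O)O",
    "CC(=O)Oc1ccccc1C(=O)O",
]


def write_smiles_file(path, smiles):
    """Write one SMILES per line, skipping blanks and exact duplicates.

    Hypothetical helper for illustration; returns the number of lines written.
    """
    seen = set()
    lines = []
    for smi in smiles:
        smi = smi.strip()
        if smi and smi not in seen:
            seen.add(smi)
            lines.append(smi)
    Path(path).write_text("\n".join(lines) + "\n")
    return len(lines)


if __name__ == "__main__":
    count = write_smiles_file("test.smi", EXAMPLE_SMILES)
    print(f"wrote {count} molecules to test.smi")
```

Duplicates are removed by exact string match only, so two different SMILES spellings of the same molecule would both be kept.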
Generate fragments for each molecule in the input file and save them to a CSV file.
python ta_gen/bin/fragmentation.py --input test.smi --out test_frag_1.csv --mode 0 --ncpu 10 --radius 1
python ta_gen/bin/fragmentation.py --input test.smi --out test_frag_2.csv --mode 0 --ncpu 10 --radius 2
python ta_gen/bin/fragmentation.py --input test.smi --out test_frag_3.csv --mode 0 --ncpu 10 --radius 3
Generate fragments for each molecule in the input file and save them to a SQLite database. If the SQLite database file does not exist, it will be automatically created.
python ta_gen/bin/fragmentation.py --input test.smi --mode 0 --ncpu 10 --radius 1 --use_db --db_type sqlite --db_path test.db
python ta_gen/bin/fragmentation.py --input test.smi --mode 0 --ncpu 10 --radius 2 --use_db --db_type sqlite --db_path test.db
python ta_gen/bin/fragmentation.py --input test.smi --mode 0 --ncpu 10 --radius 3 --use_db --db_type sqlite --db_path test.db
Create a configuration file (test.ini) with the following format:
[database]
host=your_host
port=your_port
user=your_username
password=your_password
database=your_database
Generate fragments for each molecule in the input file and save them to a PostgreSQL database. If the database does not exist, the script will try to create it.
python ta_gen/bin/fragmentation.py --input test.smi --mode 0 --ncpu 10 --radius 1 --use_db --db_type postgres --ini_file test.ini
python ta_gen/bin/fragmentation.py --input test.smi --mode 0 --ncpu 10 --radius 2 --use_db --db_type postgres --ini_file test.ini
python ta_gen/bin/fragmentation.py --input test.smi --mode 0 --ncpu 10 --radius 3 --use_db --db_type postgres --ini_file test.ini
For detailed examples and step-by-step instructions, please refer to the Jupyter notebooks in the example/ directory, especially fragmentation_to_db.ipynb, which covers the complete fragmentation process and database storage options.
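The fragmentation commands above differ only in `--radius` and in the output target, so they can be scripted. The following is a sketch that uses only the flags shown in this README. The helper `build_fragmentation_cmd` and the `--run` switch of this wrapper script are hypothetical, and the script assumes it is started from the repository root.

```python
import subprocess
import sys


def build_fragmentation_cmd(input_file, radius, ncpu=10, out_csv=None,
                            db_type=None, db_path=None, ini_file=None):
    """Assemble an argument list for ta_gen/bin/fragmentation.py.

    Hypothetical helper; only flags that appear in the README are emitted.
    """
    cmd = [sys.executable, "ta_gen/bin/fragmentation.py",
           "--input", input_file, "--mode", "0",
           "--ncpu", str(ncpu), "--radius", str(radius)]
    if db_type is None:
        # CSV output, as in the test_frag_N.csv examples.
        if out_csv is None:
            raise ValueError("out_csv is required when no database is used")
        cmd += ["--out", out_csv]
    elif db_type == "sqlite":
        if db_path is None:
            raise ValueError("db_path is required for sqlite")
        cmd += ["--use_db", "--db_type", "sqlite", "--db_path", db_path]
    elif db_type == "postgres":
        if ini_file is None:
            raise ValueError("ini_file is required for postgres")
        cmd += ["--use_db", "--db_type", "postgres", "--ini_file", ini_file]
    else:
        raise ValueError(f"unsupported db_type: {db_type!r}")
    return cmd


if __name__ == "__main__":
    for radius in (1, 2, 3):
        cmd = build_fragmentation_cmd("test.smi", radius,
                                      db_type="sqlite", db_path="test.db")
        print(" ".join(cmd))
        # Only execute when this wrapper is called with --run (hypothetical switch).
        if "--run" in sys.argv[1:]:
            subprocess.run(cmd, check=True)
```

Without `--run` the script only prints the commands, which makes it easy to check them before starting a long fragmentation job.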
from rdkit import Chem
from ta_gen.ocrem.ocrem import mutate_mol
from ta_gen.db import create_db_manager
m = Chem.MolFromSmiles('OCCOc1ccccc1')
db_manager = create_db_manager('sqlite', db_path='test.db')
# For PostgreSQL, use:
# db_manager = create_db_manager('postgres', ini_file='replacements.ini')
mols = list(mutate_mol(m, db_manager, max_inc=1))

from rdkit import Chem
from ta_gen.ocrem.ocrem import grow_mol
from ta_gen.db import create_db_manager
m = Chem.MolFromSmiles('OCCOc1ccccc1')
db_manager = create_db_manager('sqlite', db_path='test.db')
# For PostgreSQL, use:
# db_manager = create_db_manager('postgres', ini_file='replacements.ini')
mols = list(grow_mol(m, db_manager))

from rdkit import Chem
from ta_gen.ocrem.ocrem import link_mols
from ta_gen.db import create_db_manager
m1 = Chem.MolFromSmiles('OC=O')
m2 = Chem.MolFromSmiles('CC=O')
db_manager = create_db_manager('sqlite', db_path='test.db')
# For PostgreSQL, use:
# db_manager = create_db_manager('postgres', ini_file='replacements.ini')
mols = list(link_mols(m1, m2, db_manager, radius=1))

- Download the SQLite database tar.gz file (e.g., ocrem_chembl36_sqlite.tar.gz)
- Extract the tar.gz file:
tar -xzf ocrem_chembl36_sqlite.tar.gz
This will extract to a file named chembl_36.db.
- Use the extracted SQLite database file with the create_db_manager function:
from ta_gen.db import create_db_manager
db_manager = create_db_manager('sqlite', db_path='path/to/chembl_36.db')

Note: Make sure you have PostgreSQL installed before proceeding.
- Download the PostgreSQL dump file (e.g., ocrem_chembl36_postgres.dump)
- Create a new PostgreSQL database:
createdb -U your_username your_database
# With host and port specified:
# createdb -h your_host -p your_port -U your_username your_database
- Import the dump file into your database:
pg_restore -U your_username -d your_database ocrem_chembl36_postgres.dump
# With host and port specified:
# pg_restore -h your_host -p your_port -U your_username -d your_database ocrem_chembl36_postgres.dump
- Create a PostgreSQL database configuration file named database.ini with the following format:
[database]
host=your_host
port=your_port
user=your_username
password=your_password
database=your_database
- Use it with the create_db_manager function:
from ta_gen.db import create_db_manager
db_manager = create_db_manager('postgres', ini_file='path/to/database.ini')

See the Jupyter notebooks in the example/ directory for detailed tutorials:
- fragmentation_to_db.ipynb: Demonstrates how to fragment molecules and store results in a database
- structure_generation.ipynb: Demonstrates how to generate new molecules using the fragment database
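The `[database]` configuration file used for PostgreSQL earlier in this README is a plain INI file, so it can be written and checked with Python's `configparser` before it is passed to `create_db_manager`. This is a sketch only: the helpers `write_db_ini` and `read_db_ini` are hypothetical, the values are placeholders, and how oCReM itself parses the file is not described here.

```python
import configparser
import os
import tempfile

# Keys shown in the README's [database] section.
REQUIRED_KEYS = ("host", "port", "user", "password", "database")


def write_db_ini(path, **settings):
    """Write a [database] section in key=value form (hypothetical helper)."""
    missing = [k for k in REQUIRED_KEYS if k not in settings]
    if missing:
        raise ValueError(f"missing settings: {missing}")
    # interpolation=None keeps '%' characters in passwords literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser["database"] = {k: str(settings[k]) for k in REQUIRED_KEYS}
    with open(path, "w") as fh:
        parser.write(fh, space_around_delimiters=False)


def read_db_ini(path):
    """Read the file back and verify that all expected keys are present."""
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):
        raise FileNotFoundError(path)
    if not parser.has_section("database"):
        raise KeyError("no [database] section found")
    section = parser["database"]
    missing = [k for k in REQUIRED_KEYS if k not in section]
    if missing:
        raise KeyError(f"missing keys: {missing}")
    return {k: section[k] for k in REQUIRED_KEYS}


if __name__ == "__main__":
    # Demo in a temporary directory so no real credentials file is touched.
    with tempfile.TemporaryDirectory() as tmp:
        ini_path = os.path.join(tmp, "database.ini")
        write_db_ini(ini_path, host="your_host", port="your_port",
                     user="your_username", password="your_password",
                     database="your_database")
        print(read_db_ini(ini_path))
```

Checking the file up front gives a clearer error for a missing key than a failed database connection would.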
This project is licensed under the MIT License - see the LICENSE file for details.
If you use oCReM in your research, please cite the paper:
"Expanding accessible chemical space for fragment-based enumeration by orders of magnitude through optimization of the CReM framework"
For questions or issues, please contact the authors listed above.