SMARTDock (Scoring with Machine learning and Activity for Ranking Targeted Docking) is an integrated, automated workflow designed to enhance virtual screening in structure-based drug discovery. It develops target-specific scoring functions (STSF) by integrating publicly available bioactivity data (ChEMBL), molecular docking (using GOLD), and machine learning (ML) classification models.
The workflow uses the PADIF (Protein per Atom Score Contributions Derived Interaction Fingerprint) representation to encode key protein-ligand interactions and classify likely binders.
To run SMARTDock, you must have the following installed locally, as the Docker container relies on mounting and accessing these files:
- Docker: Required for running the containerized workflow.
- GOLD Docking Software: A local installation of the GOLD software (part of the CCDC suite).
- CCDC License: An active license key is required for the CCDC Python API and GOLD to function within the container.
Clone the project repository and navigate into the directory:
    git clone https://github.com/kochgroup/smartdock.git
    cd smartdock

You must edit the `compose.yml` file to match your local CCDC installation paths.

- Mount Path: Change the local volume path `/appl/ccdc/` to the actual path of your CCDC installation directory.
  - Original: `- /appl/ccdc/:/mnt/ccdc`
  - Action: Replace `/appl/ccdc/` with your CCDC root directory path.
- Environment Variables: Verify that the `CSDHOME`, `GOLD_DIR`, and other CCDC environment variables match the version and paths of your mounted CCDC installation.
  - Example: The current paths are set for a `CSDS2020` installation. Adjust `2020` if you use a different version.

        environment:
          - CSDHOME=/mnt/ccdc/CSDS2020/CSD_2020
          - GOLD_DIR=/mnt/ccdc/CSDS2020/Discovery_2020/GOLD
          # ... other CCDC variables
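Before starting the container, it can help to confirm that the host directory you mount actually contains the subdirectories the environment variables point to. The following is a minimal sketch, not part of SMARTDock: the function name `missing_ccdc_paths` is hypothetical, and the two relative paths are simply taken from the `CSDHOME` and `GOLD_DIR` values in the example configuration.

```python
from pathlib import Path


def missing_ccdc_paths(ccdc_root, version="2020"):
    """Return the expected CCDC subdirectories that are absent under ccdc_root.

    The layout is an assumption copied from the compose.yml example
    (CSDS<version>/CSD_<version> and CSDS<version>/Discovery_<version>/GOLD).
    """
    root = Path(ccdc_root)
    expected = [
        root / f"CSDS{version}" / f"CSD_{version}",
        root / f"CSDS{version}" / f"Discovery_{version}" / "GOLD",
    ]
    return [str(p) for p in expected if not p.is_dir()]


if __name__ == "__main__":
    # /appl/ccdc/ is the default host path from compose.yml; replace it with yours.
    for path in missing_ccdc_paths("/appl/ccdc/"):
        print(f"missing: {path}")
```

An empty result means both directories exist on the host. It does not confirm that the installation is licensed or complete.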
Edit the `padif_start.sh` file to use your actual CCDC license key.

- Action: Replace the placeholder key `YOUR_CCDC_LICENSE_KEY_HERE` with your valid CCDC license key.

      # Replace the key below with your valid CCDC license key
      docker exec -it padif_app /bin/bash -c "/mnt/ccdc/CSDS2020/CSD_2020/bin/ccdc_activator -a -k YOUR_CCDC_LICENSE_KEY_HERE"
Make the start script executable and run it to build the image (if necessary) and launch the container:
    chmod +x padif_start.sh
    ./padif_start.sh

The script will:

- Run `docker compose up -d` to build the `padif_app` image and start the container in the background.
- Execute the CCDC activator command with your license key inside the container.
- Attach you to a bash session inside the running container, placed in the `/code/work` directory, ready to run the SMARTDock script.
Once inside the container's shell, you can execute the main SMARTDock script. The entry script's name may differ in your checkout (for example `smartdock.py`); the example command below uses `smartdock_workflow.py`.
Ensure your input files are placed in the local ./data folder before starting the container, as this folder is mounted to /code/work/data inside the container.
The workflow is typically executed with a single command providing the necessary inputs:
    # Example command (adjust script name and arguments as needed)
    python3 /code/src/smartdock_workflow.py \
      --chembl-id CHEMBL5567 \
      --pdb-file ./data/3IES.pdb \
      --smiles-list ./data/test_compounds.smi \
      --model XGBoost \
      --splitting scaffold

- `--chembl-id`: The ChEMBL ID of the target protein.
- `--pdb-file`: Path to the PDB file of the protein structure (should be placed in the mounted `./data` folder).
- `--smiles-list`: Path to a file containing test compounds (SMILES, ID format) for external screening.
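The file passed to `--smiles-list` can be sanity-checked before a long docking run. This sketch assumes each line holds a SMILES string followed by a compound ID, separated by whitespace or a comma. The exact delimiter SMARTDock expects is not stated here, so treat that as an assumption. `read_smiles_list` is a hypothetical helper, and it checks only the line layout, not chemical validity.

```python
import re


def read_smiles_list(text):
    """Parse 'SMILES ID' lines into (smiles, compound_id) tuples.

    Blank lines and lines starting with '#' are skipped. The delimiter
    (whitespace or comma) is an assumption, not SMARTDock's documented format.
    """
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = [p for p in re.split(r"[,\s]+", line) if p]
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'SMILES ID', got {line!r}")
        records.append((parts[0], parts[1]))
    return records


if __name__ == "__main__":
    example = "CCO ethanol\nc1ccccc1 benzene\n"
    for smiles, compound_id in read_smiles_list(example):
        print(compound_id, smiles)
```

To check a real input file, read it first (for instance `open("./data/test_compounds.smi").read()`) and pass the text to the function.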
| Parameter | Options | Description |
|---|---|---|
| `--model` | `RF`, `XGBoost`, `MLP` | Select the machine learning algorithm(s) for the scoring function. |
| `--splitting` | `random`, `scaffold` | Select the data splitting method for model validation and training. |
| `--search-efficiency` | Integer (e.g., 200) | GOLD search efficiency parameter. |
| `--ligand-id` | String (e.g., 'LIG') | Co-crystallized ligand ID to define the binding site. |
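To illustrate what the two `--splitting` options mean in general: a random split assigns compounds independently, while a scaffold split keeps all compounds that share a core scaffold on the same side, which usually gives a harder test set. The sketch below is a generic, dependency-free illustration and not SMARTDock's implementation. It assumes scaffold keys have already been computed (for example with RDKit's `MurckoScaffold.MurckoScaffoldSmiles`); `scaffold_split` and `test_fraction` are hypothetical names.

```python
from collections import defaultdict


def scaffold_split(scaffolds, test_fraction=0.2):
    """Split compound indices so that no scaffold appears in both sets.

    scaffolds: list of scaffold keys, one per compound (precomputed).
    Returns (train_indices, test_indices).
    """
    groups = defaultdict(list)
    for idx, key in enumerate(scaffolds):
        groups[key].append(idx)

    # Largest scaffold groups go to training first; smaller, rarer
    # scaffolds fill the test set (a common convention, assumed here).
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))
    train_cutoff = (1.0 - test_fraction) * len(scaffolds)

    train, test = [], []
    for group in ordered:
        if len(train) + len(group) <= train_cutoff:
            train.extend(group)
        else:
            test.extend(group)
    return train, test
```

Because whole groups are assigned at once, the realized test fraction only approximates `test_fraction`. It can be noticeably larger when a few scaffolds dominate the data set.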
The Docker image is built on Python 3.9. Key dependencies installed via the requirements.txt file include:
- `deepchem`
- `chembl_webresource_client`
- `rdkit`
- `pycaret`
- `xgboost`
- `csd-python-api` (installed via a custom CCDC index)
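A quick way to confirm the environment inside the container is complete is to check that each dependency can be located by the import system. This is a sketch under two assumptions: the import names below match the packages listed above, and the CSD Python API is imported as `ccdc`. `find_missing_modules` is a hypothetical helper.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the names in module_names that cannot be found for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names assumed from the dependency list; csd-python-api -> ccdc.
    required = ["deepchem", "chembl_webresource_client", "rdkit",
                "pycaret", "xgboost", "ccdc"]
    missing = find_missing_modules(required)
    print("missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates a top-level module without importing it, so this check is fast but does not prove that the CCDC license is active.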