
Boltz2 Notebook - Advanced Biomolecular Structure Prediction & Affinity Analysis


AI-powered biomolecular structure prediction and binding affinity analysis — Interactive Jupyter notebooks for protein structure prediction, protein-ligand binding, and multi-entity complex modeling using the Boltz2 diffusion model. No local GPU installation required.



Quick Start

Launch in Google Colab (Free GPU):

Note: Google Colab's free tier typically provides an NVIDIA T4 GPU. Select a GPU runtime before running a notebook: Runtime → Change runtime type → GPU
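The notebooks assume a GPU runtime, so checking for one before a long run can save time. The sketch below is a minimal, standard-library-only example. The function name `detect_gpu_name` is hypothetical and not part of the notebooks. It assumes NVIDIA's `nvidia-smi` tool is on the PATH, as it is on Colab GPU runtimes, and returns `None` when no GPU is visible.

```python
import shutil
import subprocess
from typing import Optional


def detect_gpu_name() -> Optional[str]:
    """Return the first GPU name reported by nvidia-smi, or None if no NVIDIA GPU is visible."""
    if shutil.which("nvidia-smi") is None:
        return None  # driver tools not installed: CPU-only runtime
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None  # nvidia-smi is present but reports no usable device
    names = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return names[0] if names else None


if __name__ == "__main__":
    gpu = detect_gpu_name()
    print(f"GPU detected: {gpu}" if gpu else "No GPU detected; switch the runtime to GPU.")
```

In a notebook cell, a `None` result usually means the runtime is still set to CPU.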

| Version | Features | Status |
|---|---|---|
| V2.0.0 | Multi-entity support, DNA/RNA, Advanced constraints, Cyclic peptides, PTMs | Beta |
| V1.0.0 | Multi-chain proteins, Protein-ligand binding, Affinity analysis | Stable |
| Batch v1.0 | Batch processing, CSV/FASTA inputs, High-throughput screening | Pre-release |

Key Features

  • Protein Structure Prediction — Diffusion-based AI modeling for single proteins and multi-chain complexes
  • Protein-Ligand Binding — Predict and score molecular interactions with affinity estimation
  • Multi-Entity Support — Handle proteins, DNA/RNA, ligands, and custom modifications simultaneously
  • Advanced Constraints — Covalent bonds, binding pocket conditioning, contact constraints, template guidance
  • Confidence Metrics — Per-residue confidence (pLDDT), Predicted Aligned Error (PAE), affinity predictions
  • Interactive Visualization — 3D structure viewer with confidence overlays and binding analysis dashboard
  • GPU Acceleration — CUDA-enabled with free T4 GPU access in Google Colab
  • Zero Installation — Runs entirely in Google Colab (no local GPU setup required)

Feature Comparison

Features compared across V1.0.0, V2.0.0, and Batch:

  • Single protein prediction
  • Protein-ligand binding
  • Multi-chain complexes
  • DNA/RNA support
  • Template guidance
  • Custom MSA upload
  • Post-translational modifications
  • Custom constraints
  • Covalent bonds
  • Cyclic peptides
  • Affinity prediction
  • Batch processing
  • CSV/FASTA input
  • 3D visualization

Notebook Selection Guide

  • New users or standard predictions? → Use V1.0.0 (stable, battle-tested)
  • Advanced modeling needs? → Use V2.0.0 (latest features, DNA/RNA, constraints)
  • Large-scale screening? → Use Batch v1.0 (automated high-throughput)

Available Notebooks

| Notebook | Status | Best For |
|---|---|---|
| V2.0.0 | Beta | Advanced modeling, multi-entity complexes, structure-guided design |
| V1.0.0 | Stable | Production predictions, protein-ligand binding, standard analysis |
| Batch v1.0 | Pre-release | High-throughput screening, batch processing, automation |
| Parameter Generator | Utility | Custom configuration and input building |


Architecture & Workflow

┌─────────────────────────────────────────────────────────────┐
│ STEP 1: Setup Environment & Dependencies                    │
│ - Install Boltz2, PyTorch, CUDA support                     │
│ - Initialize workspace directories                          │
│ - Google Drive integration (optional)                       │
└────────────────┬────────────────────────────────────────────┘
                 ↓
┌─────────────────────────────────────────────────────────────┐
│ STEP 2: Input Builder (param_gen.py)                        │
│ - Define protein/DNA/RNA sequences                          │
│ - Add ligands (SMILES or CCD code)                          │
│ - Upload templates & custom MSA files                       │
│ - Define constraints (bonds, pockets, contacts)             │
│ - Generate params.yaml & run_params.txt                     │
└────────────────┬────────────────────────────────────────────┘
                 ↓
┌─────────────────────────────────────────────────────────────┐
│ STEP 3: Boltz2 Execution (Boltz_Run.py)                     │
│ - MSA generation (online or pre-computed)                   │
│ - Diffusion-based structure prediction                      │
│ - Recycling steps for refinement                            │
│ - Generate PDB/CIF models                                   │
│ - Compute confidence scores                                 │
└────────────────┬────────────────────────────────────────────┘
                 ↓
┌─────────────────────────────────────────────────────────────┐
│ STEP 4: Analysis & Visualization (analysis.py)              │
│ - Extract pLDDT (per-residue confidence)                    │
│ - Extract PAE (predicted aligned error)                     │
│ - Compute affinity predictions                              │
│ - Generate confidence plots                                 │
│ - Create interactive 3D viewer                              │
│ - Display hit discovery dashboard                           │
└─────────────────────────────────────────────────────────────┘
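Step 4 extracts per-residue pLDDT. As a rough illustration of that step, and not the notebook's analysis.py, the sketch below averages C-alpha B-factors per chain. It assumes the model was written in PDB format with pLDDT stored in the B-factor column, a common convention for structure predictors. Boltz's CLI has an `--output_format` option and, as far as we know, defaults to mmCIF, so check how your run was configured and which pLDDT scale (0-1 or 0-100) appears in your files. `mean_plddt_per_chain` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict


def mean_plddt_per_chain(pdb_text: str) -> Dict[str, float]:
    """Average the B-factor of C-alpha atoms per chain.

    Assumption: the B-factor column (PDB columns 61-66) holds per-residue pLDDT.
    Only ATOM records are read, so ligands (HETATM) are ignored.
    """
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":
            continue  # one C-alpha per protein residue gives one value per residue
        chain_id = line[21]  # PDB column 22
        sums[chain_id] += float(line[60:66])  # PDB columns 61-66: B-factor
        counts[chain_id] += 1
    return {chain: sums[chain] / counts[chain] for chain in sums}
```

Nucleic-acid chains have no C-alpha atoms, so they do not appear in the result; a different representative atom would be needed for them.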

Typical Time: 2-10 minutes on T4 GPU (depending on complexity)


Execution Pipeline

  1. Parameter Loading - Reads run_params.txt with job settings
  2. Directory Setup - Creates output folder structure
  3. MSA Generation - Fetches homologous sequences (unless pre-provided)
  4. Boltz2 Command - Constructs and runs prediction:
    boltz predict params.yaml \
      --out_dir job_name \
      --recycling_steps 3 \
      --sampling_steps 200 \
      --diffusion_samples 1 \
      --step_scale 1.638 \
      --max_msa_seqs 8192 \
      --msa_pairing_strategy unpaired_paired \
      --use_msa_server
  5. Output Generation - Creates PDB/CIF files with predictions
  6. Visualization - Extracts data for 3D rendering
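Steps 1 and 4 of the pipeline can be sketched together: read job settings, then assemble the `boltz predict` argument list. The format of run_params.txt is not documented here, so this sketch assumes one hypothetical `key=value` pair per line with `#` comment lines. `parse_run_params` and `build_boltz_command` are illustrative names, not functions from Boltz_Run.py. Every key is passed through as a flag, so the file should contain only CLI options such as those in the command above, and boolean options are assumed to be bare switches like `--use_msa_server`.

```python
import shlex
from typing import Dict, List, Union

Value = Union[bool, int, float, str]


def parse_run_params(text: str) -> Dict[str, Value]:
    """Parse hypothetical `key=value` lines into typed values; '#' starts a comment line."""
    params: Dict[str, Value] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if value.lower() in ("true", "false"):
            params[key] = value.lower() == "true"
            continue
        for cast in (int, float):
            try:
                params[key] = cast(value)
                break
            except ValueError:
                pass
        else:
            params[key] = value  # neither int nor float: keep as a string
    return params


def build_boltz_command(yaml_path: str, out_dir: str, params: Dict[str, Value]) -> List[str]:
    """Turn parsed settings into an argv list for `boltz predict`."""
    cmd = ["boltz", "predict", yaml_path, "--out_dir", out_dir]
    for key, value in params.items():
        flag = f"--{key}"
        if isinstance(value, bool):  # checked first: bool is a subclass of int
            if value:
                cmd.append(flag)  # True becomes a bare switch, False is omitted
        else:
            cmd.extend([flag, str(value)])
    return cmd


if __name__ == "__main__":
    settings = parse_run_params("recycling_steps=3\nsampling_steps=200\nuse_msa_server=true\n")
    print(shlex.join(build_boltz_command("params.yaml", "job_name", settings)))
```

The `__main__` block prints `boltz predict params.yaml --out_dir job_name --recycling_steps 3 --sampling_steps 200 --use_msa_server`. Keeping the command as a list lets `subprocess.run(cmd, check=True)` execute it without shell-quoting problems.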

Configuration Parameters:

| Parameter | Default | Range | Meaning |
|---|---|---|---|
| recycling_steps | 3 | 1-10 | Model refinement iterations |
| sampling_steps | 200 | 50-500 | Diffusion sampling iterations |
| diffusion_samples | 1 | 1-10 | Number of structure samples per job |
| step_scale | 1.638 | 0.5-2.0 | Diffusion step size; lower values give more diverse samples |
| max_msa_seqs | 8192 | 256-8192 | Maximum number of MSA sequences used |
| msa_pairing_strategy | unpaired_paired | paired / unpaired / greedy | How MSAs are paired across chains |
| use_potentials | False | True/False | Inference-time steering potentials for more physically plausible structures |
| override | False | True/False | Re-run even if output exists |
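The ranges in this table can be checked mechanically before a job is launched. This minimal sketch, using a hypothetical `check_params` helper, encodes the table's limits as notebook-level guardrails. The upstream Boltz CLI may accept values outside them; its own options are listed by `boltz predict --help`.

```python
from typing import Dict, List, Mapping, Tuple

# Limits copied from the Configuration Parameters table (notebook-level, not Boltz-enforced).
NUMERIC_RANGES: Dict[str, Tuple[float, float]] = {
    "recycling_steps": (1, 10),
    "sampling_steps": (50, 500),
    "diffusion_samples": (1, 10),
    "step_scale": (0.5, 2.0),
    "max_msa_seqs": (256, 8192),
}
PAIRING_STRATEGIES = {"unpaired_paired", "paired", "unpaired", "greedy"}
BOOLEAN_FLAGS = ("use_potentials", "override")


def check_params(params: Mapping[str, object]) -> List[str]:
    """Return human-readable problems; an empty list means every known parameter passed."""
    problems: List[str] = []
    for name, (low, high) in NUMERIC_RANGES.items():
        if name not in params:
            continue
        value = params[name]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} must be a number, got {value!r}")
        elif not low <= value <= high:
            problems.append(f"{name}={value} outside [{low}, {high}]")
    strategy = params.get("msa_pairing_strategy")
    if strategy is not None and strategy not in PAIRING_STRATEGIES:
        problems.append(f"unknown msa_pairing_strategy {strategy!r}")
    for flag in BOOLEAN_FLAGS:
        if flag in params and not isinstance(params[flag], bool):
            problems.append(f"{flag} must be True or False")
    return problems
```

For example, `check_params({"recycling_steps": 20})` returns `["recycling_steps=20 outside [1, 10]"]`.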


Usage Examples

Basic Protein Structure

Input: Single protein sequence
Output: PDB file + confidence metrics
Time: ~2-5 min
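For a single protein, the input YAML needs only one `protein` entry. The sketch below writes it as plain text following Boltz's YAML input schema as we understand it (`version`, `sequences`, `protein`, `id`, `sequence`). The helper name is hypothetical, and the notebook's param_gen.py may emit additional fields such as an MSA path.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYX")  # 20 standard amino acids plus X for unknown


def single_protein_yaml(sequence: str, chain_id: str = "A") -> str:
    """Build a one-chain Boltz input; whitespace is removed and letters are upper-cased."""
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("empty protein sequence")
    unknown = sorted(set(seq) - VALID_RESIDUES)
    if unknown:
        raise ValueError(f"unexpected characters in protein sequence: {unknown}")
    return (
        "version: 1\n"
        "sequences:\n"
        "  - protein:\n"
        f"      id: {chain_id}\n"
        f"      sequence: {seq}\n"
    )


if __name__ == "__main__":
    print(single_protein_yaml("MKTAYIAKQR QISFVKSHFS RQ"))
```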

Protein-Ligand Complex with Affinity

Input: Protein sequence + ligand SMILES
Output: Complex structure + binding affinity + hit discovery dashboard
Time: ~3-7 min
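A protein-ligand job adds a `ligand` entry and, for affinity, a `properties` block that names the ligand chain as the binder. This sketch follows our reading of the Boltz-2 input schema: a `ligand` takes `smiles` (or `ccd` for a CCD code), and affinity is requested with `properties: - affinity: binder:`. Verify the keys against the Boltz documentation for your installed version. `protein_ligand_yaml` and `yaml_quote` are hypothetical helpers.

```python
def yaml_quote(text: str) -> str:
    """Single-quote a YAML scalar; embedded single quotes are doubled."""
    return "'" + text.replace("'", "''") + "'"


def protein_ligand_yaml(
    protein_seq: str,
    smiles: str,
    protein_id: str = "A",
    ligand_id: str = "B",
    predict_affinity: bool = True,
) -> str:
    """Protein plus SMILES ligand, optionally requesting affinity for the ligand."""
    lines = [
        "version: 1",
        "sequences:",
        "  - protein:",
        f"      id: {protein_id}",
        f"      sequence: {protein_seq}",
        "  - ligand:",
        f"      id: {ligand_id}",
        f"      smiles: {yaml_quote(smiles)}",  # quoted: SMILES may contain '#', '[', '@'
    ]
    if predict_affinity:
        lines += ["properties:", "  - affinity:", f"      binder: {ligand_id}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(protein_ligand_yaml("MKTAYIAKQRQISFVKSHFSRQ", "CC(=O)Oc1ccccc1C(=O)O"))
```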

Multi-Chain Complex (e.g., Antibody-Antigen)

Input: 2-3 protein sequences
Output: Full complex structure + interface metrics (PAE, pLDDT)
Time: ~5-10 min
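Interface confidence is often summarized as the mean PAE over residue pairs that belong to different chains. The sketch below computes that quantity from a token-by-token PAE matrix and a per-token chain label, both assumed to be loaded already (for example, from the PAE data extracted in Step 4). `interface_pae` is a hypothetical name; other interface scores such as ipTM are not computed here.

```python
from typing import Sequence


def interface_pae(pae: Sequence[Sequence[float]], chains: Sequence[str]) -> float:
    """Mean PAE over token pairs (i, j) whose chain labels differ."""
    n = len(chains)
    if len(pae) != n or any(len(row) != n for row in pae):
        raise ValueError("PAE matrix must be N x N with one chain label per token")
    total, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            if chains[i] != chains[j]:  # inter-chain pair; both (i, j) and (j, i) are counted
                total += pae[i][j]
                count += 1
    if count == 0:
        raise ValueError("need at least two chains to define an interface")
    return total / count


if __name__ == "__main__":
    toy_pae = [[0.5, 4.0, 12.0], [4.0, 0.5, 9.0], [11.0, 8.0, 0.5]]  # illustrative values only
    print(interface_pae(toy_pae, ["A", "A", "B"]))  # prints 10.0
```

Lower values mean the model is more confident about the relative placement of the chains. PAE is not symmetric in general, so both directions of each inter-chain pair are averaged.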

Template-Guided Modeling

Input: Protein variant + reference PDB template
Output: Faster convergence + higher confidence in conserved regions
Time: ~2-5 min
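Template guidance adds a `templates` section to the input YAML. The sketch below uses the `cif` key with an optional `chain_id`, following our understanding of the Boltz-2 template syntax. The text above describes a PDB template, so confirm in the Boltz input-format documentation whether your installed version also accepts PDB files directly. `template_guided_yaml` is a hypothetical helper.

```python
from typing import Optional


def yaml_quote(text: str) -> str:
    """Single-quote a YAML scalar; embedded single quotes are doubled."""
    return "'" + text.replace("'", "''") + "'"


def template_guided_yaml(
    sequence: str,
    template_cif: str,
    chain_id: str = "A",
    template_chain: Optional[str] = None,
) -> str:
    """One protein chain plus a structural template given as an mmCIF path."""
    lines = [
        "version: 1",
        "sequences:",
        "  - protein:",
        f"      id: {chain_id}",
        f"      sequence: {sequence}",
        "templates:",
        f"  - cif: {yaml_quote(template_cif)}",
    ]
    if template_chain is not None:
        lines.append(f"    chain_id: {template_chain}")  # aligned with 'cif' inside the list item
    return "\n".join(lines) + "\n"
```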

Troubleshooting

| Issue | Cause | Solution |
|---|---|---|
| "CUDA out of memory" | Sequence too long | Reduce max_msa_seqs to 256, use fast profile |
| "MSA generation failed" | Server/connectivity issue | Use custom MSA file instead of server |
| "Invalid SMILES" | Malformed ligand notation | Validate SMILES at chemspider.com |
| "No model file found" | Boltz2 crashed | Check output logs, verify params.yaml |
| "pLDDT all low (<50)" | Novel/ambiguous fold | Try template guidance if available |
| Low affinity confidence | Complex interface | Increase sampling_steps to 500 |
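For the "Invalid SMILES" row, a quick local pre-check can catch typos before a job is submitted. This dependency-free sketch, using a hypothetical `smiles_precheck` helper, verifies only structural balance: bracket atoms, branch parentheses, and ring-closure labels (including `%nn`). It does not check valence, aromaticity, or element symbols, so a full parser such as RDKit's `Chem.MolFromSmiles`, which returns `None` for unparsable input, remains the real validator.

```python
from typing import List, Set

DIGITS = "0123456789"


def smiles_precheck(smiles: str) -> List[str]:
    """Return structural problems found in a SMILES string (empty list means none found)."""
    if not smiles:
        return ["empty SMILES"]
    problems: List[str] = []
    depth = 0  # current branch '(' nesting
    open_rings: Set[int] = set()  # ring-closure labels opened but not yet closed
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Bracket atoms may contain digits (isotopes, charges, H counts): skip them whole.
            close = smiles.find("]", i + 1)
            if close == -1:
                problems.append(f"unclosed '[' at position {i}")
                break
            i = close + 1
            continue
        if ch == "]":
            problems.append(f"unmatched ']' at position {i}")
        elif ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                problems.append(f"unmatched ')' at position {i}")
            else:
                depth -= 1
        elif ch == "%":
            label = smiles[i + 1:i + 3]
            if len(label) == 2 and all(c in DIGITS for c in label):
                open_rings ^= {int(label)}  # two-digit ring label toggles open/closed
                i += 3
                continue
            problems.append(f"malformed '%' ring label at position {i}")
        elif ch in DIGITS:
            open_rings ^= {int(ch)}  # single-digit ring label toggles open/closed
        i += 1
    if depth:
        problems.append(f"{depth} unclosed '('")
    if open_rings:
        problems.append("unclosed ring bond(s): " + ", ".join(map(str, sorted(open_rings))))
    return problems
```

For example, `smiles_precheck("c1ccccc")` reports `["unclosed ring bond(s): 1"]`, while aspirin, `CC(=O)Oc1ccccc1C(=O)O`, passes with an empty list.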


Citation

If you use Boltz2-Notebook, please cite:

@article{Passaro2025,
  title={Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction},
  author={Passaro, S. and Corso, G. and Wohlwend, J. and others},
  journal={bioRxiv},
  year={2025}
}

@article{Wohlwend2024,
  title={Boltz-1: Democratizing Biomolecular Interaction Modeling},
  author={Wohlwend, J. and Corso, G. and Passaro, S. and others},
  journal={bioRxiv},
  year={2024}
}

Last Updated: April 2026 | Version: 2.0.0 (Latest) | Status: Beta (V1.0.0 is the stable release)