An AI-powered chemistry assistant for materials science and chemistry research, featuring specialized agents, multi-platform lab automation, and integration with chemical databases.
Catalyze combines specialized AI agents with chemical databases to provide context-aware assistance for chemistry research, protocol generation, lab automation, and safety analysis. Built on an agent-based architecture, it supports multi-platform automation, PDF analysis, and integration with the ChEMBL database.
- Multi-Agent Architecture: 5 specialized AI agents for different chemistry tasks
- Dual Platform Automation: Generate both OpenTrons Python and Lynx C# scripts
- PDF Analysis: Upload and analyze scientific papers with AI
- ChEMBL Integration: Access to 27 specialized chemistry tools and databases
- Safety-First Design: Comprehensive safety analysis and hazard assessment
- Beautiful UI: Modern, responsive interface with dark/light themes
- Real-time Processing: Fast, intelligent responses with context awareness
- Chemical Properties: Detailed analysis of molecular structures, properties, and behaviors
- Database Integration: Access to ChEMBL, PubChem, and other chemical databases
- Literature Search: Find and analyze relevant research papers and studies
- Compound Analysis: Comprehensive compound information and structure analysis
- Target Research: Biological target analysis and pathway information
- Step-by-Step Protocols: Generate detailed, reproducible laboratory procedures
- Safety Integration: Built-in safety considerations and hazard warnings
- Material Lists: Automatic generation of required materials and equipment
- Method Optimization: Suggestions for improving experimental efficiency
- Documentation: Professional protocol formatting with clear instructions
- Multi-Platform Support: Generate code for both OpenTrons OT2 and Dynamic Device Lynx
- Platform Selection: Interactive platform choice for automation scripts
- Python Scripts: OpenTrons OT2 automation with full API integration
- C# Scripts: Dynamic Device Lynx automation with comprehensive liquid handling
- Code Validation: Built-in validation and error checking for generated scripts
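The platform selection step could look roughly like the sketch below. It is only an illustration: `PLATFORMS`, `parse_platform`, and the language labels are hypothetical names chosen for this example, not identifiers taken from the Catalyze codebase.

```python
# Hypothetical platform lookup; names and aliases are illustrative only.
from typing import Optional

PLATFORMS = {
    "opentrons": "python",  # OpenTrons OT2 scripts are generated in Python
    "lynx": "csharp",       # Dynamic Device Lynx scripts are generated in C#
}


def parse_platform(reply: str) -> Optional[str]:
    """Return the single platform named in the reply, or None to ask again."""
    text = reply.strip().lower()
    matches = [name for name in PLATFORMS if name in text]
    # Ambiguous or empty replies yield None so the caller can re-prompt.
    return matches[0] if len(matches) == 1 else None
```

Returning `None` for replies that name both platforms or neither keeps the choice explicit instead of guessing a default.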
- Hazard Assessment: Comprehensive safety analysis of chemicals and procedures
- Risk Evaluation: Detailed risk assessment with mitigation strategies
- Safety Protocols: Generate safety procedures and emergency protocols
- Chemical Safety: MSDS integration and safety data analysis
- Compliance: Ensure adherence to safety standards and regulations
- Document Upload: Drag-and-drop PDF upload with progress tracking
- AI Analysis: OpenAI GPT-4o powered document analysis and summarization
- Context Integration: PDF content automatically integrated into chat responses
- Multi-Format Support: Support for scientific papers, protocols, and reports
- Smart Extraction: Intelligent extraction of key information and methodologies
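One way to picture the context integration step is a helper that trims extracted page text to a fixed budget before it is attached to a chat request. This is a sketch under assumptions: `build_pdf_context`, the page markers, and the 8000-character default are invented for illustration, and the actual extraction (done with PyMuPDF according to the technology list) is left out.

```python
# Hypothetical helper: trims already-extracted PDF text to fit a prompt budget.
from typing import List


def build_pdf_context(pages: List[str], max_chars: int = 8000) -> str:
    """Join page texts in order, stopping before the character budget is exceeded."""
    parts: List[str] = []
    used = 0
    for number, text in enumerate(pages, start=1):
        chunk = f"[Page {number}]\n{text.strip()}\n"
        if used + len(chunk) > max_chars:
            break  # keep whole pages only; later pages are dropped
        parts.append(chunk)
        used += len(chunk)
    return "".join(parts)
```

Keeping whole pages, rather than cutting mid-page, makes it easier to tell the user which part of the document the answer was based on.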
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Router Agent   │────│ Pipeline Manager│────│   Specialized   │
│ (Query Router)  │    │  (Orchestrator) │    │     Agents      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
                    ┌───────────┼───────────┐
                    │           │           │
            ┌───────▼───┐ ┌─────▼─────┐ ┌───▼─────┐
            │ Research  │ │ Protocol  │ │Automate │
            │  Agent    │ │  Agent    │ │ Agent   │
            └───────────┘ └───────────┘ └─────────┘
                    │           │           │
            ┌───────▼───┐ ┌─────▼─────┐
            │  Safety   │ │    MCP    │
            │  Agent    │ │   Tools   │
            └───────────┘ └───────────┘
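The flow in the diagram (router, then pipeline, then a specialized agent) can be sketched in a few lines of Python. This is a minimal illustration and not the project's implementation: `route_query`, `handle`, and the keyword table are assumptions made for the example, and the real Router Agent may well classify queries with an LLM rather than by keyword matching.

```python
# Hypothetical sketch of query routing; names and keywords are illustrative only.
from typing import Callable, Dict, Tuple

# Keyword hints per agent, checked in order. A real router would likely use an LLM.
ROUTING_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "safety": ("hazard", "safety", "toxic"),
    "automate": ("opentrons", "lynx", "script", "code"),
    "protocol": ("protocol", "procedure", "synthes"),
}


def route_query(message: str) -> str:
    """Return the name of the agent that should handle the message."""
    text = message.lower()
    for agent, keywords in ROUTING_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return agent
    # Fall back to the research agent for general chemistry questions.
    return "research"


def handle(message: str, agents: Dict[str, Callable[[str], str]]) -> str:
    """Pipeline step: pick an agent by name and delegate the message to it."""
    return agents[route_query(message)](message)
```

With this table, "Generate code for serial dilution" is routed to the automation agent, while a question about the properties of caffeine falls through to the research agent.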
- Backend: Python 3.12+, Flask 3.1+, LangChain, LangGraph
- AI Integration: OpenAI GPT-4o, ChEMBL MCP Server
- Frontend: Pure HTML/CSS/JavaScript (no build process required)
- Automation: OpenTrons API, Dynamic Device Lynx C# integration
- PDF Processing: PyMuPDF, OpenAI Vision API
- Database: ChEMBL, PubChem integration via MCP
- Python 3.12 or higher
- Node.js (for ChEMBL MCP Server)
- OpenAI API key
- Git
uv is a Python package manager that is typically much faster than pip:
# Install UV (if not already installed)
# On Windows:
pip install uv
# On macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone and setup with UV
git clone https://github.com/your-username/mit-catalyze.git
cd mit-catalyze
# Install dependencies with UV (creates virtual environment automatically)
uv sync
# Activate virtual environment
# On Windows:
.venv\Scripts\activate
# On macOS/Linux:
source .venv/bin/activate
# Set up environment variables
echo "OPENAI_API_KEY=your-api-key-here" > .env
# Start the application
uv run python app/flask_app.py

# Clone and navigate
git clone https://github.com/your-username/mit-catalyze.git
cd mit-catalyze
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -e .
# Set up environment variables
echo "OPENAI_API_KEY=your-api-key-here" > .env
# Start the application
python app/flask_app.py

# Clone the repository
git clone https://github.com/your-username/mit-catalyze.git
cd mit-catalyze
# Run the startup script
./start_catalyze.sh

# Initialize submodules (first time only)
./manage_submodules.sh init
# Install Node.js dependencies
./manage_submodules.sh install
# Start with full MCP integration
source .venv/bin/activate && python app/flask_app.py

- Local: http://localhost:5003
- Network: http://[your-ip]:5003
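Once the server is running, the `POST /api/chat` endpoint described in this README can be exercised from Python. The sketch below uses only the standard library; the helper names `build_chat_request` and `send_chat` are made up for this example, and the assumption that the server replies with JSON is not confirmed by this document.

```python
# Sketch of a client for POST /api/chat; assumes the server runs on localhost:5003.
import json
import urllib.request

BASE_URL = "http://localhost:5003"  # local address listed above


def build_chat_request(message: str, mode: str = "research") -> urllib.request.Request:
    """Build (but do not send) a chat request with the documented JSON fields."""
    payload = {
        "message": message,
        "mode": mode,
        "conversation_history": [],
        "pdf_context": None,
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message: str, mode: str = "research") -> dict:
    """Send the request and decode the reply, assuming the server returns JSON."""
    with urllib.request.urlopen(build_chat_request(message, mode), timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `send_chat("What are the properties of caffeine?")` sends the same body as the chat request example in the API section.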
mit-catalyze/
├── app/
│   └── flask_app.py                  # Main Flask backend application
├── react-build/
│   ├── index.html                    # Beautiful frontend (single file)
│   └── static/                       # CSS, JS, and assets
├── mcp_servers/
│   ├── chembl-mcp-server/            # ChEMBL MCP Server (submodule)
│   └── opentrons-mcp-server/         # OpenTrons MCP Server (submodule)
├── src/
│   ├── agents/                       # AI Agent implementations
│   │   ├── base_agent.py             # Base class for all agents
│   │   ├── router_agent.py           # Query routing and classification
│   │   ├── research_agent.py         # Chemistry research and analysis
│   │   ├── protocol_agent.py         # Lab protocol generation
│   │   ├── automate_agent.py         # Lab automation scripts
│   │   └── safety_agent.py           # Safety analysis and hazards
│   ├── api/
│   │   └── chat_endpoints.py         # REST API endpoints
│   ├── clients/
│   │   ├── llm_client.py             # OpenAI integration
│   │   ├── pubchem_client.py         # PubChem API client
│   │   └── opentrons_validator.py    # OpenTrons code validation
│   ├── generators/
│   │   ├── protocol_generator.py     # Protocol generation logic
│   │   ├── automation_generator.py   # Automation script generation
│   │   └── lynx_generator.py         # Lynx C# script generation
│   ├── pipeline/
│   │   ├── pipeline_manager.py       # Main processing pipeline
│   │   └── mode_processor.py         # Mode-specific processing
│   ├── config/
│   │   ├── config.py                 # Application configuration
│   │   └── logging_config.py         # Logging configuration
│   └── prompts/
│       ├── research_agent.txt        # Research agent prompts
│       ├── protocol_agent.txt        # Protocol agent prompts
│       ├── automate_agent.txt        # Automation agent prompts
│       └── safety_agent.txt          # Safety agent prompts
├── docs/
│   ├── AGENT_ARCHITECTURE_SUMMARY.md
│   ├── PDF_UPLOAD_FEATURE.md
│   ├── MCP_CONFIGURATION.md
│   └── SETUP.md
├── tests/
│   └── [comprehensive test suite]
├── pyproject.toml                    # Python dependencies
├── start_catalyze.sh                 # Quick start script
├── manage_submodules.sh              # Submodule management
└── README.md                         # This file
User: "What are the properties of caffeine?"
Agent: [Provides detailed molecular analysis, pharmacological effects,
safety data, and research findings from ChEMBL database]
User: "Generate a protocol for synthesizing aspirin"
Agent: [Creates step-by-step procedure with materials list,
safety considerations, and detailed instructions]
User: "Generate code for serial dilution"
Agent: [Asks for platform selection: OpenTrons or Lynx]
User: "Lynx"
Agent: [Generates complete C# script for Dynamic Device Lynx system]
User: "What are the safety hazards of working with sulfuric acid?"
Agent: [Provides comprehensive safety analysis, hazard warnings,
protective equipment requirements, and emergency procedures]
User: [Uploads research paper PDF]
User: "What are the key findings in this paper?"
Agent: [Analyzes PDF content and provides detailed summary
of findings, methodology, and implications]
# Required
OPENAI_API_KEY=your-openai-api-key
# Optional
DEBUG=true
LOG_LEVEL=INFO
MCP_SERVER_URL=http://localhost:8000

# config.py
MCP_SERVERS = {
    "chembl": {
        "transport": "streamable_http",
        "url": "http://localhost:8000/mcp"
    },
    "opentrons": {
        "transport": "stdio",
        "command": "node",
        "args": ["./mcp_servers/opentrons-mcp-server/dist/index.js"]
    }
}

Catalyze integrates with the ChEMBL MCP Server to provide access to 27 specialized chemistry tools:
- Compound Search: Search by name, synonym, or identifier
- Target Analysis: Biological target information and pathways
- Bioactivity Data: Assay results and activity measurements
- Drug Development: Approved drugs and clinical candidates
- Chemical Properties: ADMET properties and drug-likeness
- Structure Search: Similarity and substructure searches
- Dose Response: Pharmacological data analysis
- Hybrid AI: Combines OpenAI knowledge with ChEMBL database accuracy
- Real-time Data: Access to up-to-date chemical information
- Comprehensive Coverage: 2+ million compounds and 1+ million assays
- Professional Grade: Industry-standard chemical database integration
# Generated Python script example
from opentrons import protocol_api
metadata = {
    'protocolName': 'Serial Dilution Protocol',
    'author': 'Catalyze AI',
    'apiLevel': '2.13'
}

def run(protocol: protocol_api.ProtocolContext):
    # Complete automation script
    pass

// Generated C# script example
using System;
using MethodManager.Core;
using MMScriptObjects;
public class SerialDilutionProtocol : IMMScriptExecutor
{
    public void Execute(IMMApp app)
    {
        // Complete Lynx automation script
    }
}

POST /api/chat
Content-Type: application/json
{
    "message": "What are the properties of caffeine?",
    "mode": "research",
    "conversation_history": [],
    "pdf_context": null
}

POST /api/upload-pdf
Content-Type: multipart/form-data
pdf: [PDF file]

GET /api/agents

GET /api/health

# Run all tests
python -m pytest tests/
# Run specific test categories
python -m pytest tests/test_agents.py
python -m pytest tests/test_automation.py
python -m pytest tests/test_pdf_upload.py

- Agent System: All 5 agents tested
- API Endpoints: Complete endpoint testing
- PDF Processing: Upload and analysis testing
- Automation: Both OpenTrons and Lynx code generation
- MCP Integration: ChEMBL server connectivity
- Response Time: < 2 seconds for most queries
- Concurrent Users: Supports multiple simultaneous users
- Memory Usage: Optimized for efficient resource utilization
- Database Queries: Cached responses for common queries
- Agent-Based Architecture: Easy to scale individual components
- Async Processing: Non-blocking operations for better performance
- Caching: Intelligent caching for improved response times
- Modular Design: Easy to add new agents and features
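The caching idea mentioned above can be illustrated with the standard library's `functools.lru_cache`. This is a sketch, not Catalyze's caching layer: `fetch_compound` is a hypothetical stand-in for a slow database call, and the `CALLS` counter exists only to show when the slow path actually runs.

```python
# Illustrative caching sketch; fetch_compound is a hypothetical stand-in for a database call.
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the slow path actually runs


@lru_cache(maxsize=256)
def fetch_compound(name: str) -> str:
    """Pretend to query a chemical database; results are memoized per name."""
    CALLS["count"] += 1
    return f"record for {name}"


def lookup(name: str) -> str:
    """Normalize the key so 'Caffeine' and ' caffeine ' share one cache entry."""
    return fetch_compound(name.strip().lower())
```

Normalizing the key before the cached call matters: without it, differently capitalized spellings of the same compound would each trigger a separate query.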
- Local Processing: All data processed locally when possible
- Secure API Keys: Environment variable protection
- Temporary Files: Automatic cleanup of uploaded files
- No Data Storage: No persistent storage of user data
- Input Validation: Comprehensive input sanitization
- Error Handling: Graceful error recovery
- Rate Limiting: Protection against abuse
- Secure Headers: CORS and security headers
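As an illustration of the rate limiting item, the sketch below implements a simple in-memory sliding-window limiter. It is an assumption about how such protection could work, not a description of what Catalyze ships: the `RateLimiter` class, its parameters, and the per-client keying are all hypothetical.

```python
# Hypothetical sliding-window rate limiter; limits and names are illustrative.
import time
from collections import defaultdict, deque
from typing import Optional


class RateLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of accepted requests

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        """Report whether a request is within the limit, recording it if so."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A request handler could call `limiter.allow(client_ip)` before doing any work and return an HTTP 429 response when it yields `False`. The optional `now` argument exists so the behavior can be tested without sleeping.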
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
# Clone and setup
git clone https://github.com/your-username/mit-catalyze.git
cd mit-catalyze
# Install with UV (includes dev dependencies)
uv sync --dev
# Activate virtual environment
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Run in development mode
uv run python app/flask_app.py --debug

# Clone and setup
git clone https://github.com/your-username/mit-catalyze.git
cd mit-catalyze
# Create development environment
python -m venv .venv
source .venv/bin/activate
# Install development dependencies
pip install -e ".[dev]"
# Run in development mode
python app/flask_app.py --debug

- Python: Follow PEP 8 guidelines
- JavaScript: Use modern ES6+ syntax
- Documentation: Comprehensive docstrings and comments
- Testing: Maintain high test coverage
- Multi-PDF Support: Analyze multiple documents simultaneously
- Advanced Visualization: Interactive molecular structure viewers
- Collaboration: Real-time collaborative protocol editing
- Mobile App: Native mobile application
- API Expansion: Public API for third-party integrations
- Database Integration: Persistent storage for user data
- Caching Layer: Redis-based caching for improved performance
- Microservices: Containerized microservices architecture
- CI/CD: Automated testing and deployment pipeline
We welcome contributions from the chemistry and AI communities! Here's how you can help:
- Bug Reports: Report issues and bugs
- Feature Requests: Suggest new features
- Code Contributions: Submit pull requests
- Documentation: Improve documentation
- Testing: Help test new features
- Check the Issues page
This project is licensed under the MIT License - see the LICENSE file for details.
- MIT LLM Hackathon for Applications in Materials and Chemistry 2025 for the inspiration and platform
- ChEMBL for the comprehensive chemical database