🧪 Catalyze - AI-Powered Chemistry Assistant

A sophisticated AI-powered chemistry assistant for materials science and chemistry research, featuring specialized agents, multi-platform automation, and comprehensive chemical knowledge integration.


🌟 What is Catalyze?

Catalyze is a cutting-edge AI chemistry assistant that combines the power of specialized AI agents with comprehensive chemical databases to provide intelligent, context-aware assistance for chemistry research, protocol generation, lab automation, and safety analysis. Built with a sophisticated agent-based architecture, Catalyze offers multi-platform automation support, PDF analysis capabilities, and seamless integration with ChEMBL's extensive chemical database.

🎯 Key Highlights

🤖 Multi-Agent Architecture: 5 specialized AI agents for different chemistry tasks
🔬 Dual Platform Automation: Generate both OpenTrons Python and Lynx C# scripts
📄 PDF Analysis: Upload and analyze scientific papers with AI
🧪 ChEMBL Integration: Access to 27 specialized chemistry tools and databases
🛡️ Safety-First Design: Comprehensive safety analysis and hazard assessment
🎨 Beautiful UI: Modern, responsive interface with dark/light themes
⚡ Real-time Processing: Fast, intelligent responses with context awareness

✨ Core Features

🔬 Research Agent

Chemical Properties: Detailed analysis of molecular structures, properties, and behaviors
Database Integration: Access to ChEMBL, PubChem, and other chemical databases
Literature Search: Find and analyze relevant research papers and studies
Compound Analysis: Comprehensive compound information and structure analysis
Target Research: Biological target analysis and pathway information

📋 Protocol Agent

Step-by-Step Protocols: Generate detailed, reproducible laboratory procedures
Safety Integration: Built-in safety considerations and hazard warnings
Material Lists: Automatic generation of required materials and equipment
Method Optimization: Suggestions for improving experimental efficiency
Documentation: Professional protocol formatting with clear instructions

🤖 Automation Agent

Multi-Platform Support: Generate code for both OpenTrons OT2 and Dynamic Device Lynx
Platform Selection: Interactive platform choice for automation scripts
Python Scripts: OpenTrons OT2 automation with full API integration
C# Scripts: Dynamic Device Lynx automation with comprehensive liquid handling
Code Validation: Built-in validation and error checking for generated scripts
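The README does not describe what opentrons_validator.py checks. As a rough illustration of static validation, the hypothetical check_opentrons_script helper below uses Python's standard ast module to confirm that a generated script parses, defines a top-level metadata dict with an apiLevel key, and defines a one-argument run function, matching the OpenTrons example in Automation Platforms. It never imports or executes the script.

```python
import ast


def check_opentrons_script(source: str) -> list[str]:
    """Return a list of problems found in a generated OpenTrons script.

    Hypothetical helper: static checks only, the script is never executed.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    api_level = None
    has_run = False
    for node in tree.body:
        # Top-level assignment such as: metadata = {'apiLevel': '2.13', ...}
        if isinstance(node, ast.Assign) and any(
            isinstance(t, ast.Name) and t.id == "metadata" for t in node.targets
        ):
            if isinstance(node.value, ast.Dict):
                for key, value in zip(node.value.keys, node.value.values):
                    if isinstance(key, ast.Constant) and key.value == "apiLevel":
                        if isinstance(value, ast.Constant):
                            api_level = value.value
        # Top-level def run(protocol): ... with exactly one positional argument
        if isinstance(node, ast.FunctionDef) and node.name == "run":
            has_run = len(node.args.args) == 1

    problems = []
    if api_level is None:
        problems.append("metadata['apiLevel'] is missing")
    if not has_run:
        problems.append("no top-level run(protocol) function with one argument")
    return problems
```

A clean script returns an empty list; anything else is a list of messages that could be fed back to the Automation Agent for a retry.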

🛡️ Safety Agent

Hazard Assessment: Comprehensive safety analysis of chemicals and procedures
Risk Evaluation: Detailed risk assessment with mitigation strategies
Safety Protocols: Generate safety procedures and emergency protocols
Chemical Safety: MSDS integration and safety data analysis
Compliance: Ensure adherence to safety standards and regulations

📄 PDF Analysis

Document Upload: Drag-and-drop PDF upload with progress tracking
AI Analysis: Document analysis and summarization powered by OpenAI GPT-4o
Context Integration: PDF content automatically integrated into chat responses
Document Types: Handles scientific papers, protocols, and reports
Smart Extraction: Intelligent extraction of key information and methodologies
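The README does not say how PDF text is trimmed before it is integrated into a chat response. One plausible approach is to split the extracted text into paragraph-aligned chunks under a size limit and keep only as many leading chunks as fit a budget. The sketch below is hypothetical: chunk_text and build_pdf_context are illustrative names, and character counts stand in for real token counts.

```python
def chunk_text(text: str, max_chars: int) -> list[str]:
    """Split text into chunks of at most max_chars, breaking on blank lines when possible."""
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # Hard-split any single paragraph that is longer than the limit.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def build_pdf_context(text: str, budget_chars: int = 12_000, chunk_chars: int = 2_000) -> str:
    """Keep leading chunks until the budget is used up (joining separators are not counted)."""
    kept, used = [], 0
    for chunk in chunk_text(text, chunk_chars):
        if used + len(chunk) > budget_chars:
            break
        kept.append(chunk)
        used += len(chunk)
    return "\n\n".join(kept)
```

Keeping the leading chunks favors the abstract and introduction; a retrieval step that ranks chunks against the user's question would be a natural refinement.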

🏗️ Architecture Overview

Agent-Based System

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Router Agent  │────│ Pipeline Manager│────│  Specialized    │
│  (Query Router) │    │   (Orchestrator)│    │     Agents      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
                    ┌───────────┼───────────┐
                    │           │           │
            ┌───────▼───┐ ┌─────▼─────┐ ┌───▼─────┐
            │ Research  │ │ Protocol  │ │Automate │
            │   Agent   │ │   Agent   │ │  Agent  │
            └───────────┘ └───────────┘ └─────────┘
                    │           │           │
            ┌───────▼───┐ ┌─────▼─────┐
            │  Safety   │ │   MCP     │
            │   Agent   │ │  Tools    │
            └───────────┘ └───────────┘
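The Router Agent decides which specialized agent answers a query. The README does not describe how router_agent.py classifies queries (it may well ask the LLM), so the sketch below substitutes a simple keyword score purely to illustrate the routing contract: text in, agent name out. The keyword lists, the route_query name, and the fallback to "research" are all assumptions.

```python
import re

# Hypothetical keyword lists; a production router would more likely ask the LLM to classify.
ROUTES: dict[str, tuple[str, ...]] = {
    "safety": ("hazard", "safety", "safe", "toxic", "ppe", "msds", "sds"),
    "automate": ("opentrons", "ot2", "lynx", "script", "automate", "automation", "code"),
    "protocol": ("protocol", "procedure", "synthesize", "synthesis", "steps"),
    "research": ("properties", "structure", "compound", "target", "paper", "chembl"),
}


def route_query(query: str, default: str = "research") -> str:
    """Return the agent whose keywords appear most often in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    scores = {agent: sum(w in keywords for w in words) for agent, keywords in ROUTES.items()}
    best = max(scores, key=scores.get)  # ties go to the earliest entry in ROUTES
    # Fall back to the default agent when no keyword matched at all.
    return best if scores[best] > 0 else default
```

With these lists, the four example questions in Usage Examples route to research, protocol, automate, and safety respectively.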

Technology Stack

Backend: Python 3.12+, Flask 3.1+, LangChain, LangGraph
AI Integration: OpenAI GPT-4o, ChEMBL MCP Server
Frontend: Pure HTML/CSS/JavaScript (no build process required)
Automation: OpenTrons API, Dynamic Device Lynx C# integration
PDF Processing: PyMuPDF, OpenAI Vision API
Database: ChEMBL, PubChem integration via MCP

🚀 Quick Start Guide

Prerequisites

Python 3.12 or higher
Node.js (for ChEMBL MCP Server)
OpenAI API key
Git

Installation Methods

Method 1: UV (Recommended)

UV is a Python package manager that is typically much faster than pip:

# Install UV (if not already installed)
# On Windows:
pip install uv

# On macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and setup with UV
git clone https://github.com/your-username/mit-catalyze.git
cd mit-catalyze

# Install dependencies with UV (creates virtual environment automatically)
uv sync

# Activate the virtual environment (optional: uv run uses .venv automatically)
# On Windows:
.venv\Scripts\activate
# On macOS/Linux:
source .venv/bin/activate

# Set up environment variables
echo "OPENAI_API_KEY=your-api-key-here" > .env

# Start the application
uv run python app/flask_app.py

Method 2: Traditional pip

# Clone and navigate
git clone https://github.com/your-username/mit-catalyze.git
cd mit-catalyze

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -e .

# Set up environment variables
echo "OPENAI_API_KEY=your-api-key-here" > .env

# Start the application
python app/flask_app.py

Method 3: Startup Script

# Clone the repository
git clone https://github.com/your-username/mit-catalyze.git
cd mit-catalyze

# Run the startup script
./start_catalyze.sh

Method 4: Advanced Setup with ChEMBL (run after Method 1 or 2)

# Initialize submodules (first time only)
./manage_submodules.sh init

# Install Node.js dependencies
./manage_submodules.sh install

# Start with full MCP integration
source .venv/bin/activate && python app/flask_app.py

Access the Application

Local: http://localhost:5003
Network: http://[your-ip]:5003

📁 Project Structure

mit-catalyze/
├── 📁 app/
│   └── flask_app.py              # Main Flask backend application
├── 📁 react-build/
│   ├── index.html                # Beautiful frontend (single file)
│   └── static/                   # CSS, JS, and assets
├── 📁 mcp_servers/
│   ├── chembl-mcp-server/        # ChEMBL MCP Server (submodule)
│   └── opentrons-mcp-server/     # OpenTrons MCP Server (submodule)
├── 📁 src/
│   ├── 📁 agents/                # AI Agent implementations
│   │   ├── base_agent.py         # Base class for all agents
│   │   ├── router_agent.py       # Query routing and classification
│   │   ├── research_agent.py     # Chemistry research and analysis
│   │   ├── protocol_agent.py     # Lab protocol generation
│   │   ├── automate_agent.py     # Lab automation scripts
│   │   └── safety_agent.py       # Safety analysis and hazards
│   ├── 📁 api/
│   │   └── chat_endpoints.py     # REST API endpoints
│   ├── 📁 clients/
│   │   ├── llm_client.py         # OpenAI integration
│   │   ├── pubchem_client.py     # PubChem API client
│   │   └── opentrons_validator.py # OpenTrons code validation
│   ├── 📁 generators/
│   │   ├── protocol_generator.py # Protocol generation logic
│   │   ├── automation_generator.py # Automation script generation
│   │   └── lynx_generator.py     # Lynx C# script generation
│   ├── 📁 pipeline/
│   │   ├── pipeline_manager.py   # Main processing pipeline
│   │   └── mode_processor.py     # Mode-specific processing
│   ├── 📁 config/
│   │   ├── config.py             # Application configuration
│   │   └── logging_config.py     # Logging configuration
│   └── 📁 prompts/
│       ├── research_agent.txt    # Research agent prompts
│       ├── protocol_agent.txt    # Protocol agent prompts
│       ├── automate_agent.txt    # Automation agent prompts
│       └── safety_agent.txt      # Safety agent prompts
├── 📁 docs/
│   ├── AGENT_ARCHITECTURE_SUMMARY.md
│   ├── PDF_UPLOAD_FEATURE.md
│   ├── MCP_CONFIGURATION.md
│   └── SETUP.md
├── 📁 tests/
│   └── [comprehensive test suite]
├── pyproject.toml                # Python dependencies
├── start_catalyze.sh             # Quick start script
├── manage_submodules.sh          # Submodule management
└── README.md                     # This file

🧪 Usage Examples

Research Questions

User: "What are the properties of caffeine?"
Agent: [Provides detailed molecular analysis, pharmacological effects, 
        safety data, and research findings from the ChEMBL database]

Protocol Generation

User: "Generate a protocol for synthesizing aspirin"
Agent: [Creates step-by-step procedure with materials list, 
        safety considerations, and detailed instructions]

Lab Automation

User: "Generate code for serial dilution"
Agent: [Asks for platform selection: OpenTrons or Lynx]
User: "Lynx"
Agent: [Generates complete C# script for Dynamic Device Lynx system]

Safety Analysis

User: "What are the safety hazards of working with sulfuric acid?"
Agent: [Provides comprehensive safety analysis, hazard warnings, 
        protective equipment requirements, and emergency procedures]

PDF Analysis

User: [Uploads research paper PDF]
User: "What are the key findings in this paper?"
Agent: [Analyzes PDF content and provides detailed summary 
        of findings, methodology, and implications]

🔧 Configuration

Environment Variables

# Required
OPENAI_API_KEY=your-openai-api-key

# Optional
DEBUG=true
LOG_LEVEL=INFO
MCP_SERVER_URL=http://localhost:8000
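These variables can be read into a typed settings object at startup. The sketch below uses only the standard library (python-dotenv is a common alternative for reading .env files, but this README does not say which approach Catalyze uses). The Settings class, the parse_env_file and load_settings helpers, and the defaults for the optional values are assumptions.

```python
import os
from dataclasses import dataclass
from pathlib import Path


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    debug: bool = False
    log_level: str = "INFO"
    mcp_server_url: str = "http://localhost:8000"


def load_settings(env_path: Path = Path(".env")) -> Settings:
    # Real environment variables take precedence over values in the .env file.
    env = {**parse_env_file(env_path), **os.environ}
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=key,
        debug=env.get("DEBUG", "false").lower() in {"1", "true", "yes"},
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        mcp_server_url=env.get("MCP_SERVER_URL", "http://localhost:8000"),
    )
```

Failing fast on a missing OPENAI_API_KEY surfaces the most common setup mistake at startup instead of on the first chat request.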

MCP Server Configuration

# src/config/config.py
MCP_SERVERS = {
    "chembl": {
        "transport": "streamable_http",
        "url": "http://localhost:8000/mcp"
    },
    "opentrons": {
        "transport": "stdio",
        "command": "node",
        "args": ["./mcp_servers/opentrons-mcp-server/dist/index.js"]
    }
}
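Each entry needs different keys depending on its transport: the "streamable_http" entry above carries a url, while the "stdio" entry carries a command and its args. A small startup check can catch typos before any connection is attempted. The validate_mcp_servers helper below is hypothetical and, by assumption, accepts only the two transports used in this README.

```python
from typing import Any

# Required keys per transport, limited to the two transports used in this README.
REQUIRED_KEYS = {
    "streamable_http": ("url",),
    "stdio": ("command", "args"),
}


def validate_mcp_servers(servers: dict[str, dict[str, Any]]) -> list[str]:
    """Return human-readable problems; an empty list means the config looks usable."""
    problems: list[str] = []
    for name, cfg in servers.items():
        transport = cfg.get("transport")
        if transport not in REQUIRED_KEYS:
            problems.append(f"{name}: unsupported transport {transport!r}")
            continue
        missing = [k for k in REQUIRED_KEYS[transport] if k not in cfg]
        problems.extend(f"{name}: missing '{k}' for {transport}" for k in missing)
        if missing:
            continue
        if transport == "streamable_http" and not str(cfg["url"]).startswith(("http://", "https://")):
            problems.append(f"{name}: url must start with http:// or https://")
        if transport == "stdio" and not isinstance(cfg["args"], list):
            problems.append(f"{name}: args must be a list")
    return problems
```

Running it against the MCP_SERVERS dict above should return an empty list.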

🧪 ChEMBL Integration

Catalyze integrates with the ChEMBL MCP Server to provide access to 27 specialized chemistry tools:

Available Tools

Compound Search: Search by name, synonym, or identifier
Target Analysis: Biological target information and pathways
Bioactivity Data: Assay results and activity measurements
Drug Development: Approved drugs and clinical candidates
Chemical Properties: ADMET properties and drug-likeness
Structure Search: Similarity and substructure searches
Dose Response: Pharmacological data analysis

Integration Benefits

Hybrid AI: Combines OpenAI knowledge with ChEMBL database accuracy
Real-time Data: Access to up-to-date chemical information
Comprehensive Coverage: 2+ million compounds and 1+ million assays
Professional Grade: Industry-standard chemical database integration

🤖 Automation Platforms

OpenTrons OT2 (Python)

# Generated Python script example
from opentrons import protocol_api

metadata = {
    'protocolName': 'Serial Dilution Protocol',
    'author': 'Catalyze AI',
    'apiLevel': '2.13'
}

def run(protocol: protocol_api.ProtocolContext):
    # Complete automation script
    pass
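The pass placeholder above is where the generated liquid transfers would go. Before those transfers can be written, a serial dilution needs per-well volumes and the resulting concentrations. The helper below is a hypothetical, platform-independent sketch of that arithmetic (it does not use the OpenTrons API): for a dilution factor d and a final volume V per well, each well receives V/d from the previous well plus V - V/d of diluent, so the concentration drops by a factor of d at every step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DilutionPlan:
    transfer_ul: float           # volume moved from the previous well
    diluent_ul: float            # diluent pre-loaded into each well
    concentrations: list[float]  # concentration in each well, in the stock's units


def plan_serial_dilution(stock: float, factor: float, steps: int, final_ul: float) -> DilutionPlan:
    """Hypothetical helper: volumes and concentrations for a 1:factor serial dilution."""
    if factor <= 1:
        raise ValueError("dilution factor must be greater than 1")
    transfer = final_ul / factor
    concentrations = []
    current = stock
    for _ in range(steps):
        current = current / factor  # each step dilutes the previous well by the same factor
        concentrations.append(current)
    return DilutionPlan(transfer, final_ul - transfer, concentrations)
```

For example, plan_serial_dilution(1000, 10, 3, 100) gives 10 µL transfers into 90 µL of diluent, with concentrations of 100, 10, and 1 in the stock's units.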

Dynamic Device Lynx (C#)

// Generated C# script example
using System;
using MethodManager.Core;
using MMScriptObjects;

public class SerialDilutionProtocol : IMMScriptExecutor
{
    public void Execute(IMMApp app)
    {
        // Complete Lynx automation script
    }
}

📊 API Endpoints

Chat Processing

POST /api/chat
Content-Type: application/json

{
  "message": "What are the properties of caffeine?",
  "mode": "research",
  "conversation_history": [],
  "pdf_context": null
}

PDF Upload

POST /api/upload-pdf
Content-Type: multipart/form-data

pdf: [PDF file]
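Uploading from Python without third-party packages means encoding the multipart/form-data body by hand. The hypothetical build_pdf_upload helper below sends a single file under the form field name pdf, as documented above. The random boundary is generated per request, and the upload response format is not documented in this README.

```python
import urllib.request
import uuid
from pathlib import Path


def build_pdf_upload(path: Path, base_url: str = "http://localhost:5003") -> urllib.request.Request:
    """Build a POST /api/upload-pdf request carrying one file in the 'pdf' form field."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="pdf"; filename="{path.name}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        path.read_bytes(),
        f"\r\n--{boundary}--\r\n".encode(),  # closing boundary ends the form
    ])
    return urllib.request.Request(
        f"{base_url}/api/upload-pdf",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Pass the result to urllib.request.urlopen to perform the upload.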

Agent Information

GET /api/agents

Health Check

GET /api/health
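Scripts that start the server, for example before running tests, often need to wait until it is ready. A hypothetical polling helper against GET /api/health might look like this. It treats any HTTP 200 response as healthy because the response body is not documented, and the fetch parameter exists only so the loop can be exercised without a live server.

```python
import time
import urllib.error
import urllib.request
from collections.abc import Callable


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status
    except urllib.error.HTTPError as exc:  # server answered with 4xx/5xx
        return exc.code
    except OSError:  # connection refused, DNS failure, timeout (URLError is an OSError)
        return 0


def wait_until_healthy(
    url: str = "http://localhost:5003/api/health",
    attempts: int = 30,
    delay_s: float = 1.0,
    fetch: Callable[[str], int] = http_status,
) -> bool:
    """Poll the health endpoint until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```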

🧪 Testing

Run Test Suite

# Run all tests
python -m pytest tests/

# Run specific test categories
python -m pytest tests/test_agents.py
python -m pytest tests/test_automation.py
python -m pytest tests/test_pdf_upload.py

Test Coverage

Agent System: All 5 agents tested
API Endpoints: Complete endpoint testing
PDF Processing: Upload and analysis testing
Automation: Both OpenTrons and Lynx code generation
MCP Integration: ChEMBL server connectivity

🚀 Performance & Scalability

Performance Metrics

Response Time: < 2 seconds for most queries
Concurrent Users: Supports multiple simultaneous users
Memory Usage: Optimized for efficient resource utilization
Database Queries: Cached responses for common queries
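The caching of common queries is not described in detail. A minimal in-process TTL (time-to-live) cache illustrates the idea: a repeated ChEMBL lookup within the TTL is served from memory instead of calling the MCP server again. The TTLCache class and the 300-second default are assumptions; the clock is injectable so expiry can be tested.

```python
import time
from collections.abc import Callable, Hashable
from typing import Any


class TTLCache:
    """Tiny in-memory cache whose entries expire ttl_s seconds after being stored."""

    def __init__(self, ttl_s: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._entries: dict[Hashable, tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl_s:
            return hit[1]  # fresh cached value
        value = compute()  # miss or expired: recompute and store with a new timestamp
        self._entries[key] = (now, value)
        return value
```

Because the cache lives in one process, each worker keeps its own copy; the Redis-based caching layer on the roadmap would share entries across processes.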

Scalability Features

Agent-Based Architecture: Easy to scale individual components
Async Processing: Non-blocking operations for better performance
Caching: Intelligent caching for improved response times
Modular Design: Easy to add new agents and features

🔒 Security & Privacy

Data Protection

Local Processing: Data is processed locally where possible; queries and PDF content sent for AI analysis go to the OpenAI API
Secure API Keys: Environment variable protection
Temporary Files: Automatic cleanup of uploaded files
No Data Storage: No persistent storage of user data
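Automatic cleanup of uploads can be expressed as a context manager: the uploaded bytes live in a temporary directory only while they are being analyzed. The temporary_upload name is hypothetical, and the README does not say how Catalyze implements its cleanup; this sketch relies on tempfile.TemporaryDirectory, which removes the directory on exit.

```python
import tempfile
from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_upload(data: bytes, suffix: str = ".pdf") -> Iterator[Path]:
    """Write uploaded bytes to a temp file; the file and its directory are removed on exit,
    including when the code inside the with block raises."""
    with tempfile.TemporaryDirectory(prefix="catalyze-") as tmp_dir:
        path = Path(tmp_dir) / f"upload{suffix}"
        path.write_bytes(data)
        yield path
```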

Safety Features

Input Validation: Comprehensive input sanitization
Error Handling: Graceful error recovery
Rate Limiting: Protection against abuse
Secure Headers: CORS and security headers
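Rate limiting is listed without details. A per-client token bucket is one common technique: each client accrues tokens at a fixed rate up to a burst capacity, and each request spends one token. The sketch below is a hypothetical single-process version keyed by a client identifier such as an IP address; the rate and capacity defaults are assumptions, and a multi-process deployment would need shared storage instead.

```python
import time
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class TokenBucketLimiter:
    rate_per_s: float = 1.0  # tokens added per second
    capacity: float = 5.0    # maximum burst size
    clock: Callable[[], float] = time.monotonic
    # client id -> (tokens remaining, time of last update)
    _buckets: dict[str, tuple[float, float]] = field(default_factory=dict, init=False, repr=False)

    def allow(self, client_id: str) -> bool:
        """Spend one token for client_id if available; return False when rate limited."""
        now = self.clock()
        tokens, last = self._buckets.get(client_id, (self.capacity, now))
        # Refill in proportion to elapsed time, never exceeding capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate_per_s)
        if tokens >= 1.0:
            self._buckets[client_id] = (tokens - 1.0, now)
            return True
        self._buckets[client_id] = (tokens, now)
        return False
```

A Flask handler would call allow() with the client's address at the top of each request and return HTTP 429 when it is False.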

🛠️ Development

Contribution Workflow

Fork the repository
Create a feature branch
Make your changes
Add tests for new functionality
Submit a pull request

Development Setup

With UV (Recommended)

# Clone and setup
git clone https://github.com/your-username/mit-catalyze.git
cd mit-catalyze

# Install with UV (includes dev dependencies)
uv sync --dev

# Activate virtual environment
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Run in development mode
uv run python app/flask_app.py --debug

With pip

# Clone and setup
git clone https://github.com/your-username/mit-catalyze.git
cd mit-catalyze

# Create development environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install development dependencies
pip install -e ".[dev]"

# Run in development mode
python app/flask_app.py --debug

Code Style

Python: Follow PEP 8 guidelines
JavaScript: Use modern ES6+ syntax
Documentation: Comprehensive docstrings and comments
Testing: Maintain high test coverage

📈 Roadmap

Upcoming Features

Multi-PDF Support: Analyze multiple documents simultaneously
Advanced Visualization: Interactive molecular structure viewers
Collaboration: Real-time collaborative protocol editing
Mobile App: Native mobile application
API Expansion: Public API for third-party integrations

Technical Improvements

Database Integration: Persistent storage for user data
Caching Layer: Redis-based caching for improved performance
Microservices: Containerized microservices architecture
CI/CD: Automated testing and deployment pipeline

🤝 Contributing

We welcome contributions from the chemistry and AI communities! Here's how you can help:

Ways to Contribute

Bug Reports: Report issues and bugs
Feature Requests: Suggest new features
Code Contributions: Submit pull requests
Documentation: Improve documentation
Testing: Help test new features

Getting Started

Check the Issues page

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

MIT LLM Hackathon for Applications in Materials and Chemistry 2025 for the inspiration and platform
ChEMBL for the comprehensive chemical database

Built with ❤️ for the Chemistry Community
