Modular ML Platform

A modular machine learning framework for building ML dashboards with API serving, experiment tracking, and planned LLM integration. Built for rapid prototyping and easy adaptation across domains, and intended as a platform for learning new techniques and technologies.

Quick Start

# Clone and setup
git clone https://github.com/TomTonroe/modular-ml-platform
cd modular-ml-platform
conda create -n your_project python=3.12.8 && conda activate your_project

# Install and configure
make install
cp .env.example .env
# Edit .env to set PROJECT_NAME, PROJECT_DATA_PATH, LLM_PROVIDER

# Initialize and run
make db-upgrade
python -m src.train --task your_task --register-model

# Start services
make api &           # FastAPI server (localhost:8000)
make ui &            # Streamlit dashboard (localhost:8501)  
make mlflow-server   # MLflow UI (localhost:5000)

Windows users: See Makefile for equivalent commands

Architecture: The "3 Directory" Pattern

To adapt for your domain, modify only these directories:

src/
├── data/{project}_loader.py      # Your data loading logic
├── features/{project}_features.py # Your feature engineering (optional)
└── models/{project}_models.py    # Your ML tasks and models

Everything else stays the same: the core framework, API, and dashboard automatically work with your domain-specific code. Extensions such as new dashboard panels or API endpoints can be added by following the existing patterns.

Key Features

  • Automatic Model Discovery: MLflow integration with auto-generated API endpoints
  • Modular Dashboard: Add panels by creating files in frontend/streamlit/panels/
  • Multi-source Data: CSV, API, and database loading with Parquet caching
  • Experiment Tracking: MLflow model registry and versioning
  • Generic Core Framework: Easily adaptable to any ML domain
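The multi-source loading with caching mentioned above can be sketched roughly as follows. `cached_load` and the cache directory are illustrative names, not the platform's actual API, and pickle stands in for Parquet here only to keep the sketch dependency-free beyond pandas:

```python
import hashlib
import pickle
from pathlib import Path

import pandas as pd


def cached_load(source: str, loader, cache_dir: str = ".cache") -> pd.DataFrame:
    """Load a DataFrame via loader(source), caching the result on disk.

    The platform caches as Parquet; pickle is used in this sketch so it
    needs nothing beyond pandas.
    """
    cache = Path(cache_dir)
    cache.mkdir(exist_ok=True)
    # Derive a stable cache key from the source identifier.
    key = hashlib.sha256(source.encode()).hexdigest()[:16]
    path = cache / f"{key}.pkl"
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    df = loader(source)
    with path.open("wb") as f:
        pickle.dump(df, f)
    return df
```

On a cache hit the loader is skipped entirely, so repeated dashboard loads of a slow API or database source stay fast.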

Example: SPARCS Healthcare Analytics

For a working example, see the sparcs-example branch, which builds the platform around SPARCS, a hospital-admissions dataset published by the New York State Department of Health.

git checkout sparcs-example

Adapting to your project

Follow the naming convention {project}_*.py, where {project} matches the PROJECT_NAME in your .env.

Required Files:

  • src/data/{project}_loader.py - Must have load_{project}_data(data_source, nrows=None) -> pd.DataFrame
  • src/models/{project}_models.py - Must have AVAILABLE_TASKS dict and train_task() function

Optional:

  • src/features/{project}_features.py - Domain-specific transformations

See example files (*_ex.py, example_models.py) for exact interfaces and implementations.

Framework Structure

src/
├── core/                    # Framework (don't modify)
├── data/                    # YOUR PROJECT: Data loading
│   ├── *_loader_ex.py       # Examples provided
│   └── {project}_loader.py  # Your loader
├── models/                  # YOUR PROJECT: ML models  
│   ├── example_models.py    # Examples provided
│   └── {project}_models.py  # Your models
├── features/                # YOUR PROJECT: Features (optional)
└── train.py                 # Training script

frontend/streamlit/panels/   # Dashboard components
├── data_explorer/           # Built-in panels
├── model_inference/
├── database_analytics/
└── {project}_dashboard/     # Your custom panels
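File-based panel discovery like the panels/ layout above can be implemented with a small importlib scan. The panel.py filename and render() entry point here are assumed conventions for illustration, not necessarily the platform's real ones:

```python
import importlib.util
from pathlib import Path


def discover_panels(panels_dir: str) -> dict:
    """Map panel name -> loaded module for each panel package in panels_dir.

    Assumes each panel is a subdirectory containing a panel.py that
    exposes a render() function (an illustrative convention).
    """
    panels = {}
    for entry in sorted(Path(panels_dir).iterdir()):
        candidate = entry / "panel.py"
        if entry.is_dir() and candidate.exists():
            spec = importlib.util.spec_from_file_location(entry.name, candidate)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            panels[entry.name] = module
    return panels
```

With a scan like this, dropping a new directory into panels/ makes it appear in the dashboard without touching core code.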

Next Steps

  1. Try the SPARCS Example: Switch to sparcs-example branch to see a complete implementation
  2. Start with Data: Create src/data/your_domain_loader.py following the example patterns
  3. Add Your Models: Create training tasks in src/models/your_domain_models.py
  4. Build Domain Panels: Add custom analytics in frontend/streamlit/panels/your_dashboard/
  5. Scale Production: Switch to PostgreSQL, remote MLflow, production LLM APIs
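Step 5 usually comes down to environment configuration. A hedged sketch of what a production .env might contain follows; aside from the three variables documented in the Quick Start, the key names are assumptions (MLFLOW_TRACKING_URI is MLflow's standard variable, DATABASE_URL a common SQLAlchemy convention), so check .env.example for the real keys:

```bash
PROJECT_NAME=demo
PROJECT_DATA_PATH=/srv/demo/data
LLM_PROVIDER=openai
# Point the app at PostgreSQL instead of local SQLite (key name assumed)
DATABASE_URL=postgresql://user:pass@db-host:5432/demo
# MLflow's standard tracking variable, pointing at a remote server
MLFLOW_TRACKING_URI=http://mlflow.internal:5000
```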

Development Commands

# Training
python -m src.train --list                    # List available tasks
python -m src.train --task your_task          # Train model
python -m src.train --task your_task --register-model  # Train + register

# Services  
make api              # Start API server
make ui               # Start dashboard
make mlflow-server    # Start MLflow UI

# Database
make db-upgrade       # Apply migrations
make db-migrate       # Create new migration

# Code Quality
make lint             # Check code quality
make fmt              # Format code
make clean            # Clean artifacts

Windows: review the Makefile for equivalent commands.
