Spherical: Multi-GPU Inference Service Framework with Worker Pool Management
- Multi-GPU Support: Automatic load balancing across multiple GPUs
- Automatic Device Detection: Detects CUDA GPUs if available, falls back to CPU
- Worker Pool Management: Configurable workers per device
- Async Architecture: Built on asyncio for high throughput
- HTTP Server/Client: aiohttp-based server with health checks
- Dragon/Asyncflow Integration: Optional HPC runtime support for distributed execution
- Metrics Collection: Real-time throughput and device utilization tracking
- Extensible: Base classes for adding new model types
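The automatic device detection listed above can be approximated with a small helper. This is an illustrative sketch only (`detect_devices` is a hypothetical name, not the framework's API), assuming PyTorch is the source of GPU information when present:

```python
def detect_devices(max_gpus=None):
    """Return CUDA device strings if GPUs are available, else fall back to CPU."""
    try:
        import torch  # optional dependency: only needed for GPU detection
        if torch.cuda.is_available():
            n = torch.cuda.device_count()
            if max_gpus is not None:
                n = min(n, max_gpus)
            devices = [f"cuda:{i}" for i in range(n)]
            if devices:
                return devices
    except ImportError:
        pass
    return ["cpu"]
```

On a CPU-only machine (or without PyTorch installed) this returns `["cpu"]`; on a 4-GPU node it returns `["cuda:0", ..., "cuda:3"]`.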
```shell
# Basic installation
pip install -e .

# With ESM2 model support
pip install -e ".[esm2]"

# With Dragon/RADICAL support
pip install -e ".[dragon]"

# With development dependencies
pip install -e ".[dev]"

# Full installation
pip install -e ".[esm2,dragon,dev,plotting]"
```

```shell
# Start server mode (with HTTP endpoints)
python example/esm2/run_esm2_inference.py --mode server --config_file example/esm2/config.yaml

# Run local inference (no server)
python example/esm2/run_esm2_inference.py --mode local --config_file example/esm2/config.yaml
```

Edit example/esm2/config.yaml to configure:
```yaml
# Model Settings
model_path: "facebook/esm2_t33_650M_UR50D"

# GPU Configuration
num_services: 1
num_gpus_per_service: 4
num_workers_per_gpu: 2

# Server Settings
server_port: 8000

# Batch Settings
num_batches: 200
max_batch_tokens: 16000

# Execution Settings
debug: true
engine: dragon  # Enable Dragon HPC runtime
```

Project structure:

```text
spherical/
├── src/                          # Core library
│   ├── inference_service.py      # Base inference service + GPU workers
│   ├── server.py                 # HTTP server endpoints
│   ├── orchestrator.py           # Multi-node coordination
│   ├── logger.py                 # Logging utilities
│   └── utils.py                  # Helper functions
├── example/
│   └── esm2/                     # ESM2 example
│       ├── client.py             # HTTP client with load balancing
│       ├── esm2_service.py       # ESM2 service (re-export)
│       ├── run_esm2_inference.py # Entry point
│       └── config.yaml           # Configuration
├── tests/                        # Unit tests
└── doc/                          # Documentation
```
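The `max_batch_tokens` setting caps the padded size of each batch. The framework's actual batching lives in `inference_service.py`; the budget it enforces can be illustrated with a hypothetical greedy packer (names and logic here are illustrative, not the real implementation):

```python
def make_batches(seq_lengths, max_batch_tokens):
    """Greedily pack sequence lengths into batches whose padded token count
    (batch size x longest sequence in the batch) stays within the budget."""
    batches, current = [], []
    # Sorting longest-first keeps padding waste low within each batch.
    for length in sorted(seq_lengths, reverse=True):
        candidate = current + [length]
        # Padded cost: every sequence is padded to the longest in the batch.
        if max(candidate) * len(candidate) > max_batch_tokens and current:
            batches.append(current)
            current = [length]
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches

# make_batches([100, 100, 100, 100, 100], 300) -> [[100, 100, 100], [100, 100]]
```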
Create a new service by extending InferenceService:

```python
from src.inference_service import InferenceService

class MyModelService(InferenceService):
    def _load_models(self):
        """Load your model onto each GPU."""
        for device in self.devices:
            # load_model() is a placeholder for your model loader.
            self.models[device] = load_model().to(device)

    def process_batch_sync(self, batch_id: int, device: str):
        """Run inference on a batch."""
        model = self.models[device]
        # Process the batch, then store the results and signal completion.
        self.reply_store[batch_id] = results
        self.processed_queue.put_nowait(batch_id)

    async def generate_batch(self) -> tuple:
        """Generate batches from the input queue."""
        seq = await self.input_queue.get()
        if seq is None:
            # A None sentinel marks end-of-input.
            raise StopAsyncIteration
        batch = tokenize(seq)
        return len(batch), batch
```

For HPC environments, Spherical supports the Dragon runtime with asyncflow:
```yaml
# Enable in config.yaml
engine: dragon
dragon_workers: 100
```

Run with Dragon:

```shell
dragon -w ssh --network-config slurm.yaml run_esm2_inference.py
```

Plot inference metrics:
```shell
python doc/plot_metrics.py --output_dir outputs
```

```shell
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=src --cov-report=html

# Lint and format code
ruff check .
ruff format .
```

MIT License