Releases: auroraGPT-ANL/inference-gateway
Inference Gateway for FIRST v0.1.0 - Initial Public Release
This is the initial public release of the FIRST (Federated Inference Resource Scheduling Toolkit) Inference Gateway.
Key Features:
- OpenAI-Compatible API: Provides a familiar interface (`/v1/chat/completions`) for interacting with large language models on HPC systems.
- Globus Integration: Leverages Globus Auth for secure user authentication and authorization, and Globus Compute for orchestrating inference tasks on remote compute resources (HPC clusters, workstations).
- Federated & Direct Endpoint Routing: Supports routing requests to a specific backend endpoint or automatically selecting one from a pool of federated resources.
- Flexible Backend Support: Designed to work with various inference servers, with initial support and examples focused on vLLM.
- Deployment Options: Includes instructions and configurations for deployment with Docker (recommended) or on bare metal.
- Comprehensive Setup Guide: A detailed README.md covering prerequisites, gateway setup, backend setup (including Globus Compute function registration and endpoint configuration), and verification steps.
- Authentication Helper: A CLI script (`inference-auth-token.py`) that simplifies obtaining Globus access tokens for API interaction.
- Basic Monitoring: An optional Docker Compose setup for Prometheus and Grafana.
- Benchmarking Script: A tool to load-test deployed endpoints.
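Because the gateway exposes an OpenAI-style endpoint, any standard HTTP client can talk to it. Below is a minimal sketch in Python using only the standard library; the gateway URL and model name are illustrative assumptions, not values shipped with this release, and the bearer token would come from the bundled `inference-auth-token.py` helper:

```python
import json
import urllib.request

# Hypothetical gateway URL -- substitute your own deployment's address.
GATEWAY_URL = "https://gateway.example.org/v1/chat/completions"

def build_chat_request(token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for the gateway.

    `token` is a Globus access token, e.g. one obtained with the
    inference-auth-token.py helper script.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending the request requires a running gateway and a valid token:
# with urllib.request.urlopen(build_chat_request(token, "my-model", "Hello")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Any OpenAI-compatible client library should work the same way, pointed at the gateway's base URL with the Globus token supplied as the API key.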
This release establishes the core functionality for securely exposing LLM inference capabilities from diverse compute resources via a standardized API. It's suitable for teams looking to provide managed access to LLMs running on institutional clusters or powerful local machines.