Releases: auroraGPT-ANL/inference-gateway

Inference Gateway for FIRST v0.1.0 - Initial Public Release

28 Apr 15:16
557116f

This is the initial public release of the FIRST (Federated Inference Resource Scheduling Toolkit) Inference Gateway.
Key Features:

  • OpenAI-Compatible API: Provides a familiar interface (/v1/chat/completions) for interacting with large language models running on HPC systems.

  • Globus Integration: Leverages Globus Auth for secure user authentication/authorization and Globus Compute for orchestrating inference tasks on remote compute resources (HPC clusters, workstations).

  • Federated & Direct Endpoint Routing: Supports routing requests to specific backend endpoints or automatically selecting from a pool of federated resources.

  • Flexible Backend Support: Designed to work with various inference servers, with initial support and examples focused on vLLM.

  • Deployment Options: Includes instructions and configurations for deployment using Docker (recommended) or bare metal setups.

  • Comprehensive Setup Guide: Detailed README.md covering prerequisites, gateway setup, backend setup (including Globus Compute function registration and endpoint configuration), and verification steps.

  • Authentication Helper: Provides a CLI script (inference-auth-token.py) to simplify obtaining Globus access tokens for API interaction.

  • Basic Monitoring: Includes optional Docker Compose setup for Prometheus and Grafana monitoring.

  • Benchmarking Script: Offers a tool to load-test deployed endpoints.

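Because the gateway exposes the standard /v1/chat/completions route, clients can talk to it with ordinary HTTP plus a Globus access token (such as one obtained via inference-auth-token.py). The sketch below is illustrative only: the gateway URL, model name, and token are placeholders, not values from this release.

```python
# Hypothetical client call to the FIRST gateway's OpenAI-compatible endpoint.
# GATEWAY_URL and the model name are placeholders; substitute your deployment's values.
import json
import urllib.request

GATEWAY_URL = "https://your-gateway.example.org"  # placeholder, not a real deployment


def build_chat_request(model: str, prompt: str, token: str):
    """Assemble the URL, headers, and JSON body for a chat completion request."""
    url = f"{GATEWAY_URL}/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {token}",  # Globus access token
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return url, headers, body


# Sending the request (requires a running gateway and a valid token):
# url, headers, body = build_chat_request("my-model", "Hello!", token)
# req = urllib.request.Request(url, data=body, headers=headers)
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Any OpenAI-compatible client library should work the same way, pointed at the gateway's base URL with the Globus token supplied as the API key.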
This release establishes the core functionality for securely exposing LLM inference capabilities from diverse compute resources via a standardized API. It's suitable for teams looking to provide managed access to LLMs running on institutional clusters or powerful local machines.
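The two routing modes listed above can be sketched as follows. This is a simplified illustration of the concept, not the gateway's actual implementation: endpoint IDs and the health map are hypothetical, and the federated mode here picks a healthy endpoint at random, whereas a real scheduler would weigh load and availability.

```python
# Illustrative sketch of direct vs. federated endpoint routing (not the
# gateway's real logic). `endpoints` maps an endpoint ID to a health flag.
import random


def route(endpoints: dict, requested: str = None) -> str:
    """Return the endpoint ID a request should be sent to."""
    if requested is not None:
        # Direct routing: the caller named a specific backend endpoint.
        if not endpoints.get(requested, False):
            raise ValueError(f"endpoint {requested!r} is unknown or unhealthy")
        return requested
    # Federated routing: select from the pool of healthy endpoints.
    healthy = [eid for eid, ok in endpoints.items() if ok]
    if not healthy:
        raise RuntimeError("no healthy endpoints available in the federated pool")
    return random.choice(healthy)
```

A direct request fails fast if its target is down, while a federated request degrades gracefully as long as any endpoint in the pool remains healthy.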