Skip to content

Latest commit

 

History

History
138 lines (106 loc) · 7.44 KB

File metadata and controls

138 lines (106 loc) · 7.44 KB

Project Overview

OpenGradient TEE-gateway is an LLM routing service designed to run within AWS Nitro Enclave TEE (Trusted Execution Environment). It provides a secure, cryptographically verifiable interface to multiple LLM providers (OpenAI, Anthropic, Google Gemini, xAI Grok) with remote attestation, response signing, and x402v2 micropayment access control. The tee-gateway is a part of the decentralized OpenGradient network providing verifiable inference.

The repo must provide a stable AWS Nitro PCR when the code doesn't change in order to allow anyone to reproduce the PCRs locally by building the image as a way to verify what code we are running and also for 3rd party operators to set up their own tee-gateway nodes with the same PCRs in order to participate in the network.

Project Structure highlighting core files

├── tee_gateway/             # Main application package (Flask/connexion)
│   ├── __main__.py          # Entry point: app factory, x402 middleware setup, key injection
│   ├── llm_backend.py       # LLM provider routing via LangChain, HTTP client management
│   ├── tee_manager.py       # TEE key generation, nitriding registration, response signing
│   ├── model_registry.py    # Model config and per-token pricing
│   ├── definitions.py       # On-chain addresses, network IDs, payment amounts
│   ├── facilitator_api.py   # x402 facilitator API client
│   ├── heartbeat/           # Heartbeat/health monitoring
│   ├── controllers/         # Request handlers (chat, completions, security)
│   ├── models/              # OpenAI-compatible Pydantic models
│   ├── openapi/             # openapi.yaml spec
│   └── test/                # Unit tests
├── scripts/
│   ├── start.sh             # Enclave startup script (nitriding + server)
│   ├── run-enclave.sh       # EC2 host launcher (gvproxy, EIF, key injection)
├── pyproject.toml           # Project metadata and dependencies (managed by uv)
├── Dockerfile               # Multi-stage: nitriding builder + python:3.12-slim-bullseye + uv
├── Makefile
└── measurements.txt         # PCR measurements for the deployed enclave image

Common Commands

# Dependency management (uses uv — https://docs.astral.sh/uv/)
uv sync                      # Install/update dependencies from uv.lock
uv add <package>             # Add a new dependency
uv lock                      # Regenerate lockfile after editing pyproject.toml
# IMPORTANT: uv.lock is baked into the Docker image and affects PCR measurements.
# Only regenerate the lockfile when intentionally changing dependencies.

# Run server locally for development (without TEE)
make test-local              # Runs: uv run python -m tee_gateway

# Linting and type checking
make lint                    # Run ruff format + ruff check + mypy
make mypy                    # Run mypy type checker only

# Build enclave image
make image                   # Build Docker image as TAR using Kaniko

# Build EIF and run in Nitro Enclave
make run                     # or: make all

# Clean build artifacts
make clean

# Show all available targets
make help

Environment Variables

API keys (injected at runtime via POST /v1/keys — do NOT bake into the image):

  • OPENAI_API_KEY
  • ANTHROPIC_API_KEY
  • GOOGLE_API_KEY
  • XAI_API_KEY
  • ARK_API_KEY (BytePlus / ByteDance ModelArk; injected as bytedance_api_key)
  • NOUS_API_KEY (Nous Research / Nous Portal; injected as nous_api_key)

Server configuration:

  • API_SERVER_PORT (default: 8000)
  • API_SERVER_HOST (default: 0.0.0.0)
  • EVM_PAYMENT_ADDRESS — wallet address to receive x402 payments
  • FACILITATOR_URL — x402 facilitator endpoint

Architecture

Core Flow

  1. TEEKeyManager (tee_manager.py) generates RSA-2048 key pair on startup and registers the public key hash with the nitriding daemon
  2. Incoming requests pass through x402 payment middleware before reaching handlers
  3. Requests are routed to the appropriate LLM provider via LangChain (llm_backend.py)
  4. All responses are signed with RSA-PSS-SHA256 over keccak256(requestHash || outputHash || timestamp)
  5. Clients verify attestation → get public key → verify signatures

Key Components

  • tee_manager.py: RSA key generation, nitriding registration (/enclave/hash), response signing
  • llm_backend.py: LangChain model instantiation, HTTP client management, provider routing from model name
  • model_registry.py: Maps model names to providers and per-token USD pricing (used by dynamic cost calculator)
  • definitions.py: On-chain constants (addresses, network IDs, payment amounts) — configure here for your deployment
  • util.py: dynamic_session_cost_calculator converts actual token usage to x402 payment amounts

API Endpoints

Endpoint Purpose
/health Health check (status, version, tee_enabled)
/signing-key TEE public key (PEM) and tee_id
/enclave/attestation Nitro attestation document (served by nitriding)
/v1/keys One-time API key injection (POST, loopback-only)
/v1/completions Text completion (signed)
/v1/chat/completions Chat completion with tool support (signed)

TEE Integration

  • Nitriding daemon runs on localhost:8080, provides TLS termination (port 443 externally)
  • Endpoints /enclave/ready and /enclave/hash used for nitriding registration
  • PCR measurements in measurements.txt fingerprint the exact enclave image

Supported Providers

Model name prefixes determine routing:

  • OpenAI: gpt-4.1, gpt-5, gpt-5-mini, gpt-5.2, o4-mini
  • Anthropic: claude-sonnet-4-0/4-5/4-6, claude-haiku-4-5, claude-opus-4-5/4-6/4-7/4-8, claude-fable-5, claude-3-7-sonnet, claude-3-5-haiku
  • Google: gemini-2.5-flash, gemini-2.5-flash-lite, gemini-2.5-pro, gemini-3-pro-preview, gemini-3-flash-preview, gemini-3.1-pro-preview, gemini-3.1-flash-lite-preview, gemini-3.5-flash; image generation: gemini-2.5-flash-image, gemini-3.1-flash-image
  • xAI: grok-2, grok-3, grok-3-mini, grok-4, grok-4-fast, grok-4-1-fast; image generation: grok-2-image
  • ByteDance (BytePlus ModelArk, OpenAI-compatible, ap-southeast): seed-1.6, seed-1.8, seed-2.0-lite; image generation: seedream-4.0
  • Nous Research (Nous Portal, OpenAI-compatible): hermes-4-405b, hermes-4-70b

Image generation via xAI (grok-2-image) and ByteDance (seedream-4.0) is served through a provider /images/generations endpoint rather than the chat path, but is surfaced on /v1/chat/completions exactly like Gemini's inline-image models (images returned out-of-band under the message images key). These models are billed a flat per-image price (see per_image_price_usd in model_registry.py), not per token.

Verification Examples

  • examples/verify_attestation.py — Validates AWS Nitro attestation documents against the root CA
  • examples/verify_signature_example.py — Demonstrates request hash and RSA-PSS signature verification

Deployment

Multi-stage Docker build: nitriding compiled from source (brave/nitriding-daemon), then copied into python:3.12.10-slim-bullseye. Dependencies are installed via uv sync --frozen from the lockfile for reproducible builds. Enclave launched via scripts/run-enclave.sh with gvproxy as the vsock network bridge, allocating 2 CPUs and 8GB memory.

Port 8000 is forwarded to 127.0.0.1 only on the EC2 host (loopback-only for key injection). Port 443 is forwarded publicly via gvproxy.