A FastAPI gateway that sits in front of LLM providers and enforces policy, safety, and observability. It implements:
- Auth via JWT (multi-tenant, “dev” token for local use)
- Policy enforcement via Open Policy Agent (OPA) or a local policy
- Context firewall for retrieval-augmented generation (RAG) sources
- Response validator with PII/secret redaction
- Rate limiting (in-memory or Redis)
- Telemetry (OpenTelemetry)
- DevSecOps guardrails (pre-commit: Black, Ruff, MyPy, Bandit, detect-secrets)
- Docker / Docker Compose
- CI with smoke tests and type/lint/security checks
The goal is to provide a secure, testable reference for gating LLM usage in production.
## Contents

- Quick start
- Architecture & flow
- API
- Configuration
- Local development
- Testing & quality
- Running the smoke suite
- Policy (OPA vs. local)
- Context firewall
- Response validation & redaction
- Rate limiting
- Telemetry
- Security notes
- Production deployment
- Troubleshooting
- Roadmap / Ideas
- License
## Quick start

### Prerequisites

- Python 3.11+
- (Optional) Docker & Docker Compose
- PowerShell (for the smoke script on Windows, or on a GitHub Actions runner with `pwsh`)
```bash
python -m venv .venv
. .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt

# Start the API
uvicorn api.main:app --reload --port 8000
```
Health checks:
```bash
curl http://127.0.0.1:8000/healthz
curl http://127.0.0.1:8000/readyz
```
Call completions (uses the `stub` model in tests/dev):

```bash
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H "Authorization: Bearer dev-token" \
  -H "Content-Type: application/json" \
  -d '{"model":"stub","messages":[{"role":"user","content":"hello"}]}'
```
Or with Docker Compose:

```bash
docker compose up --build
# API defaults to http://127.0.0.1:8000
```

There's also a `docker-compose.prod.yml` example for a more production-like stack.
## Architecture & flow

High-level request pipeline:

- Auth: `Authorization: Bearer <token>` → tenant extracted and validated
- Request validation: size limits, allowed-model list, message count cap
- Context firewall (RAG): validates `context.source` and scans chunks for prompt-injection/high-risk cues
- Policy check: OPA (if configured) or local policy → may deny models/egress
- Provider call: current code includes a stub plus OpenAI provider scaffolding
- Response validation: redacts PII and secret tokens
- Rate limit: in-memory or Redis
- Telemetry: request/response spans
Key modules:

- `api/main.py` — FastAPI app & endpoints
- `api/auth/token.py` — JWT / dev-token handling
- `api/policy/local_policy.py` and `api/policy/opa_client.py` — policy backends (local / OPA)
- `api/firewall/context_firewall.py` — source allowlist & risk scoring
- `api/firewall/response_validator.py` — redaction rules
- `api/middleware/rate_limit.py` — rate limiting via memory/Redis
- `api/providers/openai_provider.py` — provider adapter (stub/openai)
- `api/telemetry/otel_setup.py` — OpenTelemetry wiring
- `scripts/run-smoke.ps1` & `scripts/make_jwt.py` — CI/local smoke harness
## API

### GET /healthz

Simple liveness probe.

### GET /readyz

Readiness probe showing the policy mode:

```
{ "mode": "OPA" | "LOCAL", "ready": true }
```
### POST /v1/chat/completions

Request body:

```json
{
  "model": "stub",
  "messages": [
    { "role": "user", "content": "hello" }
  ],
  "max_tokens": 512,
  "context": {
    "source": "kb://approved/file.md",
    "chunks": [{ "id": "1", "content": "..." }]
  }
}
```

Response:

```json
{
  "answer": "string",
  "citations": ["..."],
  "meta": { "provider": "stub" }
}
```
Auth:

- Local/dev: `Authorization: Bearer dev-token`
- Trusted tenants: JWT signed with `JWT_SECRET` (see `scripts/make_jwt.py`)
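
For orientation, minting such a token might look roughly like this (a minimal sketch assuming PyJWT; the real claim set lives in `scripts/make_jwt.py`, and the `tenant` claim name is an assumption):

```python
# Hypothetical sketch of scripts/make_jwt.py; assumes PyJWT (pip install pyjwt)
# and an HS256 secret in JWT_SECRET. Claim names are illustrative.
import os
import time

import jwt  # PyJWT

token = jwt.encode(
    {"tenant": "acme", "exp": int(time.time()) + 3600},
    os.environ["JWT_SECRET"],
    algorithm="HS256",
)
print(token)  # pass as: Authorization: Bearer <token>
```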
## Configuration

Environment variables (see `api/config.py`):

- `JWT_SECRET` — required for real JWTs (the dev token still works without it)
- `ALLOWED_MODELS` — comma-separated allowlist (e.g. `stub,openai:gpt-4o`)
- `ALLOWED_CONTEXT_ORIGINS` — allowlisted prefixes for RAG sources, e.g. `kb://approved/`
- `CONTEXT_FIREWALL_RISK_THRESHOLD` — integer threshold (higher → stricter)
- `OPA_URL` — if set, the gateway asks OPA for `deny` decisions
- `REDIS_URL` — if set, enables Redis rate limiting (`redis://host:6379/0`, etc.)
- Rate-limit defaults are defined in the middleware (`limit=5`/`window=1s` for anonymous callers) and can be adjusted if needed.

You can put values in a `.env` file for local runs.
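
For example (values illustrative):

```bash
JWT_SECRET=change-me
ALLOWED_MODELS=stub,openai:gpt-4o
ALLOWED_CONTEXT_ORIGINS=kb://approved/
# Optional:
# OPA_URL=http://localhost:8181
# REDIS_URL=redis://localhost:6379/0
```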
## Local development

```bash
pip install -r requirements.txt
uvicorn api.main:app --reload

pre-commit install
pre-commit run --all-files
```

Hooks include:

- `black` (format)
- `ruff` (lint + format)
- `mypy` (type check) with pinned stub deps
- `bandit` (security lints)
- `detect-secrets` (with `.secrets.baseline`)
## Testing & quality

```bash
pytest -q
```

You should see all tests pass once your environment is set up.
- Unit/functional tests live in `tests/`.
- The smoke test (`scripts/run-smoke.ps1`) is used both locally and in CI and covers:
  - Health & readiness
  - Policy deny/allow
  - Size/message caps
  - Context firewall handling
  - Response redaction
  - (Optional) rate limiting
## Running the smoke suite

From a local PowerShell (or GitHub Actions `pwsh`):

```powershell
./scripts/run-smoke.ps1
```

It will:

- Create a trusted JWT using `scripts/make_jwt.py` (signed with `JWT_SECRET`)
- Exercise the API and summarize PASS/FAIL checks

If you're on Linux/macOS and don't want to use PowerShell, you can replicate the requests with `curl` (the script is just a convenience).
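
For example, the context-firewall rejection can be reproduced like this (the `source` value is illustrative; the gateway should answer with HTTP 400):

```bash
curl -s -o /dev/null -w "%{http_code}\n" \
  -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H "Authorization: Bearer dev-token" \
  -H "Content-Type: application/json" \
  -d '{"model":"stub","messages":[{"role":"user","content":"hi"}],"context":{"source":"http://not-allowed.example/","chunks":[]}}'
# expect: 400
```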
## Policy (OPA vs. local)

- Local policy: `api/policy/local_policy.py` — fast and easy to extend in Python for dev/test.
- OPA: set `OPA_URL`, run an OPA sidecar, and define Rego policies. The app calls `POST /v1/data/gateway/deny` with `{tenant, model}`.

This lets you ship the same app to prod and swap policies centrally without redeploying.
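
For orientation, the OPA round-trip has roughly this shape (a sketch, not the real `api/policy/opa_client.py`; assumes `httpx` and OPA's standard `input`/`result` envelope):

```python
# Sketch of the OPA deny check; assumes httpx and OPA's standard
# {"input": ...} request / {"result": ...} response envelope.
import os

import httpx


def opa_denies(tenant: str, model: str) -> bool:
    opa_url = os.environ["OPA_URL"].rstrip("/")
    resp = httpx.post(
        f"{opa_url}/v1/data/gateway/deny",
        json={"input": {"tenant": tenant, "model": model}},
        timeout=2.0,
    )
    resp.raise_for_status()
    # OPA returns {"result": true} when the deny rule fires.
    return bool(resp.json().get("result", False))
```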
## Context firewall

When `context` is provided:

- Origin allowlist via `ALLOWED_CONTEXT_ORIGINS` (e.g., `kb://approved/`)
- Risk scoring on chunk text for prompt-injection cues (e.g., "ignore previous instructions", "reveal your prompt", etc.)
- Requests with disallowed origins or high risk scores are rejected with HTTP 400.

Types: `ContextInput` and `SanitizedContext` in `api/firewall/context_firewall.py`.
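
In spirit, the checks look like this (an illustrative sketch; the real cue list, scoring, and threshold handling live in `api/firewall/context_firewall.py`):

```python
# Illustrative firewall sketch; the cue list, scoring, and threshold
# comparison are assumptions -- see api/firewall/context_firewall.py.
import os

INJECTION_CUES = ("ignore previous instructions", "reveal your prompt")


def sanitize_context(source: str, chunks: list[dict]) -> None:
    prefixes = os.environ.get("ALLOWED_CONTEXT_ORIGINS", "kb://approved/").split(",")
    if not any(source.startswith(p) for p in prefixes):
        raise ValueError("disallowed context source")  # surfaced as HTTP 400

    for chunk in chunks:
        text = chunk.get("content", "").lower()
        risk = sum(cue in text for cue in INJECTION_CUES)
        if risk > 0:  # real code weighs this against CONTEXT_FIREWALL_RISK_THRESHOLD
            raise ValueError("high-risk context chunk")  # surfaced as HTTP 400
```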
## Response validation & redaction

The validator (`api/firewall/response_validator.py`) scans generated responses and redacts:

- Emails & phone numbers (PII)
- Common API key patterns (e.g., `sk_...`)
- Bearer tokens
- AWS access key formats, etc.

If validation fails, the gateway returns a sanitized response to the client.
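
A simplified version of that pass (the real patterns in `api/firewall/response_validator.py` are more complete; the regexes below are assumptions):

```python
# Simplified redaction pass; patterns are illustrative approximations
# of the real rules in api/firewall/response_validator.py.
import re

PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),        # emails
    re.compile(r"\+?\d[\d -]{7,}\d"),              # phone-number-like strings
    re.compile(r"\bsk_[A-Za-z0-9]{16,}\b"),        # API-key-like tokens
    re.compile(r"Bearer\s+[A-Za-z0-9._~+/-]+=*"),  # bearer tokens
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),           # AWS access key IDs
]


def redact(text: str) -> str:
    for pattern in PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```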
## Rate limiting

Middleware: `api/middleware/rate_limit.py`

- In-memory fallback (good for tests/dev)
- Redis if `REDIS_URL` is set:
  - Per-tenant / per-IP keys
  - Sliding window via simple counters (the implementation stays intentionally minimal)

You can adjust limits/windows in the middleware to meet your needs.
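
As a rough sketch of the counter approach (a fixed-window simplification of the above; assumes redis-py when `REDIS_URL` is set, a plain dict otherwise):

```python
# Fixed-window counter sketch mirroring the middleware's memory/Redis split;
# simplified and assumed -- see api/middleware/rate_limit.py for the real code.
import os
import time

_memory: dict[str, tuple[int, float]] = {}


def allow(key: str, limit: int = 5, window: float = 1.0) -> bool:
    redis_url = os.environ.get("REDIS_URL")
    if redis_url:
        import redis

        r = redis.Redis.from_url(redis_url)
        bucket = f"rl:{key}:{int(time.time() // window)}"  # per-tenant/IP key
        count = r.incr(bucket)
        r.expire(bucket, int(window) + 1)
        return count <= limit

    count, started = _memory.get(key, (0, time.time()))
    if time.time() - started >= window:
        count, started = 0, time.time()  # window elapsed: reset the counter
    _memory[key] = (count + 1, started)
    return count + 1 <= limit
```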
## Telemetry

`api/telemetry/otel_setup.py` wires up OpenTelemetry so you can export traces/metrics to your preferred backend (e.g., via OTLP). Configure it with the standard `OTEL_*` env vars in your deployment.
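
If you're wiring this yourself, the core of such a setup typically looks like this (a sketch assuming `opentelemetry-instrumentation-fastapi`; the actual module may differ):

```python
# Assumed OTel wiring sketch; api/telemetry/otel_setup.py may differ.
from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

app = FastAPI()
FastAPIInstrumentor.instrument_app(app)  # emits request/response spans

# Exporters and endpoints come from the standard OTEL_* env vars, e.g.
# OTEL_EXPORTER_OTLP_ENDPOINT=http://collector:4317
```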
## Security notes

- Never commit real secrets. This repo uses `detect-secrets` with a `.secrets.baseline`. Update the baseline when changing files:

  ```bash
  python -m detect_secrets scan > .secrets.baseline
  git add .secrets.baseline
  ```

- `bandit` runs in pre-commit to catch common Python security issues.
- JWT validation is strict in non-dev mode. For local dev, `dev-token` is allowed.
## Production deployment

- Build the image:

  ```bash
  docker build -t secure-llm-gateway:latest .
  ```

- Use Compose (see `docker-compose.prod.yml`) or deploy to your platform (Kubernetes, ECS, etc.).
- Provide:
  - `JWT_SECRET`
  - `OPA_URL` (optional, recommended for prod)
  - `REDIS_URL` (recommended)
  - OTEL env vars for telemetry (optional)
- Expose port `8000` behind your API gateway or ingress, and terminate TLS at the edge.
## Troubleshooting

- `422 Unprocessable Entity` on `/v1/chat/completions`:
  - Ensure `Content-Type: application/json`.
  - The endpoint expects a JSON body (the app also supports a `{"req": {...}}` wrapper; see the example after this list).
- Rate-limit failures in tests:
  - The in-memory limiter is tight by default; the tests already avoid flakiness, but if you parallelize you may want Redis.
- OPA denies everything:
  - Check your Rego policy, run OPA with logs, and verify the `{tenant, model}` payload.
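
Example of the `{"req": {...}}` wrapper mentioned above:

```bash
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H "Authorization: Bearer dev-token" \
  -H "Content-Type: application/json" \
  -d '{"req":{"model":"stub","messages":[{"role":"user","content":"hello"}]}}'
```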
## Roadmap / Ideas

- Provider plugins (Azure OpenAI, Anthropic, Vertex, etc.)
- Per-tenant policy bundles
- Egress allowlist & audit for tools/functions
- Structured redaction reports for compliance
- Richer context firewall (LLM-based heuristics, embeddings)
- Async batching & caching layer
## License

MIT (or your preferred license). See the `LICENSE` file if present.