A full-stack TypeScript platform for chat, LLM inference observability, and metrics dashboards.
Built as an end-to-end inference logging and ingestion system with:
- Multi-turn chat
- Streaming responses
- Observability SDK
- Event-driven ingestion pipeline
- Real-time metrics dashboards
- Multi-provider LLM support

Tech stack:
- Frontend: Next.js 15 + Tailwind CSS
- Backend: Express.js
- Database: PostgreSQL
- Queue/Event Bus: Redis Streams
- Language: TypeScript
- Containerization: Docker + Docker Compose
- SDK: `@ollive/sdk`

| Feature | Implementation |
|---|---|
| Multi-turn chatbot | Express chatbot + Next.js UI with context window |
| Streaming responses | Server-Sent Events (SSE) |
| Inference SDK | `@ollive/sdk` trace wrapper |
| Observability | Logging, latency, tokens, errors |
| PII redaction | Sensitive preview masking |
| Event-driven ingestion | Redis Streams consumer groups |
| Metrics dashboard | Recharts visualizations |
| Multi-provider support | OpenRouter integration |
| Conversation management | Resume / cancel chat sessions |
| Docker support | One-command local setup |

Ollive is structured as a TypeScript monorepo with three runtime layers.
| Layer | Stack | Responsibility |
|---|---|---|
| Web | Next.js 15 | Chat UI + Dashboard |
| Chatbot | Express | LLM orchestration + streaming |
| Ingestion | Express + Worker | Logging pipeline + metrics |

Shared packages:
- `@ollive/sdk`
- `@ollive/db`

    ┌─────────────┐      REST/SSE      ┌──────────────┐
    │   Next.js   │ ─────────────────► │   Chatbot    │
    │    (web)    │                    │   Express    │
    └──────┬──────┘                    └──────┬───────┘
           │ metrics                          │ @ollive/sdk
           │                                  ▼
           │                           ┌──────────────┐
           └─────────────────────────► │  Ingestion   │
                                       │   Express    │
                                       └──────┬───────┘
                                              │ XADD
                                              ▼
                                       ┌──────────────┐
                                       │    Redis     │
                                       │   Streams    │
                                       └──────┬───────┘
                                              │ XREADGROUP
                                              ▼
                                       ┌──────────────┐
                                       │    Worker    │
                                       └──────┬───────┘
                                              ▼
                                       ┌──────────────┐
                                       │  PostgreSQL  │
                                       └──────────────┘


    git clone <your-repo-url>
    cd ollive
    cp .env.example .env

Add your OpenRouter API key:

    OPENROUTER_API_KEY=sk-or-...

Build and start the stack:

    docker compose up --build

| Service | URL |
|---|---|
| Web UI | http://localhost:3000 |
| Chatbot API | http://localhost:8000 |
| Ingestion API | http://localhost:8001 |


    ollive/
    ├── apps/
    │   └── web/                # Next.js frontend
    │
    ├── packages/
    │   ├── ollive-sdk/         # Inference logging SDK
    │   └── db/                 # Shared PostgreSQL pool
    │
    ├── services/
    │   ├── chatbot/            # Express chatbot service
    │   └── ingestion/          # Ingestion API + worker
    │
    ├── infra/
    │   └── init.sql            # Database schema
    │
    └── docker-compose.yml

Chatbot API:

| Method | Endpoint | Description |
|---|---|---|
| GET | /api/conversations | List conversations |
| POST | /api/conversations | Create conversation |
| GET | /api/conversations/:id | Resume conversation |
| POST | /api/conversations/:id/cancel | Cancel conversation |
| POST | /api/chat/:id | Send message |
| POST | /api/chat/:id/cancel-stream | Abort active stream |

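Chat responses stream over Server-Sent Events. The payload format of the chatbot's events is not documented here, so the following is only a sketch of how a client might split buffered SSE text into `data:` payloads using the standard SSE framing (events end with a blank line). The helper name `parseSseChunk` and the sample payloads are hypothetical.

```typescript
// Hypothetical helper: splits buffered SSE text into complete event payloads.
// Returns the payloads plus any trailing partial event to carry into the next read.
function parseSseChunk(buffer: string): { events: string[]; rest: string } {
  // Normalize line endings so "\r\n" and "\n" streams are handled alike.
  const normalized = buffer.replace(/\r\n/g, "\n");
  const blocks = normalized.split("\n\n");
  // The last block has no terminating blank line yet, so it may be incomplete.
  const rest = blocks.pop() ?? "";
  const events: string[] = [];
  for (const block of blocks) {
    const dataLines = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      // One optional space after the colon is not part of the payload.
      .map((line) => line.slice(5).replace(/^ /, ""));
    if (dataLines.length > 0) events.push(dataLines.join("\n"));
  }
  return { events, rest };
}
```

A caller would prepend `rest` to the next chunk read from the response body before parsing again, so an event split across two network reads is still delivered whole.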
Ingestion API:

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/logs | Ingest inference log |
| GET | /api/v1/metrics/summary | Dashboard metrics |

Authentication header:

    X-API-Key: <INGESTION_API_KEY>

- Chatbot wraps every LLM call with `OlliveLogger.trace()`
- SDK captures metadata and inference metrics
- Sensitive previews are redacted
- SDK asynchronously sends logs to ingestion API
- Ingestion service validates payloads with Zod
- Logs are appended to Redis Streams
- Worker consumes logs through consumer groups
- Logs are persisted into PostgreSQL
- Metrics are aggregated into hourly rollups
- Dashboard queries aggregated metrics
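
The ingestion steps in this flow (key check, payload validation, append to the stream) can be sketched as a single handler. This is an illustration under assumptions, not the service's code: the payload fields, the 401/400/202 status codes, and the injected `appendToStream` function are hypothetical, and the hand-written type guard stands in for the Zod schema the service uses. The 503 on a failed append mirrors the documented behavior when Redis is down.

```typescript
// Hypothetical log shape; the real schema lives in the ingestion service.
interface InferenceLog {
  sessionId: string;
  provider: string;
  model: string;
  latencyMs: number;
  status: "success" | "error" | "cancelled";
}

// Hand-written guard standing in for the Zod schema.
function isInferenceLog(value: unknown): value is InferenceLog {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const latency = v["latencyMs"];
  const status = v["status"];
  return (
    typeof v["sessionId"] === "string" &&
    typeof v["provider"] === "string" &&
    typeof v["model"] === "string" &&
    typeof latency === "number" &&
    Number.isFinite(latency) &&
    latency >= 0 &&
    (status === "success" || status === "error" || status === "cancelled")
  );
}

// Injected stand-in for the XADD call, so the sketch runs without Redis.
type AppendToStream = (log: InferenceLog) => Promise<void>;

// Returns the HTTP status code the handler would respond with.
async function handleIngest(
  apiKeyHeader: string | undefined,
  expectedKey: string,
  body: unknown,
  appendToStream: AppendToStream,
): Promise<number> {
  if (apiKeyHeader !== expectedKey) return 401; // assumed status for a bad key
  if (!isInferenceLog(body)) return 400; // assumed status for an invalid payload
  try {
    await appendToStream(body);
  } catch {
    return 503; // stream unavailable
  }
  return 202; // assumed: accepted for asynchronous processing
}
```

Injecting `appendToStream` keeps the handler independent of the Redis client, which also makes the failure paths easy to exercise in tests.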
Separating logging from the inference request path prevents database writes from increasing chatbot latency.
Benefits:
- Faster user responses
- Better burst handling
- Horizontally scalable ingestion
- Fault isolation

    import { OlliveLogger } from "@ollive/sdk";

    const logger = new OlliveLogger({
      ingestionUrl: process.env.INGESTION_URL,
      apiKey: process.env.INGESTION_API_KEY,
    });

    // userPrompt and callLLM() are placeholders for the application's
    // own input and provider call.
    await logger.trace(
      {
        sessionId: "sess-123",
        provider: "openai",
        model: "gpt-4o-mini",
        conversationId: "...",
        requestInput: userPrompt,
        isStreaming: true,
      },
      async (ctx) => {
        const result = await callLLM();
        // Attach the response and token count so the SDK can log them.
        ctx.response = result.text;
        ctx.totalTokens = result.tokens;
        return result;
      }
    );

Logs are sent immediately without batching.
SDK logging failures never break chat functionality.
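
One way a trace wrapper can guarantee that logging never breaks the chat path is to send the record fire-and-forget and swallow every send failure. The sketch below assumes that approach; `traceSketch`, `TraceRecord`, and the injected `send` function are hypothetical names for illustration and are not the `@ollive/sdk` API.

```typescript
// Hypothetical record shape, much smaller than what a real SDK would capture.
interface TraceRecord {
  latencyMs: number;
  status: "success" | "error";
  errorMessage?: string;
}

// Injected transport, e.g. an HTTP POST to the ingestion API.
type Send = (record: TraceRecord) => Promise<void>;

async function traceSketch<T>(fn: () => Promise<T>, send: Send): Promise<T> {
  const startedAt = Date.now();

  // Fire-and-forget: the send is not awaited and its failures are swallowed.
  const emit = (record: TraceRecord): void => {
    try {
      void send(record).catch(() => undefined);
    } catch {
      // A synchronous throw from send is ignored as well.
    }
  };

  try {
    const result = await fn();
    emit({ latencyMs: Date.now() - startedAt, status: "success" });
    return result;
  } catch (err) {
    emit({
      latencyMs: Date.now() - startedAt,
      status: "error",
      errorMessage: err instanceof Error ? err.message : String(err),
    });
    throw err; // The caller still sees the original LLM error.
  }
}
```

The trade-off of this design is that a failed send loses the log, which matches the documented behavior of dropping logs when the ingestion API is down.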
Sensitive data is masked before transmission:
- Emails
- Phone numbers
- Credit cards
- SSNs
- API keys
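
A regex-based masking pass over these categories might look like the sketch below. The patterns, the placeholder labels, and the `redact` function name are assumptions for illustration; the SDK's actual rules are not shown in this README, and production-grade PII detection usually needs stricter patterns than these.

```typescript
// Illustrative patterns only. Order matters: earlier rules run first.
const REDACTION_RULES: Array<{ label: string; pattern: RegExp }> = [
  { label: "EMAIL", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  // Assumes keys with an "sk-" prefix, like the OpenRouter key format.
  { label: "API_KEY", pattern: /\bsk-[A-Za-z0-9_-]{8,}/g },
  { label: "SSN", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  // 13 to 16 digits, optionally separated by spaces or hyphens.
  { label: "CARD", pattern: /\b\d(?:[ -]?\d){12,15}\b/g },
  // 10-digit numbers with an optional country code.
  { label: "PHONE", pattern: /(?:\+?\d{1,3}[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b/g },
];

// Replaces every match of every rule with a bracketed label.
function redact(text: string): string {
  return REDACTION_RULES.reduce(
    (acc, rule) => acc.replace(rule.pattern, `[${rule.label}]`),
    text,
  );
}
```

Running the more specific rules (email, API key, SSN) before the broader digit patterns keeps a card or phone rule from partially consuming text that a narrower rule should have labeled.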
The system does not store full prompts or responses. It stores only:
- Truncated previews
- Metadata
- Latency
- Token counts
- Errors
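
A truncated preview can be produced with a small helper like the following sketch. The name `makePreview` and the default limit of 200 characters are assumptions; the README does not state the real preview length.

```typescript
// Hypothetical helper: keeps at most maxChars characters of the text.
function makePreview(text: string, maxChars = 200): string {
  // Array.from iterates by code point, so surrogate pairs are not split.
  const chars = Array.from(text);
  if (chars.length <= maxChars) return text;
  return chars.slice(0, maxChars).join("") + "…";
}
```

Masking sensitive data before truncating avoids a case where a cut lands in the middle of an email address or key and leaves a fragment that no longer matches any pattern.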
Stores top-level chat sessions.
Status values:
`active`, `cancelled`, `completed`
Normalized conversation history.
Append-only observability events.
Stores:
- Metadata
- Provider/model info
- Timing
- Preview snippets
- Errors
Pre-aggregated metrics rollups for fast dashboards.
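
The idea behind an hourly rollup is to bucket events by the hour of their timestamp and keep only counters per bucket. The sketch below does this in memory to illustrate the aggregation; the worker most likely performs it in SQL, and the `LogEvent` and `HourlyRollup` shapes are hypothetical rather than the actual table columns.

```typescript
// Hypothetical event and rollup shapes.
interface LogEvent {
  timestamp: string; // ISO 8601
  latencyMs: number;
  totalTokens: number;
  isError: boolean;
}

interface HourlyRollup {
  hour: string; // ISO timestamp truncated to the hour (UTC)
  requests: number;
  errors: number;
  totalTokens: number;
  avgLatencyMs: number;
}

function rollupByHour(events: LogEvent[]): HourlyRollup[] {
  const buckets = new Map<
    string,
    { requests: number; errors: number; tokens: number; latencySum: number }
  >();

  for (const e of events) {
    // Truncate the timestamp to the start of its UTC hour.
    const d = new Date(e.timestamp);
    d.setUTCMinutes(0, 0, 0);
    const hour = d.toISOString();

    const b = buckets.get(hour) ?? { requests: 0, errors: 0, tokens: 0, latencySum: 0 };
    b.requests += 1;
    b.errors += e.isError ? 1 : 0;
    b.tokens += e.totalTokens;
    b.latencySum += e.latencyMs;
    buckets.set(hour, b);
  }

  // ISO strings in the same format sort chronologically as plain strings.
  return Array.from(buckets.entries())
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([hour, b]) => ({
      hour,
      requests: b.requests,
      errors: b.errors,
      totalTokens: b.tokens,
      avgLatencyMs: b.latencySum / b.requests,
    }));
}
```

Storing the latency sum and request count rather than only the average lets adjacent buckets be merged later without losing accuracy.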
Redis Streams acts as the event bus between:
| Component | Responsibility |
|---|---|
| Ingestion API | Produces events |
| Worker | Consumes and persists events |

Consumer groups enable:
- Horizontal scaling
- Reliable delivery
- Failure recovery
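
These properties follow from how consumer groups track delivery: each entry goes to exactly one consumer in the group, and it stays pending until acknowledged, so another consumer can take it over after a crash. The toy in-memory model below illustrates those semantics only. It is not Redis client code, the class `ToyConsumerGroup` is hypothetical, and it ignores details such as the minimum idle time that real claiming uses.

```typescript
// Toy in-memory model of one stream with one consumer group.
class ToyConsumerGroup<T> {
  private entries: Array<{ id: number; value: T }> = [];
  private nextUndelivered = 0; // index of the first never-delivered entry
  private pending = new Map<number, { consumer: string; value: T }>();

  // Rough analogue of XADD: append an entry and return its id.
  add(value: T): number {
    const id = this.entries.length + 1;
    this.entries.push({ id, value });
    return id;
  }

  // Rough analogue of XREADGROUP with ">": deliver new entries to one consumer.
  read(consumer: string, count: number): Array<{ id: number; value: T }> {
    const batch = this.entries.slice(this.nextUndelivered, this.nextUndelivered + count);
    this.nextUndelivered += batch.length;
    for (const e of batch) this.pending.set(e.id, { consumer, value: e.value });
    return batch;
  }

  // Rough analogue of XACK: the entry is done and leaves the pending list.
  ack(id: number): void {
    this.pending.delete(id);
  }

  // Rough analogue of XAUTOCLAIM: take over entries other consumers never acked.
  claimPending(newConsumer: string): Array<{ id: number; value: T }> {
    const claimed: Array<{ id: number; value: T }> = [];
    this.pending.forEach((p, id) => {
      if (p.consumer !== newConsumer) {
        p.consumer = newConsumer;
        claimed.push({ id, value: p.value });
      }
    });
    return claimed;
  }
}
```

In this model a worker that reads two entries, acknowledges one, and then crashes leaves one entry pending; a second worker receives only new entries from `read` and recovers the unacknowledged one through `claimPending`.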
All LLM traffic routes through OpenRouter using the OpenAI-compatible Chat Completions API.
Supported providers include:
- OpenAI
- Anthropic
- DeepSeek
- xAI (Grok)
- Meta Llama
Example model IDs:
- `openai/gpt-4o-mini`
- `anthropic/claude-sonnet-4`
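
Because OpenRouter exposes an OpenAI-compatible Chat Completions endpoint, switching providers comes down to changing the model ID in the request body. The sketch below builds such a request without sending it. The helper name `buildOpenRouterRequest` is hypothetical, and the chatbot service may construct its requests differently.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical helper: returns the URL and fetch options for one completion call.
function buildOpenRouterRequest(
  apiKey: string,
  model: string,
  messages: ChatMessage[],
  stream: boolean,
) {
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // The model ID selects the provider, e.g. "openai/gpt-4o-mini".
      body: JSON.stringify({ model, messages, stream }),
    },
  };
}
```

The returned pair can be passed straight to `fetch(url, init)`; with `stream: true` the response body arrives as Server-Sent Events.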
Dashboard visualizes:
- Latency
- Request throughput
- Error rate
- Token usage
- Provider distribution
Built with:
- Recharts
- Tailwind CSS
- Next.js
Stateless and horizontally scalable.
Workers scale independently through Redis consumer groups.
Potential future PostgreSQL optimizations:
- Read replicas
- Time partitioning
- Materialized views
Redis Streams can be trimmed with MAXLEN to bound memory growth.
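
Trimming can be applied on every write by passing `MAXLEN` to `XADD`. The sketch below only builds the command arguments, so it runs without a Redis connection. The helper name `xaddArgs`, the stream name, and the threshold are hypothetical; the `~` modifier requests approximate trimming, which is cheaper for Redis and may keep slightly more entries than the threshold.

```typescript
// Hypothetical helper: builds the arguments for an XADD with approximate trimming.
function xaddArgs(
  stream: string,
  maxLen: number,
  fields: Record<string, string>,
): string[] {
  if (!Number.isInteger(maxLen) || maxLen <= 0) {
    throw new RangeError("maxLen must be a positive integer");
  }
  // "*" asks Redis to generate the entry id.
  const args = ["XADD", stream, "MAXLEN", "~", String(maxLen), "*"];
  for (const field of Object.keys(fields)) {
    args.push(field, fields[field] ?? "");
  }
  return args;
}
```

For example, `xaddArgs("inference-logs", 10000, { model: "openai/gpt-4o-mini" })` yields the arguments for `XADD inference-logs MAXLEN ~ 10000 * model openai/gpt-4o-mini`. A cap set too low can trim entries the worker has not yet consumed, so the threshold should comfortably exceed the expected backlog.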
| Failure | Behavior |
|---|---|
| Ingestion API down | SDK drops logs silently |
| Redis down | Ingestion returns 503 |
| Worker crash | Messages re-delivered |
| Invalid payload | Rejected by validation |
| LLM provider error | Logged with error status |
| User cancellation | Logged as cancelled |
- Ingestion API protected with `X-API-Key`
- CORS restricted to frontend origin
- Secrets stored in `.env` or Kubernetes Secrets
- Sensitive previews redacted before persistence

- Start stack with Docker Compose
- Open chat UI
- Send streaming prompts
- Observe metrics dashboard populate
- Cancel/resume conversations
- Watch logs flow through ingestion pipeline