iitianpushkar/ollive

🚀 Ollive: LLM Inference Logging Platform

A full-stack TypeScript platform for chat, LLM inference observability, and metrics dashboards.

Built as an end-to-end inference logging and ingestion system with:

  • Multi-turn chat
  • Streaming responses
  • Observability SDK
  • Event-driven ingestion pipeline
  • Real-time metrics dashboards
  • Multi-provider LLM support

TypeScript · Next.js · Express · PostgreSQL · Redis · Docker


✨ Stack

  • Frontend: Next.js 15 + Tailwind CSS
  • Backend: Express.js
  • Database: PostgreSQL
  • Queue/Event Bus: Redis Streams
  • Language: TypeScript
  • Containerization: Docker + Docker Compose
  • SDK: @ollive/sdk

⚡ Features

| Feature | Implementation |
| --- | --- |
| 💬 Multi-turn chatbot | Express chatbot + Next.js UI with context window |
| ⚡ Streaming responses | Server-Sent Events (SSE) |
| 🛰️ Inference SDK | @ollive/sdk trace wrapper |
| 📊 Observability | Logging, latency, tokens, errors |
| 🔒 PII redaction | Sensitive preview masking |
| 📦 Event-driven ingestion | Redis Streams consumer groups |
| 📈 Metrics dashboard | Recharts visualizations |
| 🌍 Multi-provider support | OpenRouter integration |
| 🔄 Conversation management | Resume / cancel chat sessions |
| 🐳 Docker support | One-command local setup |

πŸ—οΈ Architecture

Ollive is structured as a TypeScript monorepo with three runtime layers.

| Layer | Stack | Responsibility |
| --- | --- | --- |
| 🌐 Web | Next.js 15 | Chat UI + Dashboard |
| 🤖 Chatbot | Express | LLM orchestration + streaming |
| 📥 Ingestion | Express + Worker | Logging pipeline + metrics |

Shared packages:

  • @ollive/sdk
  • @ollive/db

🧭 System Diagram

┌─────────────┐     REST/SSE      ┌──────────────┐
│  Next.js    │ ────────────────► │   Chatbot    │
│   (web)     │                   │   Express    │
└──────┬──────┘                   └──────┬───────┘
       │ metrics                       │ @ollive/sdk
       │                               ▼
       │                        ┌──────────────┐
       └──────────────────────► │  Ingestion   │
                                │   Express    │
                                └──────┬───────┘
                                       │ XADD
                                       ▼
                                ┌──────────────┐
                                │    Redis     │
                                │   Streams    │
                                └──────┬───────┘
                                       │ XREADGROUP
                                       ▼
                                ┌──────────────┐
                                │   Worker     │
                                └──────┬───────┘
                                       ▼
                                ┌──────────────┐
                                │  PostgreSQL  │
                                └──────────────┘

🚀 Quick Start (Docker)

1️⃣ Clone Repository

git clone <your-repo-url>
cd ollive

2️⃣ Create Environment File

cp .env.example .env

Add your OpenRouter API key:

OPENROUTER_API_KEY=sk-or-...

3️⃣ Start Everything

docker compose up --build

🌐 Services

| Service | URL |
| --- | --- |
| 🌐 Web UI | http://localhost:3000 |
| 🤖 Chatbot API | http://localhost:8000 |
| 📥 Ingestion API | http://localhost:8001 |

📂 Project Structure

ollive/
├── apps/
│   └── web/                  # Next.js frontend
│
├── packages/
│   ├── ollive-sdk/           # Inference logging SDK
│   └── db/                   # Shared PostgreSQL pool
│
├── services/
│   ├── chatbot/              # Express chatbot service
│   └── ingestion/            # Ingestion API + worker
│
├── infra/
│   └── init.sql              # Database schema
│
└── docker-compose.yml

🔌 API Overview

🤖 Chatbot API (:8000)

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/conversations | List conversations |
| POST | /api/conversations | Create conversation |
| GET | /api/conversations/:id | Resume conversation |
| POST | /api/conversations/:id/cancel | Cancel conversation |
| POST | /api/chat/:id | Send message |
| POST | /api/chat/:id/cancel-stream | Abort active stream |

📥 Ingestion API (:8001)

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/v1/logs | Ingest inference log |
| GET | /api/v1/metrics/summary | Dashboard metrics |

Authentication header:

X-API-Key: <INGESTION_API_KEY>
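
As a rough illustration, a client could attach the key as shown below. Only the endpoint path and the X-API-Key header come from this README; the payload fields and the `buildLogRequest` helper are hypothetical, and the real schema is whatever the ingestion service validates.

```typescript
// Hypothetical payload shape; the real schema is defined by the ingestion service.
interface LogPayload {
  sessionId: string;
  provider: string;
  model: string;
  latencyMs: number;
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) a POST /api/v1/logs request carrying the API key.
function buildLogRequest(baseUrl: string, apiKey: string, payload: LogPayload): HttpRequest {
  return {
    // Strip trailing slashes so the path is joined cleanly.
    url: `${baseUrl.replace(/\/+$/, "")}/api/v1/logs`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-API-Key": apiKey,
    },
    body: JSON.stringify(payload),
  };
}

const req = buildLogRequest("http://localhost:8001/", "dev-key", {
  sessionId: "sess-123",
  provider: "openai",
  model: "openai/gpt-4o-mini",
  latencyMs: 420,
});
```

The resulting object can be handed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.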

📦 Ingestion Pipeline

  1. Chatbot wraps every LLM call with OlliveLogger.trace()
  2. SDK captures metadata and inference metrics
  3. Sensitive previews are redacted
  4. SDK asynchronously sends logs to ingestion API
  5. Ingestion service validates payloads with Zod
  6. Logs are appended to Redis Streams
  7. Worker consumes logs through consumer groups
  8. Logs are persisted into PostgreSQL
  9. Metrics are aggregated into hourly rollups
  10. Dashboard queries aggregated metrics
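
Steps 5 and 6 can be sketched roughly as follows. This is not the repository's handler: the log fields are hypothetical, a hand-written type guard stands in for the Zod schema, and `append` stands in for the Redis XADD call. The 503 status is the one this README gives for a Redis outage; 400 and 202 are assumptions.

```typescript
// Hypothetical log shape; field names are illustrative only.
interface InferenceLog {
  sessionId: string;
  model: string;
  latencyMs: number;
}

// Hand-written type guard standing in for the Zod schema.
function isInferenceLog(value: unknown): value is InferenceLog {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.sessionId === "string" &&
    typeof v.model === "string" &&
    typeof v.latencyMs === "number" &&
    Number.isFinite(v.latencyMs) &&
    v.latencyMs >= 0
  );
}

// Stand-in for the Redis XADD call; rejects when the stream is unavailable.
type AppendToStream = (log: InferenceLog) => Promise<void>;

// Returns the HTTP status the API would send for a given request body.
async function handleLog(body: unknown, append: AppendToStream): Promise<number> {
  if (!isInferenceLog(body)) return 400; // invalid payload (status code assumed)
  try {
    await append(body);
    return 202; // accepted for asynchronous processing (status code assumed)
  } catch {
    return 503; // stream unavailable
  }
}
```

Because the handler only validates and appends, it holds no state of its own, which is what allows several API instances to run side by side.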

⚡ Why Async Ingestion?

Separating logging from the inference request path prevents database writes from increasing chatbot latency.

Benefits:

  • Faster user responses
  • Better burst handling
  • Horizontally scalable ingestion
  • Fault isolation

πŸ›°οΈ SDK Usage

import { OlliveLogger } from "@ollive/sdk";

// Connection settings are read from the environment.
const logger = new OlliveLogger({
  ingestionUrl: process.env.INGESTION_URL,
  apiKey: process.env.INGESTION_API_KEY,
});

// `userPrompt` and `callLLM` are placeholders for the application's own
// prompt and model call.
await logger.trace(
  {
    sessionId: "sess-123",
    provider: "openai",
    model: "gpt-4o-mini",
    conversationId: "...",
    requestInput: userPrompt,
    isStreaming: true,
  },
  async (ctx) => {
    const result = await callLLM();

    // Attach response details so they are included in the log.
    ctx.response = result.text;
    ctx.totalTokens = result.tokens;

    return result;
  }
);
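
To give a feel for what a trace wrapper of this kind does internally, here is a minimal sketch. It is not the `@ollive/sdk` implementation: the `TraceEvent` shape, the `sink` callback, and the injectable `now` clock are all assumptions made for illustration.

```typescript
// Hypothetical context and event shapes, for illustration only.
interface TraceContext {
  response?: string;
  totalTokens?: number;
}

interface TraceEvent extends TraceContext {
  model: string;
  status: "success" | "error";
  latencyMs: number;
  errorMessage?: string;
}

type Sink = (event: TraceEvent) => void;

// Runs `fn`, times it, and reports one event to `sink` whether it succeeds or throws.
async function trace<T>(
  meta: { model: string },
  fn: (ctx: TraceContext) => Promise<T>,
  sink: Sink,
  now: () => number = Date.now,
): Promise<T> {
  const ctx: TraceContext = {};
  const startedAt = now();
  let status: TraceEvent["status"] = "success";
  let errorMessage: string | undefined;
  try {
    return await fn(ctx);
  } catch (err) {
    status = "error";
    errorMessage = err instanceof Error ? err.message : String(err);
    throw err; // the caller still sees the original LLM error
  } finally {
    try {
      sink({ ...ctx, model: meta.model, status, latencyMs: now() - startedAt, errorMessage });
    } catch {
      // Logging problems are swallowed so they never affect the chat request.
    }
  }
}
```

Emitting the event from `finally` means successful and failed calls are both recorded with their latency, while the wrapped function's return value or error passes through unchanged.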

📊 Logging Strategy

⚡ Near Real-Time Logging

Logs are sent immediately without batching.


πŸ›‘οΈ Best-Effort Delivery

SDK logging failures never break chat functionality.
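
One way to get this behaviour is to wrap the network call so that it can never reject. The sketch below assumes a `Transport` function standing in for the HTTP call to the ingestion API; both names are hypothetical and not taken from the SDK.

```typescript
// Stand-in for the HTTP call to the ingestion API; may reject on network errors.
type Transport = (payload: string) => Promise<void>;

// Sends one log and never rejects: resolves to true if delivered, false if dropped.
async function sendBestEffort(transport: Transport, payload: string): Promise<boolean> {
  try {
    await transport(payload);
    return true;
  } catch {
    return false; // dropped silently; the chat request is unaffected
  }
}

// Fire-and-forget usage: the caller does not await the result.
void sendBestEffort(async () => {
  throw new Error("ingestion down");
}, "{}");
```

The trade-off is that logs produced while the ingestion API is unreachable are lost rather than retried.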


🔒 PII Redaction

Sensitive data is masked before transmission:

  • Emails
  • Phone numbers
  • Credit cards
  • SSNs
  • API keys
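
A regex-based pass over the categories above might look like the sketch below. The patterns, the placeholder labels, and the `redact` function are assumptions for illustration, not the SDK's actual rules; in particular the phone pattern is US-style and the API key pattern only covers `sk-` / `pk-` prefixes.

```typescript
// Illustrative patterns only; order matters (keys and emails before digit-based rules,
// cards before SSNs and phones so longer digit runs are claimed first).
const RULES: Array<[RegExp, string]> = [
  [/\b(?:sk|pk)-[A-Za-z0-9_-]{8,}/g, "[API_KEY]"],
  [/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[EMAIL]"],
  [/\b\d(?:[ -]?\d){12,15}\b/g, "[CARD]"], // 13 to 16 digits, optional separators
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"],
  [/(?:\+\d{1,3}[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b/g, "[PHONE]"],
];

// Applies every rule in order and returns the masked text.
function redact(text: string): string {
  return RULES.reduce((acc, [pattern, label]) => acc.replace(pattern, label), text);
}

const masked = redact(
  "Mail jane@example.com or call 555-123-4567, SSN 123-45-6789, " +
    "card 4111 1111 1111 1111, key sk-or-v1-abc123def456",
);
console.log(masked);
```

Pattern matching of this kind is heuristic: it will miss unusual formats and can occasionally mask harmless numbers, so it is best treated as a safety net rather than a guarantee.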

🧾 Preview-Only Storage

The system does not store full prompts or responses.

Only the following are persisted:

  • Truncated previews
  • Metadata
  • Latency
  • Token counts
  • Errors
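
Truncation itself can be very small. The helper below is a sketch; the `toPreview` name and the 200-character default are assumptions, since the README does not state the actual preview length.

```typescript
// Returns at most `maxChars` characters of `text`, marking truncation with "...".
function toPreview(text: string, maxChars = 200): string {
  // Iterate by code point so emoji and other surrogate pairs are not split.
  const chars = Array.from(text);
  if (chars.length <= maxChars) return text;
  return chars.slice(0, maxChars).join("") + "...";
}

const preview = toPreview("Summarize the attached quarterly report in three bullets", 20);
```

Redacting before truncating avoids cutting a sensitive value in half, where a pattern-based mask would no longer recognize it.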

πŸ—„οΈ Database Schema

conversations

Stores top-level chat sessions.

Status values:

  • active
  • cancelled
  • completed

messages

Normalized conversation history.


inference_logs

Append-only observability events.

Stores:

  • Metadata
  • Provider/model info
  • Timing
  • Preview snippets
  • Errors

metrics_hourly

Pre-aggregated metrics rollups for fast dashboards.
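
The idea behind an hourly rollup can be shown in memory. The column names below are hypothetical, and the repository may well compute this in SQL rather than in application code; the sketch only illustrates the bucketing.

```typescript
// Hypothetical row and rollup shapes; the real columns live in infra/init.sql.
interface LogRow {
  createdAt: string; // ISO-8601 timestamp
  latencyMs: number;
  totalTokens: number;
  isError: boolean;
}

interface HourlyRollup {
  hour: string; // start of the UTC hour, as an ISO-8601 string
  requests: number;
  errors: number;
  totalTokens: number;
  avgLatencyMs: number;
}

// Groups rows by UTC hour and computes per-hour counts, token sums, and mean latency.
function rollupHourly(rows: LogRow[]): HourlyRollup[] {
  const buckets = new Map<string, { requests: number; errors: number; totalTokens: number; latencySum: number }>();
  for (const row of rows) {
    const d = new Date(row.createdAt);
    d.setUTCMinutes(0, 0, 0); // truncate to the start of the hour
    const hour = d.toISOString();
    const b = buckets.get(hour) ?? { requests: 0, errors: 0, totalTokens: 0, latencySum: 0 };
    b.requests += 1;
    b.errors += row.isError ? 1 : 0;
    b.totalTokens += row.totalTokens;
    b.latencySum += row.latencyMs;
    buckets.set(hour, b);
  }
  return Array.from(buckets.entries())
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([hour, b]) => ({
      hour,
      requests: b.requests,
      errors: b.errors,
      totalTokens: b.totalTokens,
      avgLatencyMs: b.latencySum / b.requests,
    }));
}
```

A dashboard covering the last 24 hours then reads at most 24 rollup rows per series instead of scanning every raw log.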


📦 Event-Driven Architecture

Redis Streams acts as the event bus between:

| Component | Responsibility |
| --- | --- |
| 📥 Ingestion API | Produces events |
| 👷 Worker | Consumes and persists events |

Consumer groups enable:

  • Horizontal scaling
  • Reliable delivery
  • Failure recovery
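
The recovery property comes from the group's pending list: an entry stays pending until it is acknowledged, so another consumer can take it over if its owner dies. The class below is a toy in-memory model of that behaviour, not a Redis client and not the worker's code. It ignores idle-time thresholds, and the README does not say which claiming command the worker actually uses.

```typescript
interface Entry {
  id: number;
  data: string;
}

// Minimal in-memory model of a Redis Streams consumer group.
class GroupModel {
  private entries: Entry[] = [];
  private nextId = 1;
  private cursor = 0; // index of the next never-delivered entry
  private pending = new Map<number, string>(); // entry id -> consumer holding it

  // Like XADD: append an entry and return its id.
  add(data: string): number {
    const id = this.nextId++;
    this.entries.push({ id, data });
    return id;
  }

  // Like XREADGROUP with ">": deliver the next new entry to `consumer`.
  read(consumer: string): Entry | undefined {
    const entry = this.entries[this.cursor];
    if (!entry) return undefined;
    this.cursor++;
    this.pending.set(entry.id, consumer);
    return entry;
  }

  // Like XACK: the entry is done and leaves the pending list.
  ack(id: number): void {
    this.pending.delete(id);
  }

  // Like XCLAIM / XAUTOCLAIM: take over entries other consumers never acknowledged.
  claimPending(consumer: string): Entry[] {
    const claimed: Entry[] = [];
    this.pending.forEach((owner, id) => {
      if (owner === consumer) return;
      this.pending.set(id, consumer);
      const entry = this.entries.find((e) => e.id === id);
      if (entry) claimed.push(entry);
    });
    return claimed;
  }
}

const group = new GroupModel();
group.add("log-1");
group.add("log-2");
group.read("worker-a"); // worker-a receives log-1, then crashes before acknowledging
const second = group.read("worker-b"); // worker-b receives log-2
if (second) group.ack(second.id);
const recovered = group.claimPending("worker-b"); // log-1 is handed to worker-b
```

Since a claimed entry may already have been partly processed, the write to PostgreSQL should be safe to repeat.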

🌍 Multi-Provider LLM Support

All LLM traffic routes through OpenRouter using the OpenAI-compatible Chat Completions API.

Supported providers include:

  • OpenAI
  • Anthropic
  • Google
  • DeepSeek
  • xAI (Grok)
  • Meta Llama

Example model IDs:

openai/gpt-4o-mini
anthropic/claude-sonnet-4
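
IDs in this `provider/model` form are easy to split when a log needs separate provider and model fields. The `parseModelId` helper below is a hypothetical sketch and not part of the SDK.

```typescript
interface ModelRef {
  provider: string;
  model: string;
}

// Splits an OpenRouter-style ID on the first "/"; returns null if either part is missing.
function parseModelId(id: string): ModelRef | null {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) return null;
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

const parsed = parseModelId("anthropic/claude-sonnet-4");
```

Here `parsed` holds `anthropic` as the provider and `claude-sonnet-4` as the model, which is the kind of split a provider-distribution chart would group by.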

📈 Metrics Dashboard

The dashboard visualizes:

  • ⚡ Latency
  • 📈 Request throughput
  • ❌ Error rate
  • 🪙 Token usage
  • 🌍 Provider distribution

Built with:

  • Recharts
  • Tailwind CSS
  • Next.js

📈 Scaling Considerations

📥 Ingestion API

Stateless and horizontally scalable.


👷 Workers

Scale independently through Redis consumer groups.


πŸ—„οΈ PostgreSQL

Potential future optimizations:

  • Read replicas
  • Time partitioning
  • Materialized views

⚡ Redis

Streams can be capped with the MAXLEN option of XADD (or trimmed with XTRIM) to bound memory growth.


πŸ›‘οΈ Failure Handling

Failure Behavior
Ingestion API down SDK drops logs silently
Redis down Ingestion returns 503
Worker crash Messages re-delivered
Invalid payload Rejected by validation
LLM provider error Logged with error status
User cancellation Logged as cancelled

πŸ” Security Notes

  • πŸ”‘ Ingestion API protected with X-API-Key
  • 🌐 CORS restricted to frontend origin
  • πŸ›‘οΈ Secrets stored in .env or Kubernetes Secrets
  • πŸ”’ Sensitive previews redacted before persistence

🎬 Demo Flow

  1. Start stack with Docker Compose
  2. Open chat UI
  3. Send streaming prompts
  4. Observe metrics dashboard populate
  5. Cancel/resume conversations
  6. Watch logs flow through ingestion pipeline
