BlacklistedAIProxy — Plugin Concepts, Developer Guide & Marketplace Documentation

Version: 1.0.0
Last Updated: 2026-04-20
Status: Beta — Plugin System v1

Plugin Architecture Overview
Plugin Manifest Reference
Plugin Developer Guide
Security Checklist
Performance Requirements & SLOs
Testing Requirements
Plugin Information Form (Template)
Marketplace Catalog Schema Reference
Top 5 Recommended Plugin Concepts
#1 Recommended Next Plugin: Multi-Model Ensemble Synthesizer
Changelog Policy

1. Plugin Architecture Overview

BlacklistedAIProxy's plugin system is a lifecycle-driven, hook-based middleware architecture. Every plugin is a plain JavaScript ES module that exports a default object conforming to the Plugin interface. Plugins are auto-discovered from the src/plugins/ directory, registered with the PluginManager, and given the opportunity to participate in the full request/response lifecycle.

1.1 Request Flow (Annotated)

HTTP Request
    │
    ▼
┌─────────────────────────────────────────────────────────────────┐
│  Master Router (master.js / common.js)                          │
│                                                                  │
│  ① Auth Plugins     (type: 'auth',   priority: ~9999)          │
│     └─ authenticate() → { handled, authorized, error }         │
│                                                                  │
│  ② Middleware Plugins (type: 'middleware', priority: 50–8000)  │
│     └─ middleware() → { handled: true }  ← SHORT-CIRCUIT       │
│        or { handled: false }             ← CONTINUE            │
│                                                                  │
│  ③ Provider routing (model selection, protocol conversion)     │
│                                                                  │
│  ④ AI Provider Call (HTTP → Gemini / Claude / OpenAI / etc.)  │
│                                                                  │
│  ⑤ Response handling                                           │
│     ├─ Streaming:   onStreamChunk() hook                        │
│     └─ Unary:       onUnaryResponse() hook                      │
│                                                                  │
│  ⑥ Content Generated:  onContentGenerated() hook               │
└─────────────────────────────────────────────────────────────────┘
    │
    ▼
HTTP Response

1.2 Plugin Priority

Plugins are sorted by _priority (ascending — lower number runs first). Recommended ranges:

Range	Typical Use Case
1–49	Reserved for system internals
50–99	Token optimization, caching
100–499	Response transformation
500–2999	Analytics, monitoring, logging
3000–7999	General middleware
8000–8999	Budget / quota enforcement
9000–9499	Security guards (rate-limit, PII)
9500–9998	Custom auth middleware
9999+	Core auth (default-auth, reserved)

1.3 Plugin Discovery

The PluginManager scans src/plugins/*/index.js at startup and dynamically imports each plugin. The plugin's name field is used as its identifier. Plugins with no index.js are silently skipped. A configs/plugins.json file persists enabled/disabled state across restarts.

1.4 Plugin Types

Type	Description
`_builtin`	Ships with the project; always registered; managed via config
`auth`	Participates in the authentication flow via `authenticate()`
`middleware`	Participates in the middleware chain via `middleware()`
(default)	Hooks-only plugin; no middleware participation

2. Plugin Manifest Reference

The default export of index.js must be an object with the following properties:

2.1 Required Fields

export default {
    name:        'my-plugin',     // string — Unique slug. Matches directory name.
    version:     '1.0.0',         // string — SemVer version string.
    description: 'One-line desc', // string — Shown in plugin manager UI.
}

2.2 Optional Configuration Fields

{
    type:       'middleware',  // string — 'middleware' | 'auth' | '_builtin' | undefined
    _builtin:   false,         // boolean — Mark as built-in (non-removable)
    _priority:  100,           // number  — Execution order (lower = earlier)
}

2.3 Lifecycle Hooks

{
    /**
     * Called once at server startup when the plugin is enabled.
     * Use for: opening database connections, loading config, starting timers.
     * Throwing from init() will mark the plugin as disabled.
     *
     * @param {Object} config — The full server CONFIG object
     * @returns {Promise<void>}
     */
    async init(config) {},

    /**
     * Called when the plugin is being shut down (server exit or restart).
     * Use for: closing connections, flushing buffers, persisting state.
     *
     * @returns {Promise<void>}
     */
    async destroy() {},
}

2.4 Request Middleware

{
    /**
     * Runs on EVERY incoming HTTP request, BEFORE the provider call.
     * Return { handled: true } to short-circuit all further processing.
     * Return { handled: false } to continue the chain.
     *
     * NEVER throw from middleware. Catch all exceptions internally.
     *
     * @param {IncomingMessage}  req         — Node.js HTTP request
     * @param {ServerResponse}   res         — Node.js HTTP response
     * @param {URL}              requestUrl  — Parsed request URL
     * @param {Object}           config      — Full server CONFIG (mutable)
     * @returns {Promise<{ handled: boolean }>}
     */
    async middleware(req, res, requestUrl, config) {
        // ...
        return { handled: false };
    },
}

2.5 Authentication (type: 'auth' only)

{
    /**
     * Called for type='auth' plugins during the authentication phase.
     *
     * @returns {Promise<{
     *   handled:    boolean,         — true if this plugin handled auth
     *   authorized: boolean | null,  — true = pass, false = reject, null = abstain
     *   error?:     Object,          — Error details if authorized=false
     * }>}
     */
    async authenticate(req, res, requestUrl, config) {},
}

2.6 Hooks Object

{
    hooks: {
        /**
         * Called after a non-streaming (unary) response is received from
         * the AI provider. Use for: response caching, usage accounting.
         *
         * @param {Object} ctx
         * @param {string}  ctx.requestId      — Plugin request ID
         * @param {string}  ctx.model          — Model identifier
         * @param {Object}  ctx.nativeResponse — Raw provider response
         * @param {Object}  ctx.clientResponse — Converted client response
         */
        async onUnaryResponse(ctx) {},

        /**
         * Called for each SSE chunk in a streaming response.
         * Do NOT block this hook — it runs in the critical streaming path.
         */
        async onStreamChunk(ctx) {},

        /**
         * Called once after the full response has been delivered to the client.
         * Use for: analytics, logging, cleanup.
         * Available on ctx: model, originalRequestBody, processedRequestBody,
         *   usage (promptTokens, completionTokens), _pluginRequestId.
         */
        async onContentGenerated(ctx) {},

        /**
         * Called at the start of every request, before any processing.
         * Use for: request logging, rate-limit pre-checks.
         */
        async onBeforeRequest(req, config) {},

        /**
         * Called after the response has been sent to the client.
         */
        async onAfterResponse(req, res, config) {},
    },
}

2.7 Routes

{
    /**
     * Plugin-owned REST routes. Each route is checked AFTER the main
     * UI management routes, so prefix all paths with /api/<plugin-name>
     * to avoid conflicts.
     *
     * The handler function signature matches handleUIApiRequests:
     *   (method, path, req, res, config) => Promise<boolean>
     */
    routes: [
        {
            method:  'GET',                      // HTTP method or '*' for all
            path:    '/api/my-plugin/stats',     // Exact path OR path prefix
            handler: myHandlerFunction,
        },
    ],
}

2.8 Static Files

{
    /**
     * Static HTML/CSS/JS files the plugin wants to serve from /static/.
     * These must already exist at static/<filename>.
     * Specified as relative paths from the static/ directory.
     */
    staticPaths: ['my-plugin.html', 'my-plugin.css'],
}

3. Plugin Developer Guide

3.1 Project Structure

src/plugins/
└── my-plugin/
    ├── index.js          ← Plugin entry point (required)
    ├── api-routes.js     ← REST route handlers
    ├── [module].js       ← One file per logical component
    └── README.md         ← Plugin-specific documentation (optional)

static/
├── my-plugin.html        ← Plugin dashboard page (optional)
└── components/
    └── section-*.html    ← If plugin adds a main section (optional)

configs/
└── my-plugin.json        ← Persisted plugin configuration (auto-created)

3.2 Minimal Plugin Template

// src/plugins/my-plugin/index.js
import logger from '../../utils/logger.js';

export default {
    name:        'my-plugin',
    version:     '1.0.0',
    description: 'Brief description of what this plugin does.',
    type:        '_builtin',
    _builtin:    true,
    _priority:   500,

    async init(config) {
        logger.info('[My Plugin] Initialized');
    },

    async destroy() {
        logger.info('[My Plugin] Destroyed');
    },

    async middleware(req, res, requestUrl, config) {
        try {
            // Your middleware logic here.
            // Return { handled: true } to stop the chain.
            // Return { handled: false } to continue.
            return { handled: false };
        } catch (err) {
            logger.error('[My Plugin] Middleware error (non-fatal):', err.message);
            return { handled: false };
        }
    },

    hooks: {
        async onContentGenerated(ctx) {
            try {
                // Post-request analytics, logging, etc.
            } catch { /* Always swallow errors in hooks */ }
        },
    },
};

3.3 Body Access Pattern

The request body is a Node.js readable stream. Two patterns exist:

Pattern A — Read in middleware (plugin controls the body):

async middleware(req, res, requestUrl, config) {
    // Read stream and cache in req._rawBody for downstream consumers
    const chunks = [];
    await new Promise((resolve, reject) => {
        req.on('data', c => chunks.push(c));
        req.on('end',  resolve);
        req.on('error', reject);
    });
    req._rawBody = Buffer.concat(chunks);
    const body   = JSON.parse(req._rawBody.toString('utf8'));
    // Modify body, write back:
    req._rawBody = Buffer.from(JSON.stringify(modifiedBody), 'utf8');
    return { handled: false };
}

Pattern B — Use getRequestBody() (downstream, no stream interception):

// Available in route handlers; automatically picks up req._rawBody if set.
import { getRequestBody } from '../../utils/common.js';
const body = await getRequestBody(req);

Rule: If your middleware reads the body stream, you MUST store the bytes in req._rawBody. Failing to do so will cause downstream getRequestBody() calls to receive an empty body (EOF on already-consumed stream).

3.4 Configuration Persistence

import { existsSync, readFileSync, writeFileSync, mkdirSync } from 'fs';
import path from 'path';

const CONFIG_FILE = path.join(process.cwd(), 'configs', 'my-plugin.json');
const DEFAULTS    = { enabled: true, someOption: 'value' };

function loadConfig() {
    try {
        if (existsSync(CONFIG_FILE)) {
            return { ...DEFAULTS, ...JSON.parse(readFileSync(CONFIG_FILE, 'utf8')) };
        }
    } catch { /* use defaults */ }
    return { ...DEFAULTS };
}

function saveConfig(cfg) {
    try {
        const dir = path.dirname(CONFIG_FILE);
        if (!existsSync(dir)) mkdirSync(dir, { recursive: true });
        writeFileSync(CONFIG_FILE, JSON.stringify(cfg, null, 2), 'utf8');
    } catch { /* silent */ }
}

3.5 Error Isolation Contract

Every plugin MUST adhere to the following error isolation contract:

Never throw from middleware() — wrap all logic in try/catch.
Never throw from hooks — swallow all exceptions silently or with a warning log.
Never throw from init() unless you intend to disable the plugin.
Always return from middleware() — returning undefined or null will break the chain.
Cap memory usage — use bounded data structures (LRU maps, ring buffers). Unbounded growth causes OOM crashes.
Use non-blocking I/O — never use synchronous filesystem operations on the hot path.

3.6 REST API Best Practices

// Standard JSON response helper
function sendJson(res, status, data) {
    res.writeHead(status, {
        'Content-Type':                'application/json; charset=utf-8',
        'Access-Control-Allow-Origin': '*',
    });
    res.end(JSON.stringify(data));
}

// Standard success response
sendJson(res, 200, { success: true, data: { ... } });

// Standard error response
sendJson(res, 400, { success: false, error: { message: 'Description', code: 'ERROR_CODE' } });

HTTP status codes to use:

200 — Success
400 — Bad request (invalid parameters)
401 — Unauthorized (missing or invalid credentials)
404 — Route not found (within your plugin's namespace)
422 — Validation error (well-formed but semantically invalid)
429 — Rate limited or quota exceeded
500 — Internal plugin error
503 — Plugin not available / not initialized

3.7 Authentication in Route Handlers

All plugin routes that expose sensitive data or allow configuration changes MUST validate authentication:

import { checkAuth } from '../../ui-modules/auth.js';
import { isAuthorized } from '../../utils/common.js';

async function isAuthed(req, config) {
    try {
        if (await checkAuth(req)) return true;
        if (config?.REQUIRED_API_KEY) {
            const url = new URL(req.url, `http://${req.headers.host || 'localhost'}`);
            return isAuthorized(req, url, config.REQUIRED_API_KEY);
        }
        return false;
    } catch { return false; }
}

4. Security Checklist

Every plugin submitted to the marketplace must pass the following checks before being classified as verified or official. For community plugins, this checklist is self-assessed.

4.1 Authentication & Authorization

All management endpoints check admin authentication via checkAuth() or API key validation
No endpoints expose secrets, credentials, or internal state without authentication
User-controlled input is never used to construct file paths without sanitization
No plugin accepts arbitrary JavaScript execution from user input

4.2 Input Validation

All numeric config inputs are bounded (min/max enforced)
String inputs are length-limited
JSON body parsing uses try/catch; malformed input returns 400, not 500
Regular expressions used for PII/pattern matching are tested for ReDoS (catastrophic backtracking)

4.3 Memory Safety

All in-memory data structures have a defined maximum size
No unbounded Maps, Arrays, or Sets that grow with request traffic
Timer intervals are unref()-ed where appropriate to not prevent process exit
Pending/in-flight state maps are cleaned up in all code paths (including errors)

4.4 Dependency Safety

No external npm dependencies (preferred — use Node.js built-ins)
If external deps are required, they are pinned to exact versions
All external deps have been checked against the GitHub Advisory Database
No dependencies with known RCE, injection, or prototype pollution vulnerabilities

4.5 Data Handling

No sensitive data (API keys, tokens, PII) is written to log files
No sensitive data is stored in plaintext beyond what is strictly necessary
Persisted files use atomic write patterns (temp file + rename) where data loss would be harmful
No user data is sent to external services without explicit opt-in configuration

4.6 Network Safety

Outbound HTTP requests use a timeout (recommended: 5 seconds)
Outbound URLs are validated before use (must start with https://)
No SSRF: user-supplied URLs are not proxied without restriction
Outbound errors are caught and never propagate to the response

4.7 Rate Limiting of Side Effects

Expensive operations (database writes, webhook calls) are debounced or rate-limited
Log output is rate-limited for high-frequency events (no log flooding)
Cache writes are bounded; cache reads are O(1) or O(log n) worst case

5. Performance Requirements & SLOs

All plugins are expected to meet the following service-level objectives. Violation of these SLOs may cause the plugin to be disabled automatically in a future version of the plugin manager.

5.1 Latency Budget

Hook / Phase	P50 Target	P99 Limit	Notes
`middleware()`	< 1 ms	< 5 ms	Excludes first-time body read from stream
`authenticate()`	< 2 ms	< 10 ms
`onUnaryResponse()`	< 0.5 ms	< 2 ms
`onStreamChunk()`	< 0.1 ms	< 0.5 ms	Critical path — absolute minimum
`onContentGenerated()`	< 1 ms	< 5 ms
`init()`	< 500 ms	< 2000 ms	One-time cost; no hard limit but be sane
Route handlers	< 50 ms	< 500 ms	Excludes external I/O wait time

5.2 Memory Budget

Category	Limit	Notes
In-memory cache	≤ 200 MB	Use LRU eviction; expose size config to users
In-flight state	≤ 10 MB	Pending response maps etc.
Plugin code + data	≤ 50 MB	Including loaded modules
Single request	≤ 5 MB	Per-request allocations must be GC-able

5.3 CPU Budget

No synchronous CPU work > 5 ms on the main event loop thread
Heavy computation (ML inference, regex over 1 MB+ strings) must be offloaded to a Worker thread
Regex patterns must complete in < 1 ms on inputs up to 100 KB

6. Testing Requirements

6.1 Minimum Test Coverage

Every plugin must include tests in src/plugins/<plugin-name>/tests/ or equivalent. Minimum required test scenarios:

Unit Tests (required):

init() succeeds with minimal config
middleware() returns { handled: false } for non-matching requests
middleware() returns { handled: true } when short-circuiting (e.g., cache hit, rate limit)
middleware() never throws; returns { handled: false } on internal errors
All public methods of internal modules (cache, optimizer, etc.)
Config load with missing file returns defaults
Config save/load round-trip

Integration Tests (required for marketplace listing):

Full request flow: request → middleware → provider call → hook → response
Concurrent request handling (at least 10 simultaneous requests)
Config update via REST API persists across a simulated restart
Auth rejection on protected endpoints (missing token → 401)

Edge Case Tests (strongly recommended):

Empty messages array
Messages with null/undefined content
Very large body (> 1 MB)
Invalid JSON body
Malformed API keys/tokens
Extreme config values (maxSize=1, ttl=0, limit=1)

6.2 Test File Structure

src/plugins/my-plugin/
├── index.js
├── [module].js
└── tests/
    ├── unit.test.js       ← Module-level unit tests
    ├── integration.test.js ← End-to-end plugin tests
    └── fixtures/
        ├── messages.json  ← Test message arrays
        └── responses.json ← Cached response fixtures

7. Plugin Information Form (Template)

When submitting a plugin for the marketplace, complete the following information form. This becomes the plugin's marketplace-catalog.js entry and README.

════════════════════════════════════════════════════════════════
  BLACKLISTEDAIPROXY PLUGIN SUBMISSION FORM
════════════════════════════════════════════════════════════════

SECTION A — IDENTIFICATION
─────────────────────────
Plugin ID (unique slug, lowercase, hyphenated):
  _______________________________________________

Display Name:
  _______________________________________________

Version (SemVer):
  _______________________________________________

Author Name:
  _______________________________________________

Author URL / GitHub Profile:
  _______________________________________________

Repository URL (source code):
  _______________________________________________

License (SPDX identifier, e.g. MIT, Apache-2.0, GPL-3.0):
  _______________________________________________

Plugin Size Estimate (e.g. ~15 KB):
  _______________________________________________

Minimum BlacklistedAIProxy Core Version:
  _______________________________________________

────────────────────────────────────────────────────────────────
SECTION B — CLASSIFICATION
─────────────────────────
Primary Category (circle one):
  security | optimization | analytics | utilities | productivity | ai-enhancement

Sub-Categories (list up to 3):
  _______________________________________________

Trust Tier (circle one, self-assessed for community submissions):
  official | verified | community

Plugin Type (circle one):
  middleware | auth | hooks-only | ui-extension

Capabilities (check all that apply):
  [ ] middleware()        — Intercepts AI requests
  [ ] authenticate()      — Participates in auth flow
  [ ] hooks               — Subscribes to lifecycle events
  [ ] routes              — Exposes REST API endpoints
  [ ] static files        — Provides dashboard HTML pages

Tags (comma-separated, lowercase):
  _______________________________________________

────────────────────────────────────────────────────────────────
SECTION C — DESCRIPTION
───────────────────────
One-Line Description (< 120 characters):
  _______________________________________________

Full Markdown Description (include: what it does, why it matters,
  key features, configuration options, usage examples):
  ┌─────────────────────────────────────────────────┐
  │                                                 │
  │                                                 │
  └─────────────────────────────────────────────────┘

Dashboard URL (if plugin provides its own page, relative path):
  _______________________________________________

Documentation URL (external docs, if any):
  _______________________________________________

────────────────────────────────────────────────────────────────
SECTION D — TECHNICAL DETAILS
─────────────────────────────
Execution Priority (1–9999, see priority table):
  _______________________________________________

Config File Path (e.g. configs/my-plugin.json):
  _______________________________________________

Is Restart Required to Toggle?: [ ] Yes  [x] No

Is the Plugin Configurable via REST API?:  [ ] Yes  [ ] No

Does the Plugin Store Data Persistently?:  [ ] Yes  [ ] No
  If yes, where?: ___________________________________

Does the Plugin Make Outbound HTTP Requests?:  [ ] Yes  [ ] No
  If yes, to which domains?: ________________________

Does the Plugin Modify Request Bodies?:  [ ] Yes  [ ] No
  If yes, explain the modification: ________________

Does the Plugin Cache Responses?:  [ ] Yes  [ ] No
  If yes, what is the max memory usage?: ____________

────────────────────────────────────────────────────────────────
SECTION E — SECURITY SELF-ASSESSMENT
────────────────────────────────────
(Complete the Security Checklist from Section 4 and attach it)

Known Limitations or Caveats:
  _______________________________________________

────────────────────────────────────────────────────────────────
SECTION F — CHANGELOG
─────────────────────
v1.0.0 (YYYY-MM-DD):
  _______________________________________________

════════════════════════════════════════════════════════════════

8. Marketplace Catalog Schema Reference

Each plugin entry in src/core/marketplace-catalog.js follows this schema:

interface PluginCatalogEntry {
    // Required
    id:             string;    // Unique slug matching directory name
    name:           string;    // Human-readable display name
    version:        string;    // SemVer version
    author:         { name: string; url: string };
    category:       string;    // Primary category slug
    trustTier:      'official' | 'verified' | 'community';
    description:    string;    // < 120 chars, one-liner
    longDescription: string;   // Markdown-formatted full description
    icon:           string;    // FontAwesome class (e.g. 'fa-bolt')
    iconColor:      string;    // CSS color string (e.g. '#f59e0b')
    capabilities:   string[];  // ['middleware', 'routes', 'hooks', 'static']
    license:        string;    // SPDX identifier
    changelog:      Array<{ version: string; date: string; notes: string }>;

    // Optional
    subCategories?: string[];
    rating?:        { score: number; count: number };  // 0.0–5.0
    installs?:      number;
    featured?:      boolean;
    tags?:          string[];
    minCoreVersion?: string;
    size?:          string;    // e.g. '~15 KB'
    repository?:    string;
    documentationUrl?: string | null;
    dashboardUrl?:  string | null;   // Relative URL to plugin page
    configurable?:  boolean;
    restartRequired?: boolean;
    screenshots?:   string[];

    // Runtime (populated by marketplace-api.js, NOT stored in catalog)
    installed?:     boolean;
    enabled?:       boolean;
}

9. Top 5 Recommended Plugin Concepts

Selection Criteria: Every concept below was evaluated against BlacklistedAIProxy's unique value proposition: all major commercial LLMs are 100% free to users, and built-in provider routing already selects the best-fit model by request context. A plugin concept is disqualified if it:

Duplicates core routing logic (routing already exists as a built-in feature)

Frames its value around "cheaper" models (meaningless when everything is free)

Introduces quotas, token budgets, or resource rationing (there is nothing to ration — access is unlimited)

Concepts are ranked by net-new capability they add that does not exist anywhere in the current codebase.

The following five plugin concepts were selected by analyzing:

BlacklistedAIProxy's unique free-access-to-all-LLMs architecture
The top 50 AI infrastructure repositories on GitHub by stars
NadirClaw's architecture (prompt caching, token optimization, wizard UX)
Real user pain points reported in LiteLLM, OpenRouter, and similar proxy issue trackers
Feature gaps that cannot be solved by existing core features

Each concept combines best practices from multiple open-source projects and adds capability that does not exist anywhere else in the current codebase.

Concept #1 — Multi-Model Ensemble Synthesizer ⭐ TOP PICK

Category: ai-enhancement
Priority Slot: 300
Estimated Impact: Measurable quality uplift on every request; unique killer feature only possible because all models are free

The Core Insight: Every other AI proxy on the market forces users to pick one model per request — because every call costs money. BlacklistedAIProxy is the only proxy where sending a prompt to 3 or 4 frontier models simultaneously costs the user exactly $0.00. This plugin turns that structural advantage into a first-class feature: fan out one prompt to N models, collect all responses, synthesize or vote on the best answer, and return a single high-confidence reply.

What it does:

Intercepts any chat/completion request before it reaches the provider
Fans the prompt out to a configurable set of models in parallel (e.g. GPT-4o + Claude 3.5 Sonnet + Gemini 1.5 Pro)
Collects all N responses with latency and quality metadata
Runs a lightweight synthesis step (configurable):
- vote — majority consensus on factual answers (most common answer wins)
- best — return the response that scores highest on a heuristic rubric (length, coherence, format match)
- merge — call a lightweight "judge" model to synthesize all N answers into one refined response
- all — return all responses as a structured JSON array (power-user / comparison mode)
Logs per-model latency and quality score for dashboard visualization

Key Open-Source References:

RouteLLM (lm-sys/routellm) — multi-model dispatch and quality evaluation patterns
LiteLLM (BerriAI/litellm) — parallel provider fan-out, response aggregation
Chatbot Arena (lm-sys/FastChat) — model comparison and ELO-based quality ranking methodology
OpenAI Evals (openai/evals) — rubric-based response scoring patterns
LangChain MapReduceDocumentsChain — fan-out / reduce synthesis pattern
Guardrails AI (guardrails-ai/guardrails) — structured response validation before synthesis
Portkey AI Gateway (Portkey-AI/gateway) — multi-provider parallel request patterns

Technical Implementation:

Fan-out middleware (runs at priority 300, before provider dispatch):
- Clones request body N times
- Fires N provider calls in parallel using Promise.allSettled
- Respects existing per-provider auth from config (no duplicate auth setup needed)
Synthesis engine (synthesizer.js):
- vote: tokenize each response → extract final answer → find most common → return
- best: score each response on 4 heuristics (relevance keywords, length normalcy, markdown structure if requested, no refusal markers) → return highest scorer
- merge: construct a meta-prompt like "Given these N answers: [...] provide the best combined answer" and call the fastest available model as judge
- all: serialize to { responses: [{model, text, latencyMs, score}] } JSON
Streaming support: In best and vote modes, buffer all responses then stream the winner back. In all mode, stream a newline-delimited JSON array
Timeout contract: each provider call has a hard 15 s timeout; any that exceed it are dropped from synthesis (minimum 1 required to respond)
Dashboard panel: static/ensemble.html
- Model selector checkboxes (which models participate)
- Per-model average latency, win rate, agreement rate
- Synthesis mode dropdown
- Live request log showing fan-out results per request

Config Schema:

{
  "enabled": false,
  "models": ["gpt-4o", "claude-3-5-sonnet-20241022", "gemini-1.5-pro"],
  "synthesisMode": "best",
  "timeoutMs": 15000,
  "minResponses": 1,
  "judgeModel": "gemini-2.0-flash",
  "allowClientOverride": true,
  "streamingMode": "winner"
}

Why this is the right #1 pick — and why the previous suggestion was wrong:

The previously suggested "Smart Model Router" was disqualified on two grounds:

Duplicate functionality — BlacklistedAIProxy's core provider routing system already selects the best-fit LLM based on request context. A plugin that does the same thing adds zero net value.
Invalid value framing — routing to "cheaper" models is meaningless when every model is free. The entire premise of cost savings collapses.

The Ensemble Synthesizer is the exact opposite:

It is only possible because all models are free (no one else can afford to call 4 frontier models per request)
It adds a capability that does not exist in the core (multi-model fan-out + synthesis)
It turns the product's core value proposition into a tangible, demonstrable feature
Users can literally see 4 frontier AI responses side-by-side for one query

Why it's the #1 recommendation: See Section 10.

Concept #2 — Semantic Cache

Category: optimization
Priority Slot: 60 (runs before token-optimizer)
Estimated Impact: Latency reduction 80–95% for semantically equivalent repeat queries; measurably faster perceived response times for end users asking similar questions

What it does: Extends the exact-match prompt cache (token-optimizer) with fuzzy similarity matching. Uses sentence embeddings to find cached responses that are semantically equivalent, even when the exact wording differs.

Key Open-Source References:

GPTCache (zilliztech/GPTCache) — semantic cache architecture, cosine similarity matching
semantic-router (aurelio-labs/semantic-router) — fast semantic routing
LangChain SemanticSimilarityExampleSelector — embedding-based matching
Chroma (chroma-core/chroma) — vector database for embedding storage
Nomic Embed (nomic-ai/nomic-embed-text) — efficient local embeddings

Technical Implementation:

Embed each request's last user message using a lightweight local model (e.g., all-MiniLM-L6-v2 via ONNX Runtime — no Python required)
Store embeddings + responses in a vector index (in-memory with HNSW)
On each request: compute embedding → search for nearest neighbors
If cosine similarity > threshold (default: 0.92) → return cached response
Dashboard: similarity threshold slider, cache visualizer

Key Complexity: Requires ONNX Runtime or calling an embedding endpoint. Two modes:

local — uses a bundled ONNX model (~100 MB download on first use)
remote — calls a configurable embedding endpoint (e.g., OpenAI /embeddings)

Concept #3 — Conversation Memory Store

Category: ai-enhancement
Priority Slot: 150
Estimated Impact: Enables persistent, cross-session AI memory for all clients

What it does: Injects a persistent memory summary into every AI request, enabling the AI to "remember" facts about each user across sessions without the client needing to manage context.

Key Open-Source References:

MemGPT (cpacker/MemGPT) — hierarchical memory architecture
mem0 (mem0ai/mem0) — user memory extraction and injection API
LangMem (langchain-ai/langmem) — conversation memory distillation
Zep (getzep/zep) — persistent memory service
LangChain ConversationSummaryMemory — summarization-based memory

Technical Implementation:

After each request, extract key facts from the assistant's response using an NLP summarizer
Store per-user memory profiles in a SQLite database keyed by API key or IP
Before each request, inject a <memory> system message with relevant facts
Memory summarization every 10 turns to keep size bounded
User-configurable memory retention period (default: 30 days)
Admin dashboard: view/edit/clear memory per user

Config Schema:

{
  "enabled": false,
  "maxMemoryTokens": 500,
  "retentionDays": 30,
  "summarizeEveryN": 10,
  "keyBy": "api_key"
}

Concept #4 — Response Quality Guard

Category: security + ai-enhancement
Priority Slot: 4000 (runs post-provider in hooks)
Estimated Impact: Reduces hallucination delivery rate by 30–60%

What it does: Evaluates AI responses for quality issues (hallucinations, harmful content, off-topic replies) and automatically retries with a corrective prompt or falls back to a safer model.

Key Open-Source References:

promptfoo (promptfoo-dev/promptfoo) — LLM evaluation framework
G-Eval (microsoft/promptbench) — LLM-based evaluation methodology
DeepEval (confident-ai/deepeval) — assertion-based response testing
RAGAS (explodinggradients/ragas) — RAG answer quality metrics
Guardrails AI (guardrails-ai/guardrails) — output validation

Technical Implementation:

Quality dimensions evaluated (configurable):
- Relevance: does the response address the question?
- Factual consistency: does the response contradict the context?
- Toxicity: does the response contain harmful content?
- Hallucination score: does the response fabricate facts?
Evaluation method: heuristic checks first (< 1 ms), then optional LLM self-eval
Action on low quality: retry (re-request with correction prompt), flag (log only), block (return error)
Max retries: configurable (default: 1)
Quality score logged per request for analysis

Concept #5 — RAG / Knowledge Base Injection

Category: ai-enhancement
Priority Slot: 150 (after memory-store if present, before provider dispatch)
Estimated Impact: Every user prompt answered with grounding from their own documents; eliminates hallucinations on private/domain knowledge; works with all models simultaneously at zero extra cost

The Core Insight: Every LLM in the pool answers questions based on its training data — which ends at a cut-off date and contains nothing about your private documents, internal wikis, product manuals, or codebase. RAG solves this by retrieving the most relevant chunks of your content and injecting them as context before each request. Because all models are free, every user can get RAG-augmented answers from GPT-4o, Claude, and Gemini simultaneously — this is only possible here.

What it does:

Accepts document uploads (PDF, Markdown, plain text, DOCX) via a dashboard panel
Chunks and embeds documents into a local vector store at ingest time
On each request: embeds the user's query → retrieves top-K most relevant chunks
Injects retrieved chunks as a [CONTEXT] block in the system message before the provider call
Works transparently with all models (no per-model configuration needed), including the Ensemble Synthesizer — all N models get the same RAG context injected

Key Open-Source References:

LlamaIndex (run-llama/llama_index) — document ingestion pipeline, node chunking, query engine
LangChain (langchain-ai/langchain) — RAG chain patterns, retriever interface
Chroma (chroma-core/chroma) — embeddable local vector database, persistent collections
Unstructured (Unstructured-IO/unstructured) — PDF/DOCX parsing and chunking strategies
HuggingFace Transformers.js (xenova/transformers.js) — browser/Node-compatible embedding models
FAISS (facebookresearch/faiss) — high-performance approximate nearest-neighbor search
Nomic Embed (nomic-ai/nomic-embed-text) — efficient open-source embedding model

Technical Implementation:

Ingest pipeline (rag-ingest.js):
- Accept file upload via /api/plugins/rag/upload (PDF, .md, .txt, .docx)
- Parse to plain text (pdfjs-dist for PDF, mammoth for DOCX, native for text/markdown)
- Chunk with a sliding window: configurable chunkSize (default: 512 tokens) and chunkOverlap (default: 64 tokens) to preserve cross-chunk context
- Embed each chunk using a local embedding model (Transformers.js, no Python required)
- Persist embeddings + raw chunk text in a SQLite-backed vector store (sqlite-vec or better-sqlite3 with cosine similarity UDF)
- Associate each document with an owner: global (all users) or a specific API key
Retrieval middleware (runs at priority 150):
- Extract the last user message from req.body.messages
- Embed query → nearest-neighbor search → top-K chunks (default K=4)
- If similarity of best match < threshold (default: 0.65), skip injection (no irrelevant context)
- Inject retrieved chunks as a system message:
```
[RETRIEVED CONTEXT — use this to answer the user's question]
--- Chunk from: document-name.pdf ---
<chunk text>
---
```
- Log which document + chunks were injected, similarity scores
Dashboard panel (static/rag.html):
- Document library: upload, list, delete, view chunk count per document
- Per-document scope toggle: available to all users vs. scoped to specific API key
- Query tester: enter a query, see which chunks would be retrieved and their similarity scores
- Retrieval stats: which documents are queried most, average similarity score per document
No quota anywhere: no document count limit, no query limit, no token budget — all models are free, so there is no cost to injecting context into every request

Config Schema:

{
  "enabled": false,
  "chunkSize": 512,
  "chunkOverlap": 64,
  "topK": 4,
  "similarityThreshold": 0.65,
  "embeddingModel": "Xenova/all-MiniLM-L6-v2",
  "maxContextTokens": 2000,
  "injectPosition": "system",
  "allowUserUploads": true
}

Why it fits this product specifically:

Zero cost to inject more context — all models are free, so larger context = better answers at $0
Works with the Ensemble Synthesizer — all 4 models get identical RAG context, and their synthesis becomes grounded in your documents rather than hallucinated
No vendor lock-in: uses local embeddings (Transformers.js) so no external embedding API is needed
Completely additive — no overlap with any existing plugin or core feature

10. #1 Recommended Next Plugin: Multi-Model Ensemble Synthesizer

Recommendation: Build the Multi-Model Ensemble Synthesizer as the next plugin.

Why This Plugin Above All Others?

This plugin exists only because of BlacklistedAIProxy's core promise of free access to all major LLMs. No paid proxy can offer this — they'd be burning money on every request. Here, sending a single prompt to GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro in parallel costs the user $0. That structural advantage is leveraged into a flagship quality feature no competitor can replicate.

Criterion	Score (1–5)	Notes
Net-new capability	★★★★★	Nothing in the core or any existing plugin does this
Unique to this product	★★★★★	Only viable because all models are free
User value / wow factor	★★★★★	Returning one best answer from 4 frontier AIs is visceral
Infrastructure fit	★★★★★	Builds directly on existing provider pool architecture
Implementation complexity	★★★★☆	`Promise.allSettled` fan-out is straightforward
Risk (graceful degradation)	★★★★★	If N models fail, still returns the 1 that responded
Differentiator vs competitors	★★★★★	No other proxy product offers free ensemble synthesis

Total Score: 34/35 — highest of all five concepts.

Specific Features to Prioritize

Fan-out engine — parallel Promise.allSettled across N providers:
- Share existing provider-pool connections (no re-auth overhead)
- Per-provider hard timeout (default: 15 s)
- Drop any provider that times out or errors; never block the response
Synthesis modes (configurable per-request via header or config default):
- best — heuristic scorer (length, coherence, no-refusal markers) → return winner
- vote — tokenize last sentence of each response → majority consensus answer
- merge — construct a judge meta-prompt, call fastest available model as synthesizer
- all — return structured {responses: [...]} JSON array for client-side comparison
Streaming support:
- best and vote: buffer all responses, stream the winner back to client
- all: stream as newline-delimited JSON (NDJSON), each model's response as it arrives
Ensemble Dashboard (static/ensemble.html):
- Model selector: which models participate in ensemble
- Live request log: per-model latency + score for every fanned-out request
- Win-rate chart: which model wins most often across synthesis modes
- Agreement rate: how often all N models give equivalent answers
Override header: X-Ensemble-Mode: all / X-Ensemble-Models: gpt-4o,claude-3-5-sonnet — lets power users customize ensemble per-request without config changes

Key Design Decisions

Does this conflict with the core provider routing?
No. Core routing selects which provider account/endpoint to use for a given model name. The ensemble plugin runs before that, constructing N separate sub-requests each of which then goes through normal provider routing independently.

What if the user explicitly specifies a model in their request?
In best/vote/merge modes, the requested model is always included in the ensemble and its response wins any tie. In all mode, the user gets the requested model's response labeled with its name alongside others.

Should it work with non-chat endpoints (completions, embeddings)?
Phase 1: chat completions only (highest-value, easiest to synthesize). Phase 2: text completions. Phase 3: embeddings (average all N embedding vectors).

Priority slot: 300 — after token-optimizer (50) and universal-guard (9500), before provider dispatch. Streaming fan-out returns through the normal response pipeline.

11. Changelog Policy

All plugins must follow these changelog practices:

Semantic Versioning: MAJOR.MINOR.PATCH
- MAJOR: Breaking changes to API, config schema, or behavior
- MINOR: New features, backward compatible
- PATCH: Bug fixes, performance improvements, security patches

Changelog Entry Format:

{
  "version": "1.2.0",
  "date": "YYYY-MM-DD",
  "notes": "Brief description of all changes in this release."
}

Security Releases: Increment PATCH and prefix notes with [SECURITY].
Deprecation Policy: Deprecated features must be documented for at least one MINOR version before removal.

This document is maintained by the BlacklistedAIProxy core team. For questions, open an issue in the GitHub repository.

FilesExpand file tree

plugin-concepts.md

Latest commit

History

plugin-concepts.md

File metadata and controls

BlacklistedAIProxy — Plugin Concepts, Developer Guide & Marketplace Documentation

Table of Contents

1. Plugin Architecture Overview

1.1 Request Flow (Annotated)

1.2 Plugin Priority

1.3 Plugin Discovery

1.4 Plugin Types

2. Plugin Manifest Reference

2.1 Required Fields

2.2 Optional Configuration Fields

2.3 Lifecycle Hooks

2.4 Request Middleware

2.5 Authentication (type: 'auth' only)

2.6 Hooks Object

2.7 Routes

2.8 Static Files

3. Plugin Developer Guide

3.1 Project Structure

3.2 Minimal Plugin Template

3.3 Body Access Pattern

3.4 Configuration Persistence

3.5 Error Isolation Contract

3.6 REST API Best Practices

3.7 Authentication in Route Handlers

4. Security Checklist

4.1 Authentication & Authorization

4.2 Input Validation

4.3 Memory Safety

4.4 Dependency Safety

4.5 Data Handling

4.6 Network Safety

4.7 Rate Limiting of Side Effects

5. Performance Requirements & SLOs

5.1 Latency Budget

5.2 Memory Budget

5.3 CPU Budget

6. Testing Requirements

6.1 Minimum Test Coverage

6.2 Test File Structure

7. Plugin Information Form (Template)

8. Marketplace Catalog Schema Reference

9. Top 5 Recommended Plugin Concepts

Concept #1 — Multi-Model Ensemble Synthesizer ⭐ TOP PICK

Concept #2 — Semantic Cache

Concept #3 — Conversation Memory Store

Concept #4 — Response Quality Guard

Concept #5 — RAG / Knowledge Base Injection

10. #1 Recommended Next Plugin: Multi-Model Ensemble Synthesizer

Why This Plugin Above All Others?

Specific Features to Prioritize

Key Design Decisions

11. Changelog Policy