Development

Setup

git clone https://github.com/evertrust/stream-mcp.git
cd stream-mcp
bun install

Node.js ≥ 22.19 (or Bun 1.x+). TypeScript, ESM.

Scripts

| Command | What it does |
| --- | --- |
| bun run dev | Run the server from source via tsx |
| bun run build | Bundle to dist/index.js (tsup; inlines knowledge .md) |
| bun run build:binary | Compile a single native standalone binary (dist/stream-mcp) |
| bun run build:binaries | Cross-compile standalone binaries for Linux/macOS/Windows (x64 + arm64) |
| bun run typecheck | tsc --noEmit |
| bun run lint / bun run lint:fix | ESLint |
| bun run format / bun run format:check | Prettier |
| bun run verify:truth | Verify every MCP /api/v1 route reference exists in the Stream backend routes (see below) |
| bun run validate:ci | Run the full local gate: format → lint → typecheck → build → verify:truth → test → scenarios |
| bun run test | Unit tests (vitest) |
| bun run test:e2e | Live e2e tests against a real instance |
| bun run test:e2e:smoke | A small live e2e smoke subset (CI gate when STREAM_E2E_* is set) |
| bun run test:scenarios (test:llm) | Free, deterministic LLM-evaluation tier: in-process tool/resource metadata, a keyword tool-ranker, and (when STREAM_E2E_* is set) grounded checks that the tools return usable output; no model is called |
| bun run test:llm:live | Paid, opt-in model-driven smoke: a small Claude model (default Sonnet) drives the MCP and must select the right tool and surface usable output |

Project layout

src/
  index.ts            # server entry (stdio); server instructions
  settings.ts         # STREAM_* env -> validated config (zod)
  logging.ts          # JSON logger + MCP logging sink
  auth/               # AuthProvider base, local-account + mTLS providers, factory
  client/             # StreamClient (undici), errors (StreamError), retry
  tools/
    register.ts       # registerTool wrapper (verb -> annotations + isError)
    helpers.ts        # pagination/search/list/mutate envelopes, deleteGuard
    _scaffold.ts      # ConfigSpec CRUD generator (GET-strip-merge-PUT)
    registry.ts       # registerAllTools — single wiring point
    <domain>/         # one folder per domain (x509-ca, crypto, ssh, ...)
    docs/             # search_docs + get_doc
  resources/
    index.ts          # registers stream://knowledge/* resources
    catalog.ts        # resource catalog (imports the .md)
    knowledge/*.md    # embedded knowledge (bundled at build time)
  generated/docs/     # committed verify:truth snapshot (stream-routes.json, mcp-api-paths.json)
scripts/
  verify-truth.ts     # route-truth verifier entry point
  lib/truth.ts        # Play-routes parser + MCP /api/v1 reference scanner
docs/                 # this documentation
tests/
  unit/               # mocked-client unit tests
  e2e/                # live tests (gated by STREAM_E2E_*)

Architecture in brief

  • settings → auth → client → tools. settings.ts parses STREAM_* into a validated config; auth/ builds a provider (local headers or mTLS) consumed by the single StreamClient; domain tools call the client and register through one registerTool wrapper.
  • One client. StreamClient (undici) handles lazy init (whoami + license), bounded JSON, structured StreamError parsing, retry on safe verbs, and maps Stream's HTTP 204 (empty / forbidden) to [] on list endpoints.
  • Name-keyed CRUD is generated by _scaffold.ts from a ConfigSpec. Stream updates are full-replace PUTs keyed by the body's id field, so updates do GET → strip server/asymmetric fields → merge → PUT.
  • Annotations are derived from the tool's verb prefix (read-only / additive / idempotent / destructive); thrown StreamErrors become isError results so the model can self-correct.
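
To make the last bullet concrete, here is a minimal sketch of deriving annotations from a tool's verb prefix. The verb table and the annotationsFor helper are assumptions made for illustration; the actual mapping lives in src/tools/register.ts and may differ. The hint names follow the MCP tool-annotation fields.

```typescript
// Hypothetical sketch: the verb table is an assumption, not the project's real mapping.
interface ToolAnnotations {
  readOnlyHint: boolean;
  destructiveHint: boolean;
  idempotentHint: boolean;
}

const READ_VERBS = ["get", "list", "search"];

function annotationsFor(toolName: string): ToolAnnotations {
  // The verb is whatever precedes the first underscore, e.g. "list" in "list_cas".
  const verb = toolName.split("_")[0] ?? "";
  if (READ_VERBS.includes(verb)) {
    return { readOnlyHint: true, destructiveHint: false, idempotentHint: true };
  }
  if (verb === "create") {
    // Additive: writes, but does not overwrite or remove existing state.
    return { readOnlyHint: false, destructiveHint: false, idempotentHint: false };
  }
  if (verb === "update") {
    // Full-replace updates give the same result when repeated.
    return { readOnlyHint: false, destructiveHint: false, idempotentHint: true };
  }
  if (verb === "delete") {
    return { readOnlyHint: false, destructiveHint: true, idempotentHint: true };
  }
  // Unknown verbs fall back to the most cautious annotations.
  return { readOnlyHint: false, destructiveHint: true, idempotentHint: false };
}
```

Falling back to the destructive annotations for an unrecognised verb is a design choice in this sketch: a tool that is mislabelled as risky costs a confirmation prompt, while one mislabelled as read-only could hide a write.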

Adding a tool

  1. Add it to the relevant src/tools/<domain>/ file (use _scaffold.ts for standard name-keyed CRUD, or registerTool for custom operations).
  2. Map snake_case inputs to the exact camelCase wire fields. For updates, ensure stripFields lists every server-managed and rich-on-read field (e.g. certificate, publicKey).
  3. Add a unit test under tests/unit/<domain>.test.ts with a mocked client.
  4. If it's a new domain, wire its registerXxxTools into src/tools/registry.ts.
  5. bun run validate:ci (or at minimum bun run typecheck && bun run lint && bun run test && bun run build). If the tool reaches a new backend route, also run bun run verify:truth --write against a ../stream checkout and commit the regenerated snapshot.
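
Steps 2 and the GET → strip → merge → PUT flow can be sketched as follows. This is an illustration under stated assumptions, not the code in _scaffold.ts: MiniClient, toWire, buildPutBody and updateByName are hypothetical names, only top-level keys are converted, and the two path parameters stand in for whatever routes the real ConfigSpec declares.

```typescript
type Json = Record<string, unknown>;

// Convert one snake_case key to camelCase, e.g. "key_type" -> "keyType".
function snakeToCamel(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_match, ch: string) => ch.toUpperCase());
}

// Map top-level tool inputs to wire fields, dropping inputs the caller left undefined.
function toWire(input: Json): Json {
  const out: Json = {};
  for (const [key, value] of Object.entries(input)) {
    if (value !== undefined) out[snakeToCamel(key)] = value;
  }
  return out;
}

// Strip server-managed / rich-on-read fields from the current object, then merge the patch.
function buildPutBody(current: Json, patch: Json, stripFields: string[]): Json {
  const base: Json = { ...current };
  for (const field of stripFields) delete base[field];
  return { ...base, ...toWire(patch) };
}

// Hypothetical minimal client surface; the real StreamClient API may differ.
interface MiniClient {
  get(path: string): Promise<Json>;
  put(path: string, body: Json): Promise<Json>;
}

// GET the current object, build the full-replace body, PUT it back.
async function updateByName(
  client: MiniClient,
  itemPath: string,
  putPath: string,
  patch: Json,
  stripFields: string[],
): Promise<Json> {
  const current = await client.get(itemPath);
  return client.put(putPath, buildPutBody(current, patch, stripFields));
}
```

Because the PUT is a full replace, the body keeps every field from the GET (including id) except the stripped ones, so fields the caller did not mention are preserved rather than reset.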

Testing

Unit tests

Mocked-client tests assert each tool registers, builds the correct route/method/camelCase payload, enforces the delete echo guard, and shapes search queries correctly. A server integration test boots an in-memory MCP session and verifies the full tool/resource surface.

Route truth verification

bun run verify:truth statically checks that every /api/v1 path the MCP references resolves to a real route in the Stream backend. It parses the Play conf/routes of the backend checkout (following nested -> sub-router includes), scans src/ for path references (route consts, routeCollection/routeItem spec fields, and client.* calls), and fails on any reference that has no matching backend route or that uses a method the route does not expose.

It resolves its source of truth in this order: STREAM_SOURCE_ROOT, a sibling ../stream checkout, then the committed snapshot src/generated/docs/stream-routes.json. CI has no backend checkout, so it runs against the committed snapshot — regenerate it with bun run verify:truth --write (requires ../stream) and commit the result whenever backend routes change. Paths that are genuinely valid but not statically resolvable (e.g. prefix-only constants) go in ALLOWED_UNVERIFIED_PATHS in scripts/lib/truth.ts.
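
The core check can be pictured with the simplified sketch below. It is not the code in scripts/lib/truth.ts: parseRoutes, pathMatches and verifyReference are hypothetical names, only :param segments are treated as wildcards, -> includes are skipped rather than followed, and the sample routes are invented for the example.

```typescript
interface Route {
  method: string;
  pattern: string;
}

// Parse Play-style route lines of the form "METHOD  /path/:param  controller.action(...)".
function parseRoutes(text: string): Route[] {
  const routes: Route[] = [];
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    // Skip blanks, comments, and sub-router includes (the real verifier follows includes).
    if (line === "" || line.startsWith("#") || line.startsWith("->")) continue;
    const [method, pattern] = line.split(/\s+/);
    if (method && pattern) routes.push({ method: method.toUpperCase(), pattern });
  }
  return routes;
}

// A pattern matches when segment counts agree and each segment is equal or a ":param".
function pathMatches(pattern: string, path: string): boolean {
  const p = pattern.split("/");
  const q = path.split("/");
  return p.length === q.length && p.every((seg, i) => seg.startsWith(":") || seg === q[i]);
}

type Verdict = "ok" | "unknown-path" | "method-not-exposed";

function verifyReference(routes: Route[], method: string, path: string): Verdict {
  const candidates = routes.filter((r) => pathMatches(r.pattern, path));
  if (candidates.length === 0) return "unknown-path";
  return candidates.some((r) => r.method === method.toUpperCase()) ? "ok" : "method-not-exposed";
}

// Hypothetical routes, not taken from the Stream backend.
const sampleRoutes = parseRoutes(`
# certificate authorities
GET     /api/v1/cas          controllers.Cas.list()
GET     /api/v1/cas/:name    controllers.Cas.get(name)
`);
```

With these sample routes, verifyReference(sampleRoutes, "GET", "/api/v1/cas/root") yields "ok", a DELETE on the same path yields "method-not-exposed", and a path with no matching pattern yields "unknown-path", which mirrors the two failure modes described above.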

Live e2e tests

Gated by STREAM_E2E_* (loaded from a git-ignored .env.local):

STREAM_E2E_URL=https://stream.qa.example.com
STREAM_E2E_API_ID=...
STREAM_E2E_API_KEY=...
bun run test:e2e

E2e tests skip automatically when the STREAM_E2E_* variables are absent.
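
One way to implement this kind of gate is sketched below. It assumes the three variables shown above are the full required set, which may not hold for every auth mode, and missingE2eVars is a hypothetical helper rather than the project's own.

```typescript
// Assumption: these three variables are the complete set the live tests need.
const REQUIRED_E2E_VARS = ["STREAM_E2E_URL", "STREAM_E2E_API_ID", "STREAM_E2E_API_KEY"] as const;

// Return the names of required variables that are missing or empty.
function missingE2eVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_E2E_VARS.filter((name) => !env[name]);
}

// In a vitest file the result can drive a conditional skip, for example:
//   describe.skipIf(missingE2eVars(process.env).length > 0)("live e2e", () => { /* tests */ });
```

Reporting which variables are missing, rather than a bare boolean, makes it easier to tell an intentional skip from a half-configured .env.local.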

LLM smoke tests

Two extra tiers verify the MCP is usable by an LLM (mirroring how the tools are actually consumed):

  • Free / deterministic (bun run test:scenarios, dir tests/llm-evaluation/): boots the MCP in-process and checks the tool/resource surface, a keyword tool-ranker (a $0 proxy for a small model's tool choice), and, when STREAM_E2E_* is set, grounded checks that calling the tools returns usable output (e.g. whoami resolves a principal, list_cas returns a list envelope, search_certificates returns a paginated envelope). No model is called, so this runs anywhere.

  • Paid / model-driven (bun run test:llm:live, dir tests/llm-live/): drives a small Claude model (default Sonnet, override with STREAM_LLM_LIVE_MODEL) via the Claude Agent SDK against the spawned MCP, and asserts the model both selects the right tool and surfaces usable output in its answer. This costs money and is opt-in, so it only runs when explicitly enabled:

    source .env.local && STREAM_LLM_LIVE=1 bun run test:llm:live

    It skips (no model call, no billing) unless STREAM_LLM_LIVE=1, a claude binary is on PATH, ANTHROPIC_API_KEY is unset, and STREAM_E2E_* are present. The config deliberately does not auto-load .env.local.

When to run the paid tier: before merging changes that alter tool names, descriptions, or input schemas. The deterministic ranker in test:scenarios is a fast proxy, but only a real model exposes whether your wording works for an actual user. A per-scenario maxBudgetUsd cap and maxTurns cap bound any runaway loop, and the suite is excluded from CI so PR builds never incur charges. Use a cheaper model with STREAM_LLM_LIVE_MODEL=claude-haiku-4-5 when you just want a quick selection check.

CI

.github/workflows/ci.yml:

  • Commitlint (pull requests): enforces Conventional Commits across the PR.
  • Validate (push to main + PRs): format:check → lint → typecheck → build → verify:truth → test → test:scenarios, then test:e2e:smoke when the STREAM_E2E_* secrets are configured. Everything runs on Bun with a frozen lockfile; the paid LLM tier is never run in CI.

.github/workflows/release.yml runs after a successful CI run on main: semantic-release derives the next version from Conventional Commits, updates the changelog, publishes to npm (with provenance), and creates a GitHub release with the cross-compiled standalone binaries from build:binaries.

Conventions

  • snake_case tool inputs ↔ camelCase API payloads.
  • Object names/identifiers are immutable primary keys — never invent them.
  • Secrets are write-only and redacted from all tool output.
  • Keep files focused (one responsibility); split large domains by sub-resource.
  • Commit messages follow Conventional Commits — enforced locally by commitlint via a husky commit-msg hook, and consumed by semantic-release for versioning.
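
As an illustration of the "secrets are redacted from all tool output" convention, the sketch below masks secret-looking fields before a result is returned. The key list and the redact helper are assumptions for the example; the project's actual redaction rules and field names may differ.

```typescript
// Assumed field names; the real list of secret fields is project-specific.
const SECRET_KEYS = new Set(["password", "secret", "apiKey", "privateKey"]);

// Recursively copy a JSON-like value, replacing secret fields with a placeholder.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SECRET_KEYS.has(key) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}
```

The function returns a copy instead of mutating its input, so the unredacted object can still be used for the outgoing request while only the redacted copy reaches the model.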