git clone https://github.com/evertrust/stream-mcp.git
cd stream-mcp
bun install

Node.js ≥ 22.19 (or Bun 1.x+). TypeScript, ESM.

| Command | What it does |
|---|---|
| `bun run dev` | Run the server from source via tsx |
| `bun run build` | Bundle to `dist/index.js` (tsup; inlines knowledge `.md`) |
| `bun run build:binary` | Compile a single native standalone binary (`dist/stream-mcp`) |
| `bun run build:binaries` | Cross-compile standalone binaries for Linux/macOS/Windows (x64 + arm64) |
| `bun run typecheck` | `tsc --noEmit` |
| `bun run lint` / `bun run lint:fix` | ESLint |
| `bun run format` / `bun run format:check` | Prettier |
| `bun run verify:truth` | Verify every MCP `/api/v1` route reference exists in the Stream backend routes (see below) |
| `bun run validate:ci` | Run the full local gate: format → lint → typecheck → build → verify:truth → test → scenarios |
| `bun run test` | Unit tests (vitest) |
| `bun run test:e2e` | Live e2e tests against a real instance |
| `bun run test:e2e:smoke` | A small live e2e smoke subset (CI gate when `STREAM_E2E_*` is set) |
| `bun run test:scenarios` (`test:llm`) | Free, deterministic LLM-evaluation tier: in-process tool/resource metadata, a keyword tool-ranker, and (when `STREAM_E2E_*` is set) grounded checks that the tools return usable output; no model is called |
| `bun run test:llm:live` | Paid, opt-in model-driven smoke: a small Claude model (default Sonnet) drives the MCP and must select the right tool and surface usable output |

src/
  index.ts              # server entry (stdio); server instructions
  settings.ts           # STREAM_* env -> validated config (zod)
  logging.ts            # JSON logger + MCP logging sink
  auth/                 # AuthProvider base, local-account + mTLS providers, factory
  client/               # StreamClient (undici), errors (StreamError), retry
  tools/
    register.ts         # registerTool wrapper (verb -> annotations + isError)
    helpers.ts          # pagination/search/list/mutate envelopes, deleteGuard
    _scaffold.ts        # ConfigSpec CRUD generator (GET-strip-merge-PUT)
    registry.ts         # registerAllTools: single wiring point
    <domain>/           # one folder per domain (x509-ca, crypto, ssh, ...)
    docs/               # search_docs + get_doc
  resources/
    index.ts            # registers stream://knowledge/* resources
    catalog.ts          # resource catalog (imports the .md)
    knowledge/*.md      # embedded knowledge (bundled at build time)
  generated/docs/       # committed verify:truth snapshot (stream-routes.json, mcp-api-paths.json)
scripts/
  verify-truth.ts       # route-truth verifier entry point
  lib/truth.ts          # Play-routes parser + MCP /api/v1 reference scanner
docs/                   # this documentation
tests/
  unit/                 # mocked-client unit tests
  e2e/                  # live tests (gated by STREAM_E2E_*)

- settings → auth → client → tools. `settings.ts` parses `STREAM_*` into a validated config; `auth/` builds a provider (local headers or mTLS) consumed by the single `StreamClient`; domain tools call the client and register through one `registerTool` wrapper.
- One client. `StreamClient` (undici) handles lazy init (whoami + license), bounded JSON, structured `StreamError` parsing, retry on safe verbs, and maps Stream's HTTP `204` (empty / forbidden) to `[]` on list endpoints.
- Name-keyed CRUD is generated by `_scaffold.ts` from a `ConfigSpec`. Stream updates are full-replace PUTs keyed by the body's id field, so updates do GET → strip server/asymmetric fields → merge → PUT.
- Annotations are derived from the tool's verb prefix (read-only / additive / idempotent / destructive); thrown `StreamError`s become `isError` results so the model can self-correct.

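As a rough illustration of the last point, the sketch below derives annotations from a tool name's verb prefix and turns a thrown error into an `isError` result. The prefix table, the fallback for unknown verbs, and the names `annotationsFor` and `runTool` are assumptions made for this example; they are not taken from `src/tools/register.ts`. The hint field names follow the MCP tool-annotation vocabulary.

```typescript
// Hypothetical sketch: prefix rules and helper names are assumptions,
// not the contents of src/tools/register.ts.
type Annotations = {
  readOnlyHint: boolean;
  destructiveHint: boolean;
  idempotentHint: boolean;
};

type ToolResult = {
  content: { type: "text"; text: string }[];
  isError?: boolean;
};

// Assumed verb prefixes, checked in order.
const PREFIX_RULES: [string[], Annotations][] = [
  // read-only
  [["get_", "list_", "search_"], { readOnlyHint: true, destructiveHint: false, idempotentHint: true }],
  // additive
  [["create_"], { readOnlyHint: false, destructiveHint: false, idempotentHint: false }],
  // idempotent (full-replace update)
  [["update_"], { readOnlyHint: false, destructiveHint: false, idempotentHint: true }],
  // destructive
  [["delete_"], { readOnlyHint: false, destructiveHint: true, idempotentHint: true }],
];

function annotationsFor(toolName: string): Annotations {
  for (const [prefixes, annotations] of PREFIX_RULES) {
    if (prefixes.some((prefix) => toolName.startsWith(prefix))) return annotations;
  }
  // Unknown verbs get the most cautious default in this sketch.
  return { readOnlyHint: false, destructiveHint: true, idempotentHint: false };
}

// Run a handler; a thrown error becomes an isError result the model can read.
async function runTool(handler: () => Promise<string>): Promise<ToolResult> {
  try {
    return { content: [{ type: "text", text: await handler() }] };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { content: [{ type: "text", text: message }], isError: true };
  }
}
```

Keeping the mapping in one wrapper means a tool author only has to choose the verb; the annotations and the error shape follow from it.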
- Add it to the relevant `src/tools/<domain>/` file (use `_scaffold.ts` for standard name-keyed CRUD, or `registerTool` for custom operations).
- Map snake_case inputs to the exact camelCase wire fields. For updates, ensure `stripFields` lists every server-managed and rich-on-read field (e.g. `certificate`, `publicKey`).
- Add a unit test under `tests/unit/<domain>.test.ts` with a mocked client.
- If it's a new domain, wire its `registerXxxTools` into `src/tools/registry.ts`.
- Run `bun run validate:ci` (or at minimum `bun run typecheck && bun run lint && bun run test && bun run build`). If the tool reaches a new backend route, also run `bun run verify:truth --write` against a `../stream` checkout and commit the regenerated snapshot.

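The input mapping and `stripFields` step can be pictured with a minimal sketch like the one below. It assumes a simple regex-based key conversion and a flat object; `toCamel`, `toWire`, `buildPutBody`, and the sample fields are hypothetical and do not reflect the actual `ConfigSpec` or `_scaffold.ts` API. A real tool may need an explicit field map where the wire name is not a mechanical camelCase of the input name.

```typescript
// Hypothetical sketch: helper names and sample fields are illustrative only.
type Json = Record<string, unknown>;

// Convert one snake_case key to camelCase, e.g. "key_type" -> "keyType".
function toCamel(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_match: string, c: string) => c.toUpperCase());
}

// Map snake_case tool inputs to camelCase wire fields (top-level keys only).
function toWire(input: Json): Json {
  const out: Json = {};
  for (const [key, value] of Object.entries(input)) {
    if (value !== undefined) out[toCamel(key)] = value;
  }
  return out;
}

// Build a full-replace PUT body: the GET result, minus stripped fields,
// with the caller's patch merged on top.
function buildPutBody(current: Json, patch: Json, stripFields: string[]): Json {
  const body: Json = { ...current };
  for (const field of stripFields) delete body[field];
  return { ...body, ...toWire(patch) };
}

// Example: change one field while dropping a rich-on-read field.
const currentCa: Json = {
  name: "issuing-ca",
  keyType: "rsa-2048",
  certificate: "-----BEGIN CERTIFICATE-----...",
  enabled: true,
};
const putBody = buildPutBody(currentCa, { key_type: "ec-secp256r1" }, ["certificate"]);
console.log(putBody);
```

Because the PUT replaces the whole object, any field missing from `stripFields` would be echoed back to the server exactly as it was read, which is why the list needs to be complete.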
Mocked-client tests assert each tool registers, builds the correct route/method/camelCase payload, enforces the delete echo guard, and shapes search queries correctly. A server integration test boots an in-memory MCP session and verifies the full tool/resource surface.
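A mocked-client test of this kind might look roughly like the sketch below. It uses plain TypeScript rather than vitest so it stays self-contained; `FakeClient`, `deleteCa`, the `confirm_name` input, and the `/api/v1/cas/...` route are hypothetical stand-ins, and the echo guard is interpreted here as "the caller must repeat the object name", which is an assumption.

```typescript
// Hypothetical sketch: FakeClient and deleteCa are stand-ins, not project code.
type RecordedCall = { method: string; path: string; body?: unknown };

// Records every request instead of sending it.
class FakeClient {
  calls: RecordedCall[] = [];

  async request(method: string, path: string, body?: unknown): Promise<unknown> {
    this.calls.push({ method, path, body });
    return {};
  }
}

// Tool under test: deletes only when the caller echoes the name back.
async function deleteCa(
  client: FakeClient,
  input: { name: string; confirm_name: string },
): Promise<string> {
  if (input.confirm_name !== input.name) {
    throw new Error(`confirm_name must equal "${input.name}"`);
  }
  await client.request("DELETE", `/api/v1/cas/${encodeURIComponent(input.name)}`);
  return `deleted ${input.name}`;
}

// Exercise the happy path and the guard, then return what reached the client.
async function recordDeleteCalls(): Promise<RecordedCall[]> {
  const client = new FakeClient();
  await deleteCa(client, { name: "issuing", confirm_name: "issuing" });
  // A mismatched echo must fail before any request is sent.
  await deleteCa(client, { name: "issuing", confirm_name: "typo" }).catch(() => undefined);
  return client.calls;
}
```

In a vitest suite the same checks would sit inside `it(...)` blocks, asserting on `client.calls` with `expect`.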
bun run verify:truth statically checks that every /api/v1 path the MCP
references resolves to a real route in the Stream backend. It parses the Play
conf/routes of the backend checkout (following nested -> sub-router
includes), scans src/ for path references (route consts, routeCollection/
routeItem spec fields, and client.* calls), and fails on any reference with
no matching backend route — or a method the route does not expose.
It resolves its source of truth in this order: STREAM_SOURCE_ROOT, a sibling
../stream checkout, then the committed snapshot
src/generated/docs/stream-routes.json. CI has no backend checkout, so it runs
against the committed snapshot — regenerate it with bun run verify:truth --write
(requires ../stream) and commit the result whenever backend routes change.
Paths that are genuinely valid but not statically resolvable (e.g. prefix-only
constants) go in ALLOWED_UNVERIFIED_PATHS in scripts/lib/truth.ts.
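The core of such a check can be sketched in a few lines. The version below is a simplified illustration, not the code in `scripts/lib/truth.ts`: it assumes Play-style lines of the form `METHOD /path/:param controller`, treats only `:param` segments as dynamic, and skips `->` includes instead of following them. The sample routes and the names `parseRoutes`, `pathMatches`, and `checkReference` are made up for the example.

```typescript
// Hypothetical sketch of a route-truth check; simplified on purpose.
type BackendRoute = { method: string; pattern: string };
type Verdict = "ok" | "wrong-method" | "missing";

// Parse Play-style route lines. Comments (#), blank lines, and "->"
// includes are skipped in this sketch.
function parseRoutes(text: string): BackendRoute[] {
  const parsed: BackendRoute[] = [];
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#") || line.startsWith("->")) continue;
    const [method, pattern] = line.split(/\s+/);
    if (method && pattern) parsed.push({ method: method.toUpperCase(), pattern });
  }
  return parsed;
}

// Segment counts must agree; ":name" segments match any value.
function pathMatches(pattern: string, path: string): boolean {
  const a = pattern.split("/");
  const b = path.split("/");
  return a.length === b.length && a.every((seg, i) => seg.startsWith(":") || seg === b[i]);
}

// Classify one MCP reference against the parsed backend routes.
function checkReference(known: BackendRoute[], method: string, path: string): Verdict {
  const hits = known.filter((r) => pathMatches(r.pattern, path));
  if (hits.length === 0) return "missing";
  return hits.some((r) => r.method === method.toUpperCase()) ? "ok" : "wrong-method";
}

const sampleRoutes = `
# Made-up routes for illustration
GET     /api/v1/cas          controllers.Cas.list()
GET     /api/v1/cas/:name    controllers.Cas.get(name: String)
DELETE  /api/v1/cas/:name    controllers.Cas.delete(name: String)
`;
const backendRoutes = parseRoutes(sampleRoutes);
console.log(checkReference(backendRoutes, "GET", "/api/v1/cas/issuing")); // ok
console.log(checkReference(backendRoutes, "PUT", "/api/v1/cas/issuing")); // wrong-method
console.log(checkReference(backendRoutes, "GET", "/api/v1/keys"));        // missing
```

Separating "missing" from "wrong-method" mirrors the two failure modes described above: a path with no backend route, and a path whose route does not expose the method used.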
Gated by STREAM_E2E_* (loaded from a git-ignored .env.local):
STREAM_E2E_URL=https://stream.qa.example.com
STREAM_E2E_API_ID=...
STREAM_E2E_API_KEY=...

npm run test:e2e

E2E tests skip automatically when the STREAM_E2E_* variables are absent.

Two extra tiers verify the MCP is usable by an LLM (mirroring how the tools are actually consumed):
- Free / deterministic (`npm run test:scenarios`, dir `tests/llm-evaluation/`): boots the MCP in-process and checks the tool/resource surface, a keyword tool-ranker (a $0 proxy for a small model's tool choice), and, when `STREAM_E2E_*` is set, grounded checks that calling the tools returns usable output (e.g. `whoami` resolves a principal, `list_cas` returns a list envelope, `search_certificates` returns a paginated envelope). No model is called, so this runs anywhere.
- Paid / model-driven (`npm run test:llm:live`, dir `tests/llm-live/`): drives a small Claude model (default Sonnet, override with `STREAM_LLM_LIVE_MODEL`) via the Claude Agent SDK against the spawned MCP, and asserts the model both selects the right tool and surfaces usable output in its answer. This costs money and is opt-in; it only runs when explicitly enabled:

  source .env.local && STREAM_LLM_LIVE=1 bun run test:llm:live

  It skips (no model call, no billing) unless `STREAM_LLM_LIVE=1`, a `claude` binary is on PATH, `ANTHROPIC_API_KEY` is unset, and `STREAM_E2E_*` are present. The config deliberately does not auto-load `.env.local`.

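The skip conditions for the paid tier can be read as a single predicate. The sketch below is one way to express them, assuming the three `STREAM_E2E_*` variables shown earlier are the full set; the function name `liveTierSkipReason` and the reason strings are hypothetical, and the `claude` binary lookup is passed in as a boolean instead of being performed here.

```typescript
// Hypothetical sketch: the name, the reason strings, and the exact list of
// E2E variables are assumptions, not the suite's actual config.
type Env = Record<string, string | undefined>;

const E2E_VARS = ["STREAM_E2E_URL", "STREAM_E2E_API_ID", "STREAM_E2E_API_KEY"];

// Returns why the paid tier would skip, or null when it would run.
function liveTierSkipReason(env: Env, claudeOnPath: boolean): string | null {
  if (env["STREAM_LLM_LIVE"] !== "1") return "STREAM_LLM_LIVE is not 1";
  if (!claudeOnPath) return "no claude binary on PATH";
  if (env["ANTHROPIC_API_KEY"]) return "ANTHROPIC_API_KEY is set";
  const missing = E2E_VARS.filter((key) => !env[key]);
  if (missing.length > 0) return `missing ${missing.join(", ")}`;
  return null; // every condition holds
}
```

Returning a reason instead of a bare boolean makes a skipped run explain itself in the test output.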
When to run the paid tier: before merging changes that alter tool names,
descriptions, or input schemas. The deterministic ranker in test:scenarios is a
fast proxy, but only a real model exposes whether your wording works for an actual
user. A per-scenario maxBudgetUsd cap and maxTurns cap bound any runaway loop,
and the suite is excluded from CI so PR builds never incur charges. Use a cheaper
model with STREAM_LLM_LIVE_MODEL=claude-haiku-4-5 when you just want a quick
selection check.
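To make the "fast proxy" idea concrete, here is a minimal keyword ranker, offered only as an illustration of the technique: it scores each tool by how many query words appear in its name or description. The scoring rule, the function names, and the tool descriptions are assumptions for this example (only the tool names come from this document); the ranker in `tests/llm-evaluation/` may work differently.

```typescript
// Hypothetical sketch of a keyword tool-ranker; not the suite's implementation.
type ToolMeta = { toolName: string; description: string };

// Lowercase, split on non-alphanumerics, drop very short words.
function keywords(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 2);
}

// Rank tools by the number of query keywords found in name + description.
function rankTools(query: string, tools: ToolMeta[]): string[] {
  const queryWords = keywords(query);
  return tools
    .map((tool) => {
      const words = new Set(keywords(`${tool.toolName} ${tool.description}`));
      const score = queryWords.filter((word) => words.has(word)).length;
      return { toolName: tool.toolName, score };
    })
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.toolName);
}

// Descriptions below are invented for the example.
const sampleTools: ToolMeta[] = [
  { toolName: "whoami", description: "Show the authenticated principal" },
  { toolName: "list_cas", description: "List certificate authorities" },
  { toolName: "search_certificates", description: "Search issued certificates with pagination" },
];
console.log(rankTools("search certificates expiring soon", sampleTools)[0]);
// prints "search_certificates"
```

A ranker like this rewards literal word overlap, so it can confirm that a description contains the obvious vocabulary but cannot judge phrasing the way a model does.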
.github/workflows/ci.yml:
- Commitlint (pull requests): enforces Conventional Commits across the PR.
- Validate (push to `main` + PRs): `format:check` → `lint` → `typecheck` → `build` → `verify:truth` → `test` → `test:scenarios`, then `test:e2e:smoke` when the `STREAM_E2E_*` secrets are configured. Everything runs on Bun with a frozen lockfile; the paid LLM tier is never run in CI.

.github/workflows/release.yml runs after a successful CI run on main:
semantic-release derives the next version from Conventional Commits, updates the
changelog, publishes to npm (with provenance), and creates a GitHub release with
the cross-compiled standalone binaries from build:binaries.
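For readers unfamiliar with how a version is derived from commit messages, the sketch below approximates the commonly used default rules: a `BREAKING CHANGE:` footer yields a major release, `feat` a minor one, and `fix` or `perf` a patch. It is a simplified illustration with hypothetical names (`bumpFor`, `nextRelease`); this project's actual semantic-release configuration is not shown here and may differ.

```typescript
// Hypothetical sketch of Conventional Commits -> release type; simplified.
type Bump = "major" | "minor" | "patch" | null;

// Classify a single commit message.
function bumpFor(message: string): Bump {
  if (/^BREAKING CHANGE: /m.test(message)) return "major";
  const match = /^(\w+)(\([^)]*\))?: /.exec(message);
  const type = match?.[1];
  if (type === "feat") return "minor";
  if (type === "fix" || type === "perf") return "patch";
  return null; // docs, chore, etc. do not trigger a release in this sketch
}

const BUMP_ORDER: Bump[] = [null, "patch", "minor", "major"];

// The release type for a batch of commits is the highest individual bump.
function nextRelease(messages: string[]): Bump {
  return messages
    .map(bumpFor)
    .reduce<Bump>((best, bump) => (BUMP_ORDER.indexOf(bump) > BUMP_ORDER.indexOf(best) ? bump : best), null);
}

console.log(nextRelease(["fix: handle 204 on list", "feat(tools): add ssh domain", "docs: update guide"]));
// prints "minor"
```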
- snake_case tool inputs ↔ camelCase API payloads.
- Object names/identifiers are immutable primary keys — never invent them.
- Secrets are write-only and redacted from all tool output.
- Keep files focused (one responsibility); split large domains by sub-resource.
- Commit messages follow Conventional Commits, enforced locally by commitlint via a husky `commit-msg` hook and consumed by semantic-release for versioning.