Agentic coding tools for the Vercel AI SDK.
Tech Stack: TypeScript · Bun · Vercel AI SDK · Zod
Package: bashkit (npm · GitHub)
This file is for agents and humans working ON bashkit. For consumer-facing API usage (how to use bashkit in an app), see README.md. For folder-specific internals, see the AGENTS.md inside each src/* directory.
Before editing anything inside src/<folder>/, read src/<folder>/AGENTS.md first. Every folder has one. They document internal file layout, key exports, data flows, and per-task modification steps. This root file intentionally does not duplicate them: if you only read this file, you are missing half the picture.
These apply to every PR, no exceptions:
- Fully typed. No any. Use unknown at untrusted boundaries and narrow with guards. Public APIs must have explicit return types; don't rely on inference for exports. Tool input/output shapes live in Zod schemas + exported TypeScript interfaces that stay in sync.
- Testable and tested. Every public export has a test. Tests mirror src/ layout in tests/. Bug fixes include a regression test. If a change is hard to test, refactor until it isn't.
- Typecheck and lint before pushing. bun run typecheck && bun run check && bun run test must be green locally. CI will reject otherwise.
- Return errors, don't throw. Tools return { error: string } objects so the model can see the failure. Only sandbox-layer code throws, and tools catch it.
- Config-driven, not flag-driven. Optional features are enabled by the presence of a config object (e.g. webSearch: { apiKey }), not by boolean flags. Defaults live in factories via config?.field ?? default.
- No breaking changes without a major bump. See the Breaking Change Surface section below before touching the Sandbox interface, tool schemas, tool names, ContextLayer, or createAgentTools return shape.
- Docs live next to code. When you change files in a folder, update that folder's AGENTS.md in the same PR.
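As a minimal sketch of the typing and error rules above, an untrusted value can be accepted as `unknown`, narrowed with a type guard, and turned into an error object instead of an exception. The names `ReadInput`, `isReadInput`, and `parseReadInput` are hypothetical and exist only for this illustration; they are not bashkit exports.

```typescript
// Hypothetical input shape, for illustration only.
interface ReadInput {
  file_path: string;
  limit: number | null;
}

interface ToolError {
  error: string;
}

// Type guard: narrows an unknown value to ReadInput without using `any`.
function isReadInput(value: unknown): value is ReadInput {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.file_path === "string" &&
    (record.limit === null || typeof record.limit === "number")
  );
}

// Explicit return type; invalid input is reported as a value, not thrown.
function parseReadInput(raw: unknown): ReadInput | ToolError {
  if (!isReadInput(raw)) {
    return { error: "Invalid input: expected { file_path: string, limit: number | null }" };
  }
  return raw;
}
```

Callers can then branch on `"error" in result`, which keeps the failure visible to the model rather than unwinding the agent loop.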
If a references/ directory exists at the project root, search it for implementation patterns when building new features. It is gitignored — contributors symlink or clone repos locally.
- references/codex: OpenAI Codex CLI. Tool designs, agent loop, sandboxing patterns.
- references/pi-mono: pi-mono monorepo. See packages/coding-agent for agent loop patterns.
src/
├── sandbox/ # Execution environments (Local, Vercel, E2B) — src/sandbox/AGENTS.md
├── tools/ # Tool implementations — src/tools/AGENTS.md
├── context/ # Prompt assembly + tool execution layers — src/context/AGENTS.md
├── cache/ # Tool result caching (LRU, Redis) — src/cache/AGENTS.md
├── middleware/ # AI SDK language model middleware — src/middleware/AGENTS.md
├── utils/ # Budget, compaction, context status, helpers — src/utils/AGENTS.md
├── skills/ # Agent Skills standard — src/skills/AGENTS.md
├── setup/ # Agent environment setup (sandbox bootstrapping) — src/setup/AGENTS.md
├── cli/ # CLI initialization — src/cli/AGENTS.md
├── types.ts # AgentConfig, ToolConfig, DEFAULT_CONFIG
└── index.ts # Barrel re-exports (public API surface)
Each folder has its own AGENTS.md with file listings, exports, internal architecture, and per-task modification guides.
- Every folder under src/ must have an AGENTS.md. When you add a new folder, add one.
- Every AGENTS.md (except the root) must have a co-located CLAUDE.md symlink pointing to it.
- Automation: bun run link-agents creates missing symlinks; bun run check:agents fails CI if any are missing.
- When you add, remove, or significantly change files in a folder, update that folder's AGENTS.md in the same PR. Stale folder docs are worse than no docs.
bun install
bun run typecheck # ALWAYS run before bun run build
bun run build # Bun bundles to dist/index.js + tsc emits .d.ts
Script names are exact: it's typecheck, not type-check. Running the wrong name will just error with "Script not found". See package.json for the full list.
Critical: bun run build does not fail on type errors during bundling. Run bun run typecheck first or type regressions will ship silently.
Before pushing, run all four gates locally — CI will reject otherwise:
bun run typecheck && bun run check && bun run test && bun run check:agents
Exact script names (from package.json): typecheck, build, test, test:watch, test:coverage, format, format:check, lint, lint:check, check, check:ci, link-agents, check:agents.
Use Vitest via bun run test — not bun test (which runs Bun's built-in runner and will miss our suite).
bun run test # all tests
bun run test tests/utils/budget.test.ts # single file
bun run test:watch # watch mode
bun run test:coverage # with coverage
Tests live in tests/<folder>/ mirroring src/<folder>/. Examples in /examples/ serve as integration tests and require sandbox/API-key env vars.
Everything non-trivial ships with tests. New tools, new context layers, new utilities, new sandbox methods — all get unit tests before merging. Bug fixes include a regression test that would have caught the bug. If you can't easily test something, that's a signal the abstraction is wrong, not a reason to skip the test.
Biome handles both:
bun run check # lint + format, auto-fix
bun run check:ci # lint + format, no writes (CI gate)
bun run format # format only
bun run lint # lint only
Run bun run check before pushing. CI runs check:ci, typecheck, test, and check:agents; all four must pass.
- Commits are small, imperative, sentence-case: "Add budget tracking", "Refactor AskUser tool to deferred client-rendered model", "Fix lint and typecheck CI failures".
- One logical change per commit. Keep refactors separate from feature work.
- PR titles follow the same style as commits. PR descriptions should explain why, link relevant issues, and call out any public API changes.
- CI gates: typecheck, check:ci (Biome), test, check:agents. All four must pass before merge.
Use LocalSandbox (Bun APIs, no network) for fast iteration. Swap to VercelSandbox / E2BSandbox when you need to verify production behavior.
bun examples/test-tools.ts # direct tool calls, no AI
ANTHROPIC_API_KEY=xxx bun examples/basic.ts # full agentic loop

| Element | Convention | Examples |
|---|---|---|
| Tool names | PascalCase | Bash, Read, WebSearch |
| Factories | createX | createBashTool, createLocalSandbox |
| Output types | XOutput | BashOutput, ReadOutput |
| Error types | XError | BashError, ReadError |
| Config types | XConfig | ToolConfig, AgentConfig |
| Files | kebab-case | bash.ts, anthropic-cache.ts |
- Input schemas: colocated with tool implementation (src/tools/bash.ts defines bashInputSchema).
- Output/Error types: exported from the tool file; tools return Output | Error unions.
- Config types: centralized in src/types.ts.
- Error handling: tools return { error: string } objects; they do not throw. Sandbox methods may throw; tools catch them.
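The Output | Error convention above can be sketched as follows. `EchoOutput`, `EchoError`, `runEcho`, and `summarize` are hypothetical names that follow the XOutput / XError pattern, and the `Exec` type is a stand-in for a sandbox method that may throw; none of these are taken from bashkit itself.

```typescript
// Hypothetical shapes following the XOutput / XError naming convention.
interface EchoOutput {
  stdout: string;
}
interface EchoError {
  error: string;
}

// Stand-in for a sandbox method that may throw.
type Exec = (command: string) => Promise<string>;

// The tool catches sandbox failures and returns them as data.
async function runEcho(exec: Exec, text: string): Promise<EchoOutput | EchoError> {
  try {
    const stdout = await exec(`echo ${text}`);
    return { stdout };
  } catch (err: unknown) {
    const message = err instanceof Error ? err.message : String(err);
    return { error: message };
  }
}

// Consumers narrow the union on the presence of `error`.
function summarize(result: EchoOutput | EchoError): string {
  return "error" in result ? `failed: ${result.error}` : `ok: ${result.stdout}`;
}
```

Catching as `unknown` and narrowing with `instanceof Error` keeps the catch block consistent with the no-`any` rule.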
All optional tool parameters use z.nullable(), not z.optional(). OpenAI structured outputs require every property in the required array; .optional() removes them and breaks OpenAI. .nullable() keeps them required but allows null, and works on both Anthropic and OpenAI.
const schema = z.object({
timeout: z.number().nullable(),
replace_all: z.boolean().nullable(),
});
// Destructuring defaults (= value) only fire on undefined, NOT null.
// Always use ?? for defaults with nullable fields:
const { timeout, replace_all: rawReplaceAll } = input;
const effectiveTimeout = timeout ?? 120000;
const replaceAll = rawReplaceAll ?? false;
Tool factories accept an optional ToolConfig and merge with defaults inline:
export function createBashTool(sandbox: Sandbox, config?: ToolConfig) {
const timeout = config?.timeout ?? 120000;
// ...
}
Optional features (WebSearch, WebFetch, cache, budget, context layers) are enabled by config presence in createAgentTools; don't gate them on feature flags.
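A minimal sketch of config-presence gating, under the assumption of a much smaller config than the real one: `SketchConfig`, `SketchTools`, and `createSketchTools` are hypothetical, and the actual types live in src/types.ts.

```typescript
// Hypothetical config shapes; the real ones are defined in src/types.ts.
interface WebSearchConfig {
  apiKey: string;
}
interface BashConfig {
  timeout?: number;
}
interface SketchConfig {
  webSearch?: WebSearchConfig;
  bash?: BashConfig;
}

interface SketchTools {
  names: string[];
  bashTimeout: number;
}

const DEFAULT_TIMEOUT_MS = 120000;

function createSketchTools(config?: SketchConfig): SketchTools {
  const names = ["Bash"];
  // The presence of the config object enables the feature; there is no boolean flag.
  if (config?.webSearch) names.push("WebSearch");
  // Defaults are merged inline with ??, matching the factory pattern above.
  const bashTimeout = config?.bash?.timeout ?? DEFAULT_TIMEOUT_MS;
  return { names, bashTimeout };
}
```

One consequence of this design is that a feature cannot be switched on without also supplying what it needs (here, an API key), which a standalone boolean would not guarantee.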
All tools depend on Sandbox from src/sandbox/interface.ts, not concrete implementations. Adding a method is a breaking change for every implementer.
interface Sandbox {
exec(command: string, options?: ExecOptions): Promise<ExecResult>;
readFile(path: string): Promise<string>;
writeFile(path: string, content: string): Promise<void>;
readDir(path: string): Promise<string[]>;
fileExists(path: string): Promise<boolean>;
isDirectory(path: string): Promise<boolean>;
destroy(): Promise<void>;
readonly id?: string; // for cloud reconnection
rgPath?: string; // set by ensureSandboxTools
}
createVercelSandbox() and createE2BSandbox() are async and auto-run ensureSandboxTools to install ripgrep so Grep works immediately. createLocalSandbox() is sync.
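Because tools depend only on the interface, a test can supply any object that satisfies it. The sketch below is a hypothetical in-memory fake (`createMemorySandbox` is not part of bashkit), and the `ExecOptions` and `ExecResult` shapes are assumptions, since this file names them without defining them. It also shows why adding a method is breaking: every implementer, including a fake like this one, must provide each member.

```typescript
// Assumed shapes: this document names ExecOptions and ExecResult without defining them.
interface ExecOptions {
  timeout?: number;
}
interface ExecResult {
  stdout: string;
  stderr: string;
  exitCode: number;
}

// Reproduced from the interface shown above.
interface Sandbox {
  exec(command: string, options?: ExecOptions): Promise<ExecResult>;
  readFile(path: string): Promise<string>;
  writeFile(path: string, content: string): Promise<void>;
  readDir(path: string): Promise<string[]>;
  fileExists(path: string): Promise<boolean>;
  isDirectory(path: string): Promise<boolean>;
  destroy(): Promise<void>;
  readonly id?: string;
  rgPath?: string;
}

// Hypothetical in-memory fake for unit tests.
function createMemorySandbox(files: Record<string, string> = {}): Sandbox {
  const store = new Map<string, string>(Object.entries(files));
  const asPrefix = (dir: string): string => (dir.endsWith("/") ? dir : `${dir}/`);

  return {
    async exec(command) {
      // Nothing is executed; a non-zero exit lets tools surface the failure.
      return { stdout: "", stderr: `exec unsupported: ${command}`, exitCode: 1 };
    },
    async readFile(path) {
      const content = store.get(path);
      // Sandbox-layer code may throw; tools are expected to catch this.
      if (content === undefined) throw new Error(`File not found: ${path}`);
      return content;
    },
    async writeFile(path, content) {
      store.set(path, content);
    },
    async readDir(path) {
      const prefix = asPrefix(path);
      const names = new Set<string>();
      for (const key of Array.from(store.keys())) {
        if (!key.startsWith(prefix)) continue;
        // Keep only the first path segment below the directory.
        const [first] = key.slice(prefix.length).split("/");
        if (first) names.add(first);
      }
      return Array.from(names);
    },
    async fileExists(path) {
      return store.has(path);
    },
    async isDirectory(path) {
      const prefix = asPrefix(path);
      return Array.from(store.keys()).some((key) => key.startsWith(prefix));
    },
    async destroy() {
      store.clear();
    },
  };
}
```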
src/context/ provides two separate concerns:
- Static system prompt assembly (buildSystemContext): discovers AGENTS.md/CLAUDE.md files, collects environment info (cwd, git branch, platform), builds tool guidance. Called once at init; must stay stable across turns for Anthropic prompt caching.
- Dynamic per-step layers (withContext, applyContextLayers, createExecutionPolicy, createOutputPolicy): intercept every tool call (beforeExecute gate, afterExecute transform). createPrepareStep composes compaction + context-status + plan-mode hints into an AI SDK prepareStep callback.
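The gate-then-transform flow of the dynamic layers can be pictured with the sketch below. It is an illustration only: `SketchLayer` and `wrapWithLayers` are hypothetical, and the real `ContextLayer` signature documented in src/context/AGENTS.md may differ in names, arguments, and return values.

```typescript
// Hypothetical layer shape; not the real ContextLayer signature.
interface SketchLayer {
  // Return an error object to block the call, or null to let it through.
  beforeExecute?: (toolName: string, input: unknown) => { error: string } | null;
  // Transform the result before the model sees it.
  afterExecute?: (toolName: string, result: unknown) => unknown;
}

type Execute = (input: unknown) => Promise<unknown>;

function wrapWithLayers(toolName: string, execute: Execute, layers: SketchLayer[]): Execute {
  return async (input: unknown): Promise<unknown> => {
    // Gate phase: the first layer that objects short-circuits the call.
    for (const layer of layers) {
      const blocked = layer.beforeExecute?.(toolName, input) ?? null;
      if (blocked) return blocked;
    }
    // Transform phase: each layer may rewrite the result in order.
    let result = await execute(input);
    for (const layer of layers) {
      if (layer.afterExecute) result = layer.afterExecute(toolName, result);
    }
    return result;
  };
}
```

In this sketch a blocked call returns an `{ error }` object rather than throwing, which matches the error-handling rule for tools.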
Never mutate system from prepareStep — it will break prompt caching. Dynamic hints go in messages as user content.
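A minimal sketch of that rule, assuming a prepareStep-style callback that receives the current messages and may return replacement messages. The message and step types here are simplified stand-ins defined locally (the AI SDK's own types are richer), and `withHint` is a hypothetical helper.

```typescript
// Simplified stand-ins for illustration; not the AI SDK's own types.
interface SketchMessage {
  role: "system" | "user" | "assistant";
  content: string;
}
interface SketchStepInput {
  messages: SketchMessage[];
}
interface SketchStepOverride {
  messages: SketchMessage[];
}

// Builds a callback that appends a dynamic hint as user content.
function withHint(hint: string): (input: SketchStepInput) => SketchStepOverride {
  return ({ messages }) => ({
    // No `system` key is returned, so the cached system prompt is left untouched.
    messages: [...messages, { role: "user", content: hint }],
  });
}
```

The input array is copied rather than mutated, so the caller's message history is unaffected by the hint.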
createAgentTools(sandbox, config) is the single entry point that wires tools + cache + budget + context layers from a config object. Everything else is either internal or a lower-level primitive.
User code → Vercel AI SDK → Tool (wrapped w/ context layers + cache)
                              ↓
                      Sandbox interface
                              ↓
             ┌────────────────┼────────────────┐
             ↓                ↓                ↓
       LocalSandbox     VercelSandbox     E2BSandbox
             ↓                ↓                ↓
         Bun APIs      Firecracker VM     E2B service
Required peer deps: ai ^5.0.0, zod ^4.1.8.
Optional peer deps — users pick their execution environment:
- @vercel/sandbox ^1.0.0: Vercel Firecracker isolation
- @e2b/code-interpreter ^1.0.0: E2B hosted execution
- parallel-web ^1.0.0: WebSearch / WebFetch backend
All deps are marked external at build time so consumers don't get a duplicated ai/zod bundle.
Anything in this list requires a major version bump:
- Sandbox interface (src/sandbox/interface.ts): adding methods breaks every implementer.
- Tool input schemas: AI models see these in prompts; removing or renaming fields breaks live integrations.
- Tool output/error shapes: consumers pattern-match on them.
- Tool names: they appear verbatim in prompts ("use the Bash tool").
- ContextLayer signature (src/context/index.ts): changes ripple through every custom layer downstream.
- SystemContext shape (src/context/build-context.ts): consumers read individual sections.
- createAgentTools return shape: AgentToolsResult is a public contract.
Safe in minor/patch:
- Adding new optional config fields
- Adding new tools or sandbox implementations
- Internal refactors that preserve public API
- Bug fixes
The Bash tool executes arbitrary commands inside the sandbox — that's the whole point, but it means production deployments must:
- Run inside a real sandbox (Vercel or E2B), not LocalSandbox.
- Set blockedCommands + timeout on Bash.
- Set allowedPaths on Read/Write/Edit.
- Set maxFileSize on Write.
- Never expose the raw agent loop to untrusted users without an additional auth layer.
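To illustrate what an allowedPaths restriction has to guard against, here is a hypothetical containment check. `isPathAllowed` is an assumption made for this sketch; bashkit's own enforcement may work differently, and this version does not resolve symlinks.

```typescript
import * as path from "node:path";

// Hypothetical helper; not bashkit's implementation.
function isPathAllowed(target: string, allowedPaths: string[]): boolean {
  // Resolving first collapses "." and ".." segments.
  const resolved = path.resolve(target);
  return allowedPaths.some((allowed) => {
    const relative = path.relative(path.resolve(allowed), resolved);
    // An empty result means the root itself. A result that climbs upward
    // (".." or "../...") or is absolute means the target is outside the root.
    const escapes =
      relative === ".." ||
      relative.startsWith(".." + path.sep) ||
      path.isAbsolute(relative);
    return !escapes;
  });
}
```

Comparing with `path.relative` instead of a plain string prefix avoids treating a sibling such as /workspace-evil as if it were inside /workspace.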
See src/tools/AGENTS.md for per-tool config details.
| Task | Where to start |
|---|---|
| Add a new tool | src/tools/AGENTS.md → "Common Modifications" |
| Add a new sandbox | src/sandbox/AGENTS.md → "Common Modifications" |
| Add middleware | src/middleware/AGENTS.md → "Common Modifications" |
| Add a cache backend | src/cache/AGENTS.md → "Common Modifications" |
| Add a context layer or prompt section | src/context/AGENTS.md → "Common Modifications" |
| Add a skill source | src/skills/AGENTS.md → "Common Modifications" |
| Add a config field | Define in src/types.ts, consume in the relevant factory via config?.yourField ?? default |