A discipline for AI-generated codebases. Imposes structure that holds as projects grow, with verifier-checked contracts and a re-raise loop that surfaces under-specification before code is written.
This is the skills suite — the prompt-and-subagent layer. For the runtime backbone that makes the skills cheaper to operate (state management, atomic writes, schema validation), see Kupu (in development).
LLMs given an open-ended brief tend to produce sprawling, inconsistent, hard-to-review code that drifts further from the brief as it grows. Taniwha is a set of skills and subagents that constrain how the agent works: it must agree on a structural shape before writing any code, contracts must be complete enough to implement in isolation, every implementation gets independently verified, and ambiguity gets surfaced as questions rather than silently filled in.
The cost is more ceremony than a human writes by hand at small scale. The payoff is structure that holds as the codebase grows — and an audit trail of decisions, contracts, and verifications that survives the conversations that produced them.
Taniwha matches structure to the brief. A small project gets a single contract, a single implementation, a single verifier. A large project gets a tree of modules with composition agents wiring them together. The design-doc agent decides which tier the brief calls for, and the user approves; the rest of the system follows.
Taniwha Skills is one piece of a wider toolset for AI-centric codebases. The thesis: AI design should be a primitive of the codebase, not an afterthought. Repositories should carry their own structural discipline, decision history, and contract definitions in a form that survives any individual conversation, agent, or developer. Source code is one artefact among several; design docs, contracts, vocabulary, and decision records are first-class citizens of the repo, not chat ephemera.
The Taniwha suite currently spans:
- Taniwha Skills (this repo) — the discipline and prompts that constrain AI build behaviour
- Kupu (in development) — the runtime state backbone for AI-centric repos; manages contracts, decisions, events, re-raises in
.taniwha/kupu/ - Arai — local guardrails that enforce coding rules during AI-driven development
Eventually all of these feed into Kete, Taniwha's broader platform for managing AI work across teams. None of that is required to use the skills today; this repo stands alone.
/plugin install taniwha
Drop the .claude/ directory at the root of your project, or:
git clone <repo-url> taniwha
cp -r taniwha/skills .claude/skills
cp -r taniwha/agents .claude/agents
cp taniwha/CLAUDE.md ./CLAUDE.md
Then in Claude Code, ask it to start a Taniwha build:
Build me a [whatever you want] as a Taniwha project.
The dispatcher takes over from there.
The skills run on bash utility scripts and direct file writes by default. For a leaner agent context and atomic state operations, install Kupu — the MCP server that backs Taniwha state. The skills detect Kupu's tools at runtime per-operation; whichever tools are available are used, and the rest fall through to the bash path.
Both modes produce the same .taniwha/kupu/ layout, so you can install Kupu later (or upgrade across Kupu phases) without re-running existing builds.
Kupu ships in phases. Each phase adds tools without breaking previously-shipped surfaces. Skills work with any Kupu phase installed, falling back to bash for tools not yet shipped:
- Phase 1 — primitives (
new_id,now) and project lifecycle (init,get_project) - Phase 2 — durable writes (
record_event,record_decision,register_re_raise,resolve_re_raise) - Phase 3+ — read tools, artefact CRUD, tree operations, toolchain detection, build metrics (planned)
A Phase 1-only Kupu installation produces a build trace that uses Kupu for IDs and timestamps but bash for state writes. A Phase 2 installation uses Kupu for both. The skills don't need configuration to handle this — detection is per-operation.
For the full Kupu tool surface and roadmap, see the kupu repository's docs/kupu-tool-surface.md.
A Taniwha build is a loop:
- Dispatcher (the main Claude Code session) holds the Task tool and acts as a mechanical executor.
- Orchestrator subagents are spawned one at a time, read project state from disk, decide the single next action, write that decision to disk, and exit. They have no memory between invocations.
- Role subagents (design-doc, contract-derivation, leaf-implementation, composition, verifier) carry out the orchestrator's decisions. Each role sees a strictly whitelisted set of inputs to enforce compartmentalisation.
The project's memory lives in a .taniwha/ directory at the project root: design docs, contracts, vocabulary, implementation manifests, decision records, event log, re-raise log. Source code lives at the repo root in whatever layout the project context names; .taniwha/ only holds agent state.
A single user-facing run looks like this:
- User gives a brief.
- Dispatcher captures the brief, spawns the orchestrator.
- Orchestrator surfaces structured questions to capture project context (language, repo layout, test framework).
- Toolchain detection runs.
- Design-doc subagent produces a design including the structural tier (single_module / small_multi_module / full_decomposition). Any under-specified parts of the brief surface as open questions.
- User answers open questions. User approves the design and tier.
- Contract-derivation subagent produces per-module manifests and shared vocabulary.
- Leaf-implementation subagent(s) build modules from contracts.
- Verifier subagent(s) independently check each implementation against its acceptance criteria.
- Composition subagent(s) wire modules together (multi-module builds only).
- Build complete; user reviews summary.
Every step is a fresh subagent reading filesystem state. Decisions, events, and re-raises are durable and human-readable.
| Skill | Purpose |
|---|---|
| design-doc | Produces a structural design from a brief, including the project's tier. Audits the brief for under-specification; surfaces silent decisions. |
| contract-derivation | Derives per-module manifests and shared vocabulary from an approved design. Manifests are language-neutral. |
| leaf-implementation | Implements one module from one manifest, working only from the manifest + vocabulary + project context. |
| composition | Wires two completed children under a parent contract. Produces canonical shared-types packages. |
| verifier | Independently verifies an implementation against its contract's acceptance criteria. Writes its own tests. |
| orchestrator | Ephemeral; decides the single next action by reading state. |
| dispatcher | Main-session loop; spawns subagents per orchestrator decisions. |
Defined in agents/; installed alongside the skills.
| Agent | Skill |
|---|---|
taniwha-orchestrator |
orchestrator |
taniwha-design-doc |
design-doc |
taniwha-contract-derivation |
contract-derivation |
taniwha-leaf-implementation |
leaf-implementation |
taniwha-composition |
composition |
taniwha-verifier |
verifier |
These are the load-bearing rules across the system:
- Compartmentalisation. Every role sees only a whitelisted set of inputs. Cross-role context leakage destroys the discipline.
- Re-raise over guess. Any under-specified clause is surfaced as a question, not silently filled in.
- Tier match. The structural shape matches the brief. A URL shortener gets one module; a multi-subsystem service gets a tree.
- Verification is mandatory. Implementor self-reports do not count. A verifier reads the contract independently and checks per-AC.
- State on disk, not in context. Every decision survives in
.taniwha/. Subagents read state, decide one thing, exit. - Cold-readable. A returning agent six months later, with no chat history, can pick up where it left off by reading the directory.
Taniwha focuses on project-architecture. For general-purpose engineering and productivity skills (TDD, debugging, grilling sessions, codebase improvement), mattpocock/skills is excellent and pairs well.
MIT — see LICENSE.