Nanda Town

You have an agent protocol. Nanda Town tells you whether it actually works.

You wrote a payments scheme, an identity scheme, a coordination scheme, a trust scheme — something a fleet of agents has to agree on. Nanda Town is the test rig: it spins up a swarm, plugs your protocol into a 12-layer agent stack, runs them through a scenario (marketplace, auction, voting, consensus, supply chain, reputation), and gives you a trace you can grep, diff, and validate against properties you care about.

It is a testing tool first, a simulator second.

pip install "nest-core[plugins]"
nest run marketplace

That's the whole "hello world". No clone, no path, no setup.

Before you push (contributors only): run make ci-local. It runs the exact CI command sequence — uv sync, ruff check, ruff format --check, pyright, pytest -v — and hard-fails on the first red command. See the Definition of Done in CONTRIBUTING.md.

Hackathon

Nanda Town is hosting a month-long open hackathon. Pick one undersolved problem from the 12-layer stack, ship a plugin or scenario or validator that fixes it, prove it works under adversarial conditions. Every submission is scored by an automated judge panel along six dimensions (correctness, test rigor, API fit, docs, novelty, persona fidelity), each on a 1-5 scale.

Charter — the participant brief: what to build, the rules, branch naming.
Problems — the 10 open problems, each with motivation, success criteria, anti-patterns, and the reference files you'll touch.
Judging — the six-dimension rubric and how the scoreboard works.

If this is your first time here, read the charter first. Scoreboard details and judge-panel internals are in the Scoreboard section below.

Install

pip install "nest-core[plugins]"

That brings in the reference implementations for all 12 layers, the CLI, and the seven built-in scenarios. Optionally:

pip install "nest-core[llm]"        # adds nest-shell for Tier 2 (LLM) agents
pip install "nest-core[full]"       # plugins + llm

Verify:

nest doctor

You should see 7/7 checks passed. If you don't, the message tells you what's missing.

The 60-second tour

You don't need to clone the repo. The seven reference scenarios ship inside the wheel.

# What's in the box
nest scenarios list

# Run one
nest run marketplace

# Look at the trace
nest inspect ./traces/marketplace.jsonl

# Get a metrics report (HTML)
nest report ./traces/marketplace.jsonl -o report.html

# Open the interactive dashboard
nest dashboard ./traces/marketplace.jsonl

Every nest run <name> writes a JSONL trace to ./traces/<name>.jsonl. Same seed → byte-identical trace, every time.

Read this if nest run scenarios/marketplace.yaml errors out: that command only works inside a clone of this repo — the scenarios/ directory isn't installed by pip. Use nest run marketplace (a built-in name) or copy a built-in out to edit: nest scenarios cp marketplace .

Test your own protocol

This is the loop Nanda Town is for. You have a protocol; you want to know if it survives 50 agents, a 5% message-drop rate, and a few Byzantine peers.

1. Pick the layer your protocol slots into

transport · comms · identity · registry · auth · trust ·
payments · coordination · negotiation · memory · privacy · datafacts

Say you're testing a new payments scheme.

2. Implement the layer interface

# my_payments/plugin.py
from nest_sdk import (
    Payments, AgentId, Money, PaymentRef, Receipt, Quote,
    ServiceRef, PaymentStatus,
)

class MyPaymentProtocol(Payments):
    async def quote(self, service: ServiceRef) -> Quote: ...
    async def pay(self, to: AgentId, amount: Money, ref: PaymentRef) -> Receipt: ...
    async def verify_payment(self, ref: PaymentRef) -> PaymentStatus: ...
    async def refund(self, ref: PaymentRef) -> None: ...

3. Register it as a plugin

# pyproject.toml
[project.entry-points."nest.plugins.payments"]
my_scheme = "my_payments.plugin:MyPaymentProtocol"

Install it: pip install -e .

4. Point a scenario at it

# Get a scenario you can edit
nest scenarios cp marketplace .

# In marketplace.yaml, change one line:
#   payments: prepaid_credits
# to:
#   payments: my_scheme

5. Run it and compare

nest run marketplace.yaml -o ./traces/with-prepaid.jsonl    # baseline
# edit marketplace.yaml to flip back to your scheme
nest run marketplace.yaml -o ./traces/with-mine.jsonl

nest report ./traces/with-prepaid.jsonl -o ./report-baseline.html
nest report ./traces/with-mine.jsonl    -o ./report-mine.html

# Or assert properties programmatically:
python -c "
from pathlib import Path
from nest_core.validators import validate_trace
for r in validate_trace(Path('traces/with-mine.jsonl'), 'marketplace'):
    print(('PASS' if r.passed else 'FAIL'), r.name, r.detail)
"

That's the whole loop. Pick a layer, plug in your implementation, point a scenario at it, watch what changes in the trace, run validators against the trace. Repeat with failures.message_drop: 0.05, with 10× more agents, with a Byzantine fraction — the same flow.

You can do this for any of the 12 layers. Trust, coordination, identity, auth — all of them follow the same recipe.

Built-in scenarios

Each one stresses a different part of the stack. Open them with nest scenarios show <name> or copy them out with nest scenarios cp <name> ..

Scenario	Agents	What it stresses
`marketplace`	50 buyers, 50 sellers	Payments · trust · registry · negotiation under bilateral price discovery.
`auction`	1 auctioneer + 19 bidders	Coordination + multi-round messaging; "the highest bid wins" is checked as a property.
`voting`	1 proposer + 1 coordinator + 18 voters	Simple majority. Properties: every vote counted, no double-voting, tally matches.
`consensus`	1 leader + 19 followers	Quorum agreement (default 2/3). Not a full BFT — useful for testing quorum-shaped protocols.
`supply_chain`	4 (supplier → manufacturer → distributor → retailer)	Multi-hop message reliability under drop / partition.
`reputation`	16 honest + 4 malicious + 1 observer	Trust layer under adversarial agents that cheat probabilistically.
`shell_marketplace`	3 buyers + 3 sellers (LLM-backed)	Same as marketplace but agents are LLM-driven via `nest-shell`. Non-deterministic (Tier 2).

Failure injection is per-scenario, edit the YAML:

failures:
  message_drop: 0.05               # 5% of messages dropped
  byzantine_agents: 0.10           # 10% of agents send garbled messages
  network_partition:
    groups: [["agent-0","agent-1"], ["agent-2","agent-3"]]

The 12 layers

Every layer is a Python Protocol (structural typing). Plugins are resolved by name via entry points or a built-in default.

#	Layer	Interface	Default plugin
1	Transport	`Transport`	`in_memory` (in-process event queue; no network I/O)
2	Communication	`CommsProtocol`	`nest_native` (JSON envelope, base64 payload)
3	Identity	`Identity`	`did_key` (deterministic public-key signatures for simulation; not Ed25519)
4	Registry	`Registry`	`in_memory` (dict lookup, no persistence)
5	Auth	`Auth`	`jwt` (HMAC-SHA256 token; not RFC JWT)
6	Trust	`Trust`	`score_average` (running mean reputation; no Sybil resistance)
7	Payments	`Payments`	`prepaid_credits` (in-memory ledger)
8	Coordination	`Coordination`	`contract_net` (FIPA: propose · bid · resolve · commit)
9	Negotiation	`Negotiation`	`alternating_offers` (Rubinstein, with patience discount)
10	Memory	`Memory`	`blackboard` (shared KV, subscribe, CAS)
11	Privacy	`Privacy`	`noop` (stub passthrough)
12	Data Facts	`DataFacts`	`datafacts_v1` (dataset publish · fetch · ACL)

All defaults are reference implementations for testing, not production-ready. That is the point: you replace the layer you care about with your real implementation and use the rest as a host.

Validators

nest_core.validators ships property checks for each scenario. They read a JSONL trace and verify protocol-level invariants — not just message counts.

from pathlib import Path
from nest_core.validators import validate_trace

for r in validate_trace(Path("traces/auction.jsonl"), "auction"):
    print(f"{'PASS' if r.passed else 'FAIL'}: {r.name} — {r.detail}")

Scenario	Validator checks
Auction	Winner has the highest bid; exactly one winner per item; every bidder is notified.
Voting	Announced tally matches the count; every vote counted; no double-voting.
Consensus	Committed rounds have ≥ 2/3 accepts; only proposed values committed; ≤ 1 commit per round.
Supply chain	Delivered goods trace through all four hops; no materials lost.
Reputation	Cheaters get bad reports; agents with score ≤ −3 are warned.
Marketplace	No double-sell (same product to two buyers); every buy answered; sold price matches the offer.

Important. Validators are property checks, not blessings of the bundled scenarios. They inspect trace evidence and can still only verify properties encoded in the trace. Treat a passing validator as a regression signal, not a proof of protocol correctness.

Fidelity tiers

Tier	Agent	Clock	Scale	Deterministic	Use case
1	State-machine (`StateMachineAgent`)	Logical event ordering — see below	10,000+	Yes	Protocol correctness, parameter sweeps, regression.
2 (experimental)	LLM-backed (`ShellAgent`) — OpenAI / Anthropic / mock	Same as Tier 1	10–100	No	Exploring emergent behavior of LM agents. Not for benchmarks.

Tier 2 needs pip install "nest-core[llm]" and an API key in OPENAI_API_KEY or ANTHROPIC_API_KEY. Use agents.brain: shell and agents.llm_provider: anthropic|openai|mock in the scenario YAML.

Determinism & what the clock does

Two things to know that aren't obvious:

Same seed → identical trace. The master RNG is seeded once, derives a per-agent RNG and a separate failure-injection RNG, and replays exactly under the same seed. You can use that for regression benchmarks.
The default transport is zero-latency. The bundled in_memory transport delivers at time = now, so the virtual clock stays at 0.0 and mean_latency / duration will both be 0.0 in your trace. The event queue still orders events deterministically, so correctness tests are fine. Latency numbers become meaningful only when agents use ctx.schedule(delay, ...) or you write a transport plugin that introduces per-hop delay.

Limitations

Reference plugins are deliberately simplified. They're testing scaffolding, not production code. Deterministic simulation signatures, no-op privacy, in-memory ledger, etc.
Reference scenarios are minimal baselines. They check whether agents interact, not whether a real implementation of the protocol is correct. That's what your plugin is for.
Single process. No distributed execution.
In-memory transport only out of the box. No TCP, gRPC, or HTTP yet.
Tier 2 is non-deterministic. Don't use it for benchmarks.

Scoreboard

Live scoreboard: docs/hackathon/scores.json — machine-readable scores for every open hackathon PR. A marketplace UI on top of this file is coming.

The judge panel lives in scripts/judge/:

scripts/judge/rubric.md — the rubric prompt (versioned).
scripts/judge/judge_pr.py — score one PR with N parallel judges via the Anthropic API, with prompt caching on the rubric.
scripts/judge/run_all.py — CLI that scores every open hackathon/* PR and writes the scoreboard JSON. Idempotent on HEAD SHA.

Re-run the full scoreboard with three live Opus judges per PR:

export ANTHROPIC_API_KEY=...
uv run python -m scripts.judge.run_all --output docs/hackathon/scores.json

Without an API key the CLI falls back to deterministic mock judges so the schema is exercised even in CI. See the in-repo PR description for the full cost model and how to reproduce.

Contributing

See CONTRIBUTING.md for development setup, coding standards, and how to add scenarios or layer plugins.

Issues and pull requests are welcome at github.com/projnanda/nandatown.

Citation

@software{nest2026,
  title  = {Nanda Town},
  author = {MIT Media Lab},
  year   = {2026},
  url    = {https://github.com/projnanda/nandatown},
  license = {Apache-2.0}
}

License

Apache 2.0 — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 67 Commits
.github/workflows		.github/workflows
apps		apps
data/hackathon-runs		data/hackathon-runs
docs		docs
examples		examples
packages		packages
scenarios		scenarios
scripts		scripts
templates/agents		templates/agents
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.railwayignore		.railwayignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Nanda Town

Table of Contents

Hackathon

Install

The 60-second tour

Test your own protocol

1. Pick the layer your protocol slots into

2. Implement the layer interface

3. Register it as a plugin

4. Point a scenario at it

5. Run it and compare

Built-in scenarios

The 12 layers

Validators

Fidelity tiers

Determinism & what the clock does

Limitations

Scoreboard

Contributing

Citation

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Nanda Town

Table of Contents

Hackathon

Install

The 60-second tour

Test your own protocol

1. Pick the layer your protocol slots into

2. Implement the layer interface

3. Register it as a plugin

4. Point a scenario at it

5. Run it and compare

Built-in scenarios

The 12 layers

Validators

Fidelity tiers

Determinism & what the clock does

Limitations

Scoreboard

Contributing

Citation

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages