A production-grade distributed systems reference built as a real-time tactical arena game — where the idempotency token is the game's core resource.
🚧 Active development — Layer 0 + Iteration 1 complete with E2E coverage. Iteration 2 in progress.
Players join live arena matches, fight to collect resources, then trade in an async economy between rounds. The game's core resource — the idempo Stamp — is the game-layer representation of an idempotency key: spending a Stamp seals an action, guaranteeing exactly-once resolution even under network retries. Under the hood, every interaction exercises a production-grade distributed systems pattern — from idempotent command handling to distributed Saga compensation.
This project exists to demonstrate — concretely and runnably — what top-tier distributed systems engineering looks like.
- Architecture
- Game Loop
- Services
- Patterns Demonstrated
- Saga: Trade Flow
- Idempotency Model
- Observability
- Tech Stack
- Running the Stack
- Project Status
- Documentation
Active development — architecture stable, features expanding per ROADMAP.md.
## Architecture

```mermaid
graph TB
  UI["🖥️ Next.js UI<br/>WebSocket + REST"]
  GW["🧠 API Gateway<br/>Auth · Rate Limit · Correlation ID"]

  subgraph Core Services
    GS["🎮 Game Service"]
    CS["⚔️ Combat Service"]
    RS["🎁 Reward Service"]
    WS["💰 Wallet Service"]
    IS["📦 Inventory Service"]
    MS["🏪 Marketplace Service"]
    LS["🏆 Leaderboard Service"]
    NS["📢 Notification Service"]
  end

  subgraph Kafka["📨 Apache Kafka"]
    T1["player-actions"]
    T2["match-events"]
    T3["economy-events"]
    T4["leaderboard-events"]
    DLQ["*.dlq (Dead Letter)"]
  end

  subgraph Storage
    PG1[("game_db")]
    PG2[("wallet_db")]
    PG3[("marketplace_db")]
    PG4[("inventory_db")]
    RD[("Redis<br/>Top 100")]
  end

  UI -->|WS / REST| GW
  GW --> GS
  GW --> MS
  GW --> LS
  GS --> CS
  GS --> T1
  GS --> T2
  CS --> T2
  RS --> T3
  T2 --> RS
  MS <-->|Saga commands| WS
  MS <-->|Saga commands| IS
  MS --> T3
  T3 --> LS
  T3 --> NS
  T4 --> LS
  GS --- PG1
  WS --- PG2
  MS --- PG3
  IS --- PG4
  LS --- RD
```
## Game Loop

```mermaid
flowchart LR
  A([Player joins]) --> B["Live Arena Match<br/>2–6 players · 3–5 min"]
  B --> C{Match ends}
  C --> D["MatchFinishedEvent<br/>emitted to Kafka"]
  D --> E["Reward Service<br/>grants currency · items · Stamps"]
  E --> F["Economy Phase<br/>Sell · Buy · Trade · Craft"]
  F --> G["Leaderboard<br/>updated"]
  G --> A
```
## Services

| Service | Responsibility | DB | Emits |
|---|---|---|---|
| API Gateway | Auth, rate limiting, correlation ID | — | — |
| Game Service | Match lifecycle, action validation | game_db | match-events |
| Combat Service | Damage calc, death logic | stateless | match-events |
| Reward Service | Post-match reward grants | — | economy-events |
| Wallet Service | Currency debit/credit with strong consistency | wallet_db | economy-events |
| Inventory Service | Item ownership and trade locks | inventory_db | economy-events |
| Marketplace Service | Listings + Saga orchestrator | marketplace_db | economy-events |
| Leaderboard Service | CQRS read projection | leaderboard_db + Redis | — |
| Notification Service | Async push / email | stateless | — |
## Patterns Demonstrated

| Pattern | Where |
|---|---|
| Idempotent HTTP commands | API Gateway → Game Service (X-Idempotency-Key) |
| idempo Stamp (game mechanic) | Player spends a Stamp → stampId becomes action_id in player_actions; duplicate requests return original response |
| Idempotent event consumers | All Kafka consumers (processed_events table) |
| Distributed Saga (choreography) | Marketplace trade flow |
| Saga compensation | Trade rollback on any step failure |
| Circuit breaker | Marketplace → Wallet / Inventory (opossum) |
| Retry + exponential backoff + jitter | All inter-service HTTP calls |
| Dead Letter Queue | Failed Kafka messages after 3 retries |
| CQRS | Leaderboard write model vs Redis read projection |
| Optimistic locking | Wallet balance updates |
| Event sourcing (append-only ledger) | Wallet transactions table |
| Partition-based ordering | Kafka keyed by playerId |
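The retry row in the table above can be made concrete. Below is a minimal sketch, assuming nothing about the real services (which get this behavior from axios-retry): `withRetry`, `backoffDelayMs`, and the choice of "full jitter" are all illustrative, not the project's actual code.

```typescript
// Illustrative sketch of retry + exponential backoff + jitter.
// The services use axios-retry; these names are hypothetical.

/** Delay before retry N: a random value in [0, min(cap, base * 2^attempt)) ("full jitter"). */
function backoffDelayMs(attempt: number, baseMs = 100, capMs = 10_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * ceiling; // randomness spreads out synchronized retry storms
}

async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  // Injectable sleep so tests can skip real waiting.
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

Full jitter keeps a fleet of clients that failed at the same moment from retrying in lockstep, which is why it pairs naturally with the circuit breaker and DLQ rows above.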
## Saga: Trade Flow

```mermaid
sequenceDiagram
  actor Buyer
  participant MP as Marketplace Service
  participant WL as Wallet Service
  participant IN as Inventory Service
  Buyer->>MP: POST /trade
  MP->>MP: INSERT saga_log (INITIATED)
  MP->>WL: ReserveFundsCommand
  WL-->>MP: FundsReservedEvent
  MP->>MP: saga_log → ITEM_LOCKING
  MP->>IN: LockItemCommand
  IN-->>MP: ItemLockedEvent
  MP->>MP: saga_log → FUNDS_TRANSFERRING
  MP->>WL: TransferFundsCommand
  MP->>IN: TransferItemCommand
  WL-->>MP: FundsTransferredEvent
  IN-->>MP: ItemTransferredEvent
  MP->>MP: saga_log → COMPLETED
  MP-->>Buyer: 200 Trade complete
```
Compensation path (if TransferFundsCommand fails):

```mermaid
flowchart LR
  F([Transfer fails]) --> C1["ReleaseFundsCommand<br/>→ Wallet refunds buyer"]
  F --> C2["UnlockItemCommand<br/>→ Inventory releases item"]
  C1 --> E(["trade = FAILED<br/>Buyer notified"])
  C2 --> E
```
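The forward and compensation paths above share one shape: run steps in order, and on any failure undo the completed ones in reverse. Below is a minimal in-memory sketch of that shape only; the `SagaStep` type and `runSaga` are illustrative, not the Marketplace Service's actual orchestrator, and the real commands are remote calls, not closures.

```typescript
// In-memory sketch of the trade Saga's forward/compensation structure.
// Step names mirror the diagrams above; everything else is hypothetical.

interface SagaStep {
  name: string;
  execute: () => Promise<void>;
  compensate: () => Promise<void>; // undoes execute if a later step fails
}

async function runSaga(steps: SagaStep[], log: string[]): Promise<"COMPLETED" | "FAILED"> {
  const done: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.execute();
      log.push(step.name); // stands in for the saga_log state transitions
      done.push(step);
    } catch {
      // Compensate completed steps in reverse order, as in the rollback diagram.
      for (const s of done.reverse()) {
        await s.compensate();
        log.push(`compensate:${s.name}`);
      }
      return "FAILED";
    }
  }
  return "COMPLETED";
}
```

Persisting each transition (the saga_log inserts in the sequence diagram) is what lets a crashed orchestrator resume or compensate after restart, rather than leaving funds reserved forever.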
## Idempotency Model

```mermaid
flowchart TD
  A["Request arrives<br/>X-Idempotency-Key: uuid"] --> B{"action_id<br/>in DB?"}
  B -- Yes --> C[Return cached response<br/>no side effects]
  B -- No --> D[Process business logic]
  D --> E[INSERT action_id<br/>atomically]
  E --> F[Return new response]
```
Kafka consumers mirror this — every handler checks processed_events before acting, inside the same DB transaction as the business write.
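The flow above fits in a few lines. In this sketch a Map stands in for the player_actions / processed_events table, so the "atomic INSERT" is only simulated; in the real services the existence check and the business write share one DB transaction, and `handleOnce` is an illustrative name, not an actual helper in the codebase.

```typescript
// Sketch of idempotent command handling. A Map stands in for the
// player_actions / processed_events table.

const processed = new Map<string, unknown>(); // action_id -> cached response

async function handleOnce<T>(actionId: string, businessLogic: () => Promise<T>): Promise<T> {
  if (processed.has(actionId)) {
    return processed.get(actionId) as T; // duplicate request: no side effects
  }
  const response = await businessLogic();
  processed.set(actionId, response); // in the real system: same transaction as the write
  return response;
}
```

A retried request with the same Stamp-derived action_id therefore hits the cache branch and sees the original response, never a second grant or double spend.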
## Observability

```mermaid
graph LR
  SVC["All NestJS Services"] -->|metrics /metrics| PROM["Prometheus"]
  SVC -->|traces| OT["OpenTelemetry Collector"]
  SVC -->|structured JSON logs| LOKI["Loki"]
  PROM --> GRAF["Grafana<br/>Dashboards"]
  OT --> JAEGER["Jaeger<br/>Trace UI"]
  LOKI --> GRAF
```
Key metrics exposed per service:
- `http_request_duration_seconds` — latency histograms
- `kafka_consumer_lag` — per topic/consumer group
- `circuit_breaker_state` — open/closed/half-open gauge
- `saga_duration_seconds` — trade completion time
- `dlq_message_count_total` — dead-letter accumulation
- `retry_count_total` — retry pressure
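To make the histogram metrics concrete: Prometheus histograms expose cumulative per-bucket counts (the `le` labels on /metrics). The sketch below shows how observations such as request latencies land in buckets; the bucket bounds are illustrative, and the services themselves would use a metrics library rather than a hand-rolled class like this.

```typescript
// Minimal sketch of a Prometheus-style cumulative histogram, i.e. how a
// metric like http_request_duration_seconds accumulates observations.
// Bucket bounds are illustrative.

class LatencyHistogram {
  private counts: number[];

  constructor(private readonly bounds: number[]) {
    this.counts = new Array(bounds.length + 1).fill(0); // last slot = +Inf bucket
  }

  observe(seconds: number): void {
    const i = this.bounds.findIndex((b) => seconds <= b);
    this.counts[i === -1 ? this.bounds.length : i] += 1;
  }

  /** Cumulative counts per bucket, as exposed with le="..." labels on /metrics. */
  cumulative(): number[] {
    let running = 0;
    return this.counts.map((c) => (running += c));
  }
}
```

Because buckets are cumulative, dashboards can derive percentiles from them (e.g. with PromQL's `histogram_quantile`) without the service ever storing individual request durations.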
## Tech Stack

| Layer | Technology |
|---|---|
| Frontend | Next.js 16 (App Router) · socket.io · shadcn/ui · Tailwind CSS v4 · Zustand |
| Backend | NestJS 11 · Apache Kafka · PostgreSQL 17 · Redis 7.4 LTS |
| Resilience | opossum (circuit breaker) · axios-retry · Kafka DLQ |
| Observability | Prometheus · Grafana · Jaeger · Loki · OpenTelemetry SDK · Pino |
| Infrastructure | Docker Compose (local) · Kubernetes · Helm · KEDA · Nx monorepo · pnpm |
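Of the Resilience row, the circuit breaker is the least obvious: it is a small state machine. Below is a hand-rolled sketch of the closed/open/half-open lifecycle that the stack gets from opossum; the thresholds, the injectable clock, and the class itself are illustrative, not opossum's API.

```typescript
// Hand-rolled sketch of the circuit-breaker state machine that opossum
// provides. Thresholds and the injectable clock are illustrative.

type BreakerState = "closed" | "open" | "half-open";

class SketchBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold = 3,
    private readonly resetTimeoutMs = 30_000,
    private readonly now: () => number = Date.now, // injectable for tests
  ) {}

  currentState(): BreakerState {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = "half-open"; // reset timeout elapsed: allow one trial call
    }
    return this.state;
  }

  async fire<T>(fn: () => Promise<T>): Promise<T> {
    if (this.currentState() === "open") throw new Error("circuit open: fast fail");
    try {
      const result = await fn();
      this.failures = 0;
      this.state = "closed"; // a successful (trial) call closes the breaker
      return result;
    } catch (err) {
      if (this.state === "half-open" || ++this.failures >= this.failureThreshold) {
        this.state = "open"; // trip: fail fast instead of hammering a sick dependency
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Fast-failing while open is what protects the Wallet and Inventory services from retry pressure during an outage, which is why the breaker sits in front of, not behind, the retry layer.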
## Running the Stack

Every iteration has a working, runnable version. The two commands below validate any iteration end-to-end:

```sh
# 0. One-time setup — copy the env template
cp .env.example .env
# Edit .env: set JWT_SECRET (required). For shared/staging environments, also set KAFKA_CLUSTER_ID.

# 1. Build all app artifacts on the host (Nx handles caching — fast on repeat runs)
pnpm build

# 2. Start all infrastructure + app services
docker compose up -d --build

# 3. Run the E2E suite for a specific iteration (or all)
nx run e2e:e2e                            # all iterations
nx run e2e:e2e --testFile=iter1.e2e.ts    # Iteration 1 only
nx run e2e:e2e --testFile=iter2.e2e.ts    # Iteration 2 only
nx run e2e:e2e --testFile=iter3.e2e.ts    # Iteration 3 only
nx run e2e:e2e --testFile=iter4.e2e.ts    # Iteration 4 only
```

Unit + integration coverage is run separately:

```sh
pnpm coverage   # all services — enforces per-iteration coverage gates
```

An iteration is only done when both commands exit green. See ROADMAP.md for the per-iteration Verification scenarios and apps/e2e/ for the E2E test source.
## Project Status

| Deliverable | Status |
|---|---|
| Documentation | |
| PRD, SPEC, API, GAME, RUNBOOK, OBSERVABILITY, DEPLOYMENT | ✅ Active |
| Architecture diagram | ✅ Active |
| Build roadmap (ROADMAP.md) | ✅ Active |
| ADR: monorepo (docs/adr/001-monorepo.md) | ✅ Complete |
| Layer 0 — Boilerplate | |
| Monorepo scaffold (Nx + pnpm) | ✅ Complete |
| Shared packages (@idempo/contracts, kafka, observability, idempotency, circuit-breaker) | ✅ Complete |
| Infrastructure (docker-compose.yml, Kafka, PostgreSQL, Redis, Jaeger, Prometheus, Grafana) | ✅ Complete |
| API Gateway (auth, proxy, rate limiting, health checks) | ✅ Complete |
| E2E test framework (apps/e2e) | ✅ Complete |
| Iteration 1 — Playable Arena | |
| Game Service (match lifecycle, idempotency, Stamp mechanics) | ✅ Complete |
| Combat Service (damage calc, event-driven) | ✅ Complete |
| Leaderboard Service (CQRS, Redis cache) | ✅ Complete |
| Arena UI (Next.js, WebSocket, live leaderboard) | ✅ Complete |
| E2E tests (iter1.e2e.ts) | ✅ Passing |
| Iteration 2 — Rewards & Inventory | |
| Reward Service | 🔵 In progress |
| Wallet Service | ⬜ Not started |
| Inventory Service | ⬜ Not started |
| Wallet + Inventory UI | ⬜ Not started |
| Iteration 3 — Marketplace & Saga | |
| Marketplace Service (Saga orchestrator) | ⬜ Not started |
| Circuit breaker integration | ⬜ Not started |
| Iteration 4 — Observability & Hardening | |
| Grafana dashboards | ⬜ Not started |
| Alert rules | ⬜ Not started |
| Kubernetes manifests | ⬜ Not started |
## Documentation

| File | Contents |
|---|---|
| docs/PRD.md | Product requirements, user stories, feature scope |
| docs/SPEC.md | System architecture, event contracts, database schemas, saga, resilience patterns |
| docs/GAME.md | Arena mechanics: grid, combat resolution, actions, Stamp-sealed actions, scoring |
| docs/API.md | REST + WebSocket contracts: all endpoints, request/response DTOs, error codes |
| ROADMAP.md | 4-iteration build roadmap with per-iteration deliverables and task checklists |
| docs/RUNBOOK.md | Step-by-step failure injection scenarios demonstrating each distributed systems pattern |
| docs/OBSERVABILITY.md | Metrics catalogue, Grafana dashboards, tracing config, structured log schema, alerting |
| docs/DEPLOYMENT.md | Container strategy, Kubernetes resources, Kafka partitioning, database scaling, quick-start |
| docs/adr/001-monorepo.md | ADR: why monorepo with Nx was chosen over multi-repo |