55 changes: 55 additions & 0 deletions plumber/AGENTS.md
@@ -0,0 +1,55 @@
# Plumber Stack Agent Notes

Use this file as the high-level triage map for the plumber stack. Each section links to the detailed agent notes that live under `doc/`.

## Quick Map
- `ppl/` – gRPC edge, pipeline/workflow state machines, and RabbitMQ publishers ([doc/ppl/AGENTS.md](doc/ppl/AGENTS.md)).
- `block/` – block lifecycle orchestrator wired to Zebra task events ([doc/block/AGENTS.md](doc/block/AGENTS.md)).
- `definition_validator/` – YAML parsing + schema/semantic validation before scheduling ([doc/definition_validator/AGENTS.md](doc/definition_validator/AGENTS.md)).
- `job_matrix/` – pure library that expands matrix/parallelism definitions into concrete jobs ([doc/job_matrix/AGENTS.md](doc/job_matrix/AGENTS.md)).
- `gofer_client/` – promotions gRPC client used during deploy flows ([doc/gofer_client/AGENTS.md](doc/gofer_client/AGENTS.md)).
- `looper/` – shared STM/periodic worker macros powering `ppl` and `block` schedulers ([doc/looper/AGENTS.md](doc/looper/AGENTS.md)).
- Support stubs: `repo_proxy_ref/` (mock repo-proxy) and `task_api_referent/` (mock Task API) keep local/dev runs hermetic ([doc/repo_proxy_ref/AGENTS.md](doc/repo_proxy_ref/AGENTS.md), [doc/task_api_referent/DOCUMENTATION.md](doc/task_api_referent/DOCUMENTATION.md)).

## End-to-End Flow Scratchpad
1. **Schedule request arrives** → `ppl` gRPC handlers validate YAML via `definition_validator`, expand jobs with `job_matrix`, persist pipeline + block rows, and kick STM workers ([doc/ppl/AGENTS.md](doc/ppl/AGENTS.md)).
2. **Block execution** → `block` STM loopers provision Zebra tasks and watch RabbitMQ for task completion ([doc/block/AGENTS.md](doc/block/AGENTS.md)).
3. **Task lifecycle** → In tests/local, `task_api_referent` simulates Zebra responses so blocks/pipelines advance predictably ([doc/task_api_referent/DOCUMENTATION.md](doc/task_api_referent/DOCUMENTATION.md)).
4. **Promotions** → When promotions are enabled, `gofer_client` notifies Gofer and manages switches; `SKIP_PROMOTIONS` short-circuits locally ([doc/gofer_client/AGENTS.md](doc/gofer_client/AGENTS.md)).
5. **Events** → `ppl` publishers push pipeline/block updates to AMQP exchanges for UI consumers.

## Common Triage Paths
- **Pipeline stuck in `SCHEDULING` / `RUNNING`** → Check `ppl` STM handlers and ensure dependent services (`definition_validator`, `block`, RabbitMQ) respond ([doc/ppl/AGENTS.md](doc/ppl/AGENTS.md)).
- **Block stuck in `RUNNING` / `STOPPING`** → Inspect `block` STM handlers and incoming Zebra events; use RabbitMQ tooling if events seem missing ([doc/block/AGENTS.md](doc/block/AGENTS.md)).
- **Matrix or YAML errors** → Re-run `DefinitionValidator.validate_yaml_string/1` locally to reproduce schema/semantic issues ([doc/definition_validator/AGENTS.md](doc/definition_validator/AGENTS.md)).
- **Promotion failures** → Confirm `SKIP_PROMOTIONS` is set appropriately and inspect `GoferClient` gRPC error tuples ([doc/gofer_client/AGENTS.md](doc/gofer_client/AGENTS.md)).
- **Mock data mismatches** → Update referents (`repo_proxy_ref`, `task_api_referent`) when integration tests need new scenarios ([doc/repo_proxy_ref/AGENTS.md](doc/repo_proxy_ref/AGENTS.md)).

## Command Cheat Sheet
- Bootstrap every app: `mix setup` inside `ppl/`, `block/`, `definition_validator/`, `job_matrix/`, `gofer_client/`, and `looper/`.
- Run targeted tests where the failure originates (e.g. `cd ppl && MIX_ENV=test mix test`) before escalating ([doc/ppl/AGENTS.md](doc/ppl/AGENTS.md), [doc/block/AGENTS.md](doc/block/AGENTS.md)).
- Use `mix credo` routinely on Elixir apps; Looper/library apps are pure so linting catches most regressions ([doc/looper/AGENTS.md](doc/looper/AGENTS.md)).
- Mock services: start `repo_proxy_ref` and `task_api_referent` locally when plumbing end-to-end flows ([doc/repo_proxy_ref/AGENTS.md](doc/repo_proxy_ref/AGENTS.md), [doc/task_api_referent/DOCUMENTATION.md](doc/task_api_referent/DOCUMENTATION.md)).
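
The bootstrap step above can be scripted. This is a minimal sketch, assuming a POSIX shell and that the six app directories sit directly under one stack root; `bootstrap_all`, `PLUMBER_APPS`, and the `DRY_RUN` switch are hypothetical names, not part of the repository.

```shell
# Apps listed in the cheat sheet above (assumed to be sibling directories).
PLUMBER_APPS="ppl block definition_validator job_matrix gofer_client looper"

# Run `mix setup` in every app directory under the given stack root.
# With DRY_RUN=1 the commands are printed instead of executed.
bootstrap_all() {
  root="${1:-.}"
  for app in $PLUMBER_APPS; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "(cd $root/$app && mix setup)"
    else
      # Stop at the first failing app so the error is easy to spot.
      (cd "$root/$app" && mix setup) || { echo "setup failed in $app" >&2; return 1; }
    fi
  done
}
```

Running `DRY_RUN=1 bootstrap_all plumber` first shows what would be executed before touching any dependencies or databases.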

## Observability + Tooling
- Watchman metrics prefixed with `Ppl.*`, `Block.*`, or `Looper.*` highlight slow handlers (see service-specific notes).
- LogTee tags (`ppl_id`, `block_id`, `task_id`, `request_token`) support cross-service tracing ([doc/ppl/AGENTS.md](doc/ppl/AGENTS.md), [doc/block/AGENTS.md](doc/block/AGENTS.md)).
- RabbitMQ exchanges: `pipeline_state_exchange`, `pipeline_block_state_exchange`, `after_pipeline_state_exchange`, `task_state_exchange`. Confirm bindings when events disappear ([doc/ppl/AGENTS.md](doc/ppl/AGENTS.md), [doc/block/AGENTS.md](doc/block/AGENTS.md)).
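
One way to confirm bindings is to compare the broker's binding list against the four exchanges named above. The sketch below assumes tab-separated rows whose first column is the source exchange, which is the shape `rabbitmqctl list_bindings` produces when asked for `source_name` first; `unbound_exchanges` is a hypothetical helper name.

```shell
# Exchanges named in the observability notes above.
EXPECTED_EXCHANGES="pipeline_state_exchange pipeline_block_state_exchange after_pipeline_state_exchange task_state_exchange"

# Read binding rows on stdin (tab-separated, first column = source exchange)
# and print every expected exchange that has no binding at all.
unbound_exchanges() {
  bound=$(awk -F '\t' '{print $1}' | sort -u)
  for ex in $EXPECTED_EXCHANGES; do
    printf '%s\n' "$bound" | grep -Fqx -- "$ex" || echo "$ex"
  done
}

# Example against a live broker (requires rabbitmqctl on PATH):
#   rabbitmqctl -q list_bindings source_name destination_name routing_key | unbound_exchanges
```

An empty result means every expected exchange has at least one binding; it does not prove that the routing keys are the ones the consumers expect.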

## Guard Rails (Destructive Ops)
- Never run destructive git commands (`git reset --hard`, `git checkout --`, `git restore` on others' work, etc.) without explicit written approval in the task thread.
- Do not delete or revert files you did not author; coordinate with involved agents first. Moving/renaming is OK after agreement.
- Treat `.env` and environment files as read-only; only the user may edit them.
- Before deleting a file to silence lint/type failures, stop and confirm with the user; adjacent work may be in progress.
- Keep commits scoped to files you changed; list paths explicitly during `git commit`.
- When rebasing, avoid editor prompts (`GIT_EDITOR=:` / `--no-edit`) and never amend commits unless the user requests it.
- After finishing a task, fold any new findings into the relevant `AGENTS.md` or `DOCUMENTATION.md` files: fix mistakes, add context, and preserve useful knowledge while keeping existing valuable guidance intact.
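
The commit-scoping rule above can be enforced with a small wrapper. This is a sketch only: `scoped_commit` and `DRY_RUN` are hypothetical names. It relies on standard git behaviour, where passing paths after `--` to `git commit` commits only those paths.

```shell
# Commit only the listed paths. Refuses to run when no path is given, so a
# bare `git commit` can never sweep in someone else's staged work.
scoped_commit() {
  [ "$#" -ge 2 ] || { echo "usage: scoped_commit MESSAGE PATH..." >&2; return 2; }
  msg="$1"
  shift
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'git commit -m "%s" -- %s\n' "$msg" "$*"
  else
    git commit -m "$msg" -- "$@"
  fi
}

# Continuing a rebase without an editor prompt:
#   GIT_EDITOR=: git rebase --continue
```

For example, `DRY_RUN=1 scoped_commit "Update block notes" doc/block/AGENTS.md` prints the exact command that would run.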

## Reference Index
- Pipelines edge + workflows: [doc/ppl/DOCUMENTATION.md](doc/ppl/DOCUMENTATION.md)
- Block service internals: [doc/block/DOCUMENTATION.md](doc/block/DOCUMENTATION.md)
- YAML validation: [doc/definition_validator/DOCUMENTATION.md](doc/definition_validator/DOCUMENTATION.md)
- Matrix expansion: [doc/job_matrix/DOCUMENTATION.md](doc/job_matrix/DOCUMENTATION.md)
- Promotions client: [doc/gofer_client/DOCUMENTATION.md](doc/gofer_client/DOCUMENTATION.md)
- Worker macros: [doc/looper/DOCUMENTATION.md](doc/looper/DOCUMENTATION.md)
- Repo & Task referents: [doc/repo_proxy_ref/DOCUMENTATION.md](doc/repo_proxy_ref/DOCUMENTATION.md), [doc/task_api_referent/DOCUMENTATION.md](doc/task_api_referent/DOCUMENTATION.md)
57 changes: 57 additions & 0 deletions plumber/DOCUMENTATION.md
@@ -0,0 +1,57 @@
# Plumber Stack Documentation Hub

This document stitches together the service-level docs under `doc/` so you have a single place to understand how the plumber stack fits together. Follow the links for deep dives.

## System Overview
- **Pipelines (`ppl/`)** – primary gRPC surface and orchestrator for pipeline/workflow state machines. It persists pipeline data, publishes AMQP events, and coordinates subordinate apps ([doc/ppl/DOCUMENTATION.md](doc/ppl/DOCUMENTATION.md)).
- **Block (`block/`)** – manages block lifecycle and Zebra task orchestration, reacting to RabbitMQ events to advance block/task state ([doc/block/DOCUMENTATION.md](doc/block/DOCUMENTATION.md)).
- **Definition Validator** – validates pipeline YAML (schema + semantic rules) before anything is persisted ([doc/definition_validator/DOCUMENTATION.md](doc/definition_validator/DOCUMENTATION.md)).
- **Job Matrix** – expands `matrix` / `parallelism` definitions into concrete job variants for downstream schedulers ([doc/job_matrix/DOCUMENTATION.md](doc/job_matrix/DOCUMENTATION.md)).
- **Gofer Client** – gRPC client for promotion workflows; wraps request formatting, transport, and response parsing ([doc/gofer_client/DOCUMENTATION.md](doc/gofer_client/DOCUMENTATION.md)).
- **Looper** – shared macros/utilities that generate STM and periodic workers used by `ppl` and `block` ([doc/looper/DOCUMENTATION.md](doc/looper/DOCUMENTATION.md)).
- **Referents** – `repo_proxy_ref` (repo metadata) and `task_api_referent` (Zebra stand-in) supply deterministic fixtures for tests/local runs ([doc/repo_proxy_ref/DOCUMENTATION.md](doc/repo_proxy_ref/DOCUMENTATION.md), [doc/task_api_referent/DOCUMENTATION.md](doc/task_api_referent/DOCUMENTATION.md)).

## Core Pipeline Lifecycle
1. **Ingress** – gRPC handlers in `ppl` accept schedule/terminate/list calls, convert protobufs into domain commands, and run YAML through `definition_validator` ([doc/ppl/DOCUMENTATION.md](doc/ppl/DOCUMENTATION.md), [doc/definition_validator/DOCUMENTATION.md](doc/definition_validator/DOCUMENTATION.md)).
2. **Job Expansion** – `job_matrix` (and `parallelism` helpers) expand job definitions before pipeline/block rows are inserted ([doc/job_matrix/DOCUMENTATION.md](doc/job_matrix/DOCUMENTATION.md)).
3. **State Persistence** – `ppl` writes pipeline/build metadata via `Ppl.EctoRepo` and triggers Looper STM workers (`Ppl.Sup.STM`) ([doc/ppl/DOCUMENTATION.md](doc/ppl/DOCUMENTATION.md), [doc/looper/DOCUMENTATION.md](doc/looper/DOCUMENTATION.md)).
4. **Block Execution** – STM workers in `block` create and monitor Zebra tasks, consuming RabbitMQ events (`task_state_exchange`) to move blocks forward ([doc/block/DOCUMENTATION.md](doc/block/DOCUMENTATION.md)).
5. **Completion & Notifications** – `ppl` publishers emit pipeline/block/after-pipeline events over RabbitMQ for UI subscribers, and `gofer_client` notifies Gofer when promotions are involved ([doc/ppl/DOCUMENTATION.md](doc/ppl/DOCUMENTATION.md), [doc/gofer_client/DOCUMENTATION.md](doc/gofer_client/DOCUMENTATION.md)).
6. **Testing & Referents** – During local/integration runs, referent services respond to repo/task RPCs so flows complete without external dependencies ([doc/repo_proxy_ref/DOCUMENTATION.md](doc/repo_proxy_ref/DOCUMENTATION.md), [doc/task_api_referent/DOCUMENTATION.md](doc/task_api_referent/DOCUMENTATION.md)).

## Data Stores & Messaging
- **PostgreSQL** – `Ppl.EctoRepo` and `Block.EctoRepo` house pipeline/block/task state; migrations live alongside each app ([doc/ppl/DOCUMENTATION.md](doc/ppl/DOCUMENTATION.md), [doc/block/DOCUMENTATION.md](doc/block/DOCUMENTATION.md)).
- **RabbitMQ** – primary event bus (`pipeline_state_exchange`, `pipeline_block_state_exchange`, `after_pipeline_state_exchange`, `task_state_exchange`) for cross-service coordination ([doc/ppl/DOCUMENTATION.md](doc/ppl/DOCUMENTATION.md), [doc/block/DOCUMENTATION.md](doc/block/DOCUMENTATION.md)).
- **Watchman / LogTee** – metrics and structured logging used by STM workers and gRPC surfaces for observability ([doc/ppl/DOCUMENTATION.md](doc/ppl/DOCUMENTATION.md), [doc/block/DOCUMENTATION.md](doc/block/DOCUMENTATION.md), [doc/looper/DOCUMENTATION.md](doc/looper/DOCUMENTATION.md)).

## External Integrations
- **Zebra Task API** – accessed via internal clients; mimic behaviour with `task_api_referent` in non-prod environments ([doc/block/DOCUMENTATION.md](doc/block/DOCUMENTATION.md), [doc/task_api_referent/DOCUMENTATION.md](doc/task_api_referent/DOCUMENTATION.md)).
- **Repo Proxy** – pipeline scheduling pulls repo metadata from repo-proxy (or the referent stub) before reading YAML ([doc/repo_proxy_ref/DOCUMENTATION.md](doc/repo_proxy_ref/DOCUMENTATION.md)).
- **Gofer** – promotions go through Gofer via `gofer_client`; guard with `SKIP_PROMOTIONS` for dev/test ([doc/gofer_client/DOCUMENTATION.md](doc/gofer_client/DOCUMENTATION.md)).

## Local Development & Operations
- Run `mix setup` inside `ppl/`, `block/`, `definition_validator/`, `job_matrix/`, `gofer_client/`, and `looper/` to install deps and prepare databases ([doc/ppl/AGENTS.md](doc/ppl/AGENTS.md), [doc/block/AGENTS.md](doc/block/AGENTS.md)).
- Launch the stack by starting `repo_proxy_ref` and `task_api_referent` (if the external services are unavailable), then `ppl` via `iex -S mix` ([doc/repo_proxy_ref/AGENTS.md](doc/repo_proxy_ref/AGENTS.md), [doc/task_api_referent/DOCUMENTATION.md](doc/task_api_referent/DOCUMENTATION.md), [doc/ppl/AGENTS.md](doc/ppl/AGENTS.md)).
- Migrations often affect both repos; use `mix ecto.migrate -r Ppl.EctoRepo -r Block.EctoRepo` to keep schemas in sync ([doc/ppl/AGENTS.md](doc/ppl/AGENTS.md)).
- Looper-based workers leverage `cooling_time_sec` and Wormhole retries; adjust configs or inspect metrics when loops stall ([doc/looper/DOCUMENTATION.md](doc/looper/DOCUMENTATION.md)).

## Testing & QA
- Each app has its own `mix test` suite; run the failing service's tests first (e.g. `cd block && MIX_ENV=test mix test`) ([doc/block/AGENTS.md](doc/block/AGENTS.md)).
- `definition_validator` includes fixture-based tests (`mix test.watch` is handy while editing schemas) ([doc/definition_validator/DOCUMENTATION.md](doc/definition_validator/DOCUMENTATION.md)).
- Library apps (`job_matrix`, `gofer_client`, `looper`) are pure and quick to test; use them to pin down regressions before integrating ([doc/job_matrix/DOCUMENTATION.md](doc/job_matrix/DOCUMENTATION.md), [doc/gofer_client/DOCUMENTATION.md](doc/gofer_client/DOCUMENTATION.md), [doc/looper/DOCUMENTATION.md](doc/looper/DOCUMENTATION.md)).
- Referents have their own suites to lock in canned scenarios; update tests when extending mock behaviours ([doc/repo_proxy_ref/DOCUMENTATION.md](doc/repo_proxy_ref/DOCUMENTATION.md), [doc/task_api_referent/DOCUMENTATION.md](doc/task_api_referent/DOCUMENTATION.md)).

## Observability Checklist
- Metrics prefixes: `Ppl.*`, `Block.*`, `Looper.*` (Watchman).
- Log correlation keys: `ppl_id`, `wf_id`, `block_id`, `task_id`, `request_token` (LogTee).
- RabbitMQ DLQs hint at decode/state issues; investigate them when STM workers stall.
- gRPC health endpoints exposed via each service's `HealthCheck` module support Kubernetes probes ([doc/ppl/DOCUMENTATION.md](doc/ppl/DOCUMENTATION.md), [doc/block/DOCUMENTATION.md](doc/block/DOCUMENTATION.md), [doc/repo_proxy_ref/DOCUMENTATION.md](doc/repo_proxy_ref/DOCUMENTATION.md)).
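
The correlation keys in the checklist above lend themselves to a simple log filter. The sketch below is an assumption-heavy illustration: the log line format is not specified here, so `trace_logs` (a hypothetical helper) only keeps lines that mention both the key name and the id value somewhere on the line.

```shell
# Correlation keys listed in the checklist above.
CORRELATION_KEYS="ppl_id wf_id block_id task_id request_token"

# Filter log lines on stdin down to those mentioning KEY and VALUE.
# Rejects keys that are not in the known list, to catch typos early.
trace_logs() {
  key="$1"
  value="$2"
  case " $CORRELATION_KEYS " in
    *" $key "*) ;;
    *) echo "unknown correlation key: $key" >&2; return 2 ;;
  esac
  # Fixed-string matching; the key and value are not required to be adjacent.
  grep -F -- "$key" | grep -F -- "$value"
}
```

Piping the output of several services through the same call, for instance `cat ppl.log block.log | trace_logs block_id <id>`, gives a rough cross-service view; the file names there are placeholders.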

## Reference Links
- Pipelines edge & workflows: [doc/ppl/DOCUMENTATION.md](doc/ppl/DOCUMENTATION.md), [doc/ppl/AGENTS.md](doc/ppl/AGENTS.md)
- Block lifecycle: [doc/block/DOCUMENTATION.md](doc/block/DOCUMENTATION.md), [doc/block/AGENTS.md](doc/block/AGENTS.md)
- YAML validation: [doc/definition_validator/DOCUMENTATION.md](doc/definition_validator/DOCUMENTATION.md), [doc/definition_validator/AGENTS.md](doc/definition_validator/AGENTS.md)
- Matrix expansion: [doc/job_matrix/DOCUMENTATION.md](doc/job_matrix/DOCUMENTATION.md), [doc/job_matrix/AGENTS.md](doc/job_matrix/AGENTS.md)
- Promotions: [doc/gofer_client/DOCUMENTATION.md](doc/gofer_client/DOCUMENTATION.md), [doc/gofer_client/AGENTS.md](doc/gofer_client/AGENTS.md)
- Worker macros: [doc/looper/DOCUMENTATION.md](doc/looper/DOCUMENTATION.md), [doc/looper/AGENTS.md](doc/looper/AGENTS.md)
- Referents: [doc/repo_proxy_ref/DOCUMENTATION.md](doc/repo_proxy_ref/DOCUMENTATION.md), [doc/task_api_referent/DOCUMENTATION.md](doc/task_api_referent/DOCUMENTATION.md)
25 changes: 25 additions & 0 deletions plumber/doc/block/AGENTS.md
@@ -0,0 +1,25 @@
# Block Agent Notes

## Quick Map
- Supervision root: `Block.Application` starts `Block.EctoRepo`, `Block.Sup.STM`, and `Block.Tasks.TaskEventsConsumer`.
- State machines live in `block/lib/block/{blocks,tasks}/stm_handler/`; each module wraps a Looper worker that polls and advances rows.
- Persistence: PostgreSQL schema under `block/priv/ecto_repo/migrations`; repo module is `Block.EctoRepo`.
- RabbitMQ: consumer binds to `task_state_exchange` (routing key `finished`).

## Daily Commands
- Setup (deps + DB): `cd block && mix setup`.
- Run migrations only: `cd block && mix ecto.migrate`.
- Tests: `cd block && MIX_ENV=test mix test` (DB wiped automatically).
- Console: `cd block && iex -S mix` (ensure `RABBITMQ_URL` and database env vars set).
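
Since the console needs `RABBITMQ_URL` and the database variables set, a preflight check can fail fast before `iex` starts. This is a minimal sketch; `require_env` is a hypothetical helper, and the database variable names are not listed in these notes, so pass whichever ones your config reads.

```shell
# Report every named variable that is unset or empty.
# Returns 0 only when all of them are set. Names must be trusted input,
# because they are expanded through eval.
require_env() {
  missing=0
  for name in "$@"; do
    eval "val=\${$name:-}"
    if [ -z "$val" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   require_env RABBITMQ_URL && (cd block && iex -S mix)
```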

## Debug Pointers
- Task stuck RUNNING? Inspect `Block.Tasks.STMHandler.RunningState` logic and confirm RabbitMQ event arrived. Use `rabbitmqadmin get queue=task_state_exchange.finished` for inspection.
- Blocks not spawning tasks? Check the `Block.CodeRepo` command reader – invalid YAML results propagate from `definition_validator`.
- STOPPING never finishes? Verify callbacks `:compile_task_done_notification_callback` / `:after_ppl_task_done_notification_callback` in config; missing modules will raise `apply/3` errors.
- Database drift? Compare schema with latest migrations and rerun `mix ecto.migrate` (test env uses sandbox DB `block_test`).

## Env Vars
- `RABBITMQ_URL` – required for AMQP consumer/publishers.
- `COMPILE_TASK_DONE_NOTIFICATION_CALLBACK`, `AFTER_PPL_TASK_DONE_NOTIFICATION_CALLBACK` – MFA tuples as `{Module, :function}`; defaults log warnings when unset.

Keep this close when triaging block execution or termination flows.
38 changes: 38 additions & 0 deletions plumber/doc/block/DOCUMENTATION.md
@@ -0,0 +1,38 @@
# Block Service

## Overview
Block manages the execution lifecycle of pipeline blocks and their Zebra tasks. It persists block state, reacts to events emitted by the build system, coordinates stop/cancel transitions and publishes follow-up events through AMQP. Although it can run standalone, it is usually started under the main `ppl` application.

## Responsibilities
- Accept block execution requests produced by `ppl` and materialise them as `block_requests` and `block_builds` rows.
- Drive block state machines (`INITIALIZING`, `WAITING`, `RUNNING`, `STOPPING`, `DONE`) using Looper-based orchestrators under `Block.Sup.STM`.
- Monitor Zebra task lifecycle via `Block.Tasks.TaskEventsConsumer` (RabbitMQ exchange `task_state_exchange`, routing key `finished`) and advance corresponding block/task records.
- Coordinate compilation/after-pipeline callbacks through configurable hooks (`:compile_task_done_notification_callback` and `:after_ppl_task_done_notification_callback`).

## Architecture
- **Supervision tree**: `Block.Application` boots `Block.EctoRepo`, the STM supervisor (`Block.Sup.STM`) and the RabbitMQ consumer.
- **State machines**: Implemented in `block/lib/block/blocks/stm_handler/*` and `block/lib/block/tasks/stm_handler/*`; each handler is a Looper worker that periodically picks pending records.
- **Persistence**: `block/priv/ecto_repo/migrations` define tables for requests, builds, sub-pipelines, and task metadata. `Block.EctoRepo` wraps PostgreSQL via `ecto_sql`.
- **External dependencies**: communicates with Zebra/Gofer through task IDs, validates commands with `definition_validator`, and emits notifications via AMQP and Watchman metrics.

## Data Flow Highlights
1. `ppl` schedules a block → `block_requests` + `block_builds` rows are created.
2. STM loopers transition blocks from `waiting` to `running`, provisioning tasks via Zebra.
3. Zebra marks task finished → RabbitMQ message consumed → STM handlers move block/task to `done`, determine result/reason and trigger callbacks.
4. Termination requests push blocks into `stopping` which instructs Zebra to cancel outstanding tasks; completion reason is persisted before publish.
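
The transitions described in the steps above can be written down as a small lookup table, which is handy when checking whether a row's state change in the database is plausible. This is a sketch, not the service's actual STM code. The `initializing` to `waiting` edge and the choice of `running` as the only source of `stopping` are assumptions, and `next_states` / `can_transition` are hypothetical names.

```shell
# Print the states a block may move to from the given state.
next_states() {
  case "$1" in
    initializing) echo "waiting" ;;        # assumption: not spelled out above
    waiting)      echo "running" ;;        # step 2
    running)      echo "done stopping" ;;  # steps 3 and 4 (stopping source assumed)
    stopping)     echo "done" ;;           # step 4
    "done")       : ;;                     # terminal state, no successors
    *) echo "unknown state: $1" >&2; return 2 ;;
  esac
}

# Succeed when moving from state $1 to state $2 is allowed by the table.
can_transition() {
  for s in $(next_states "$1" 2>/dev/null); do
    [ "$s" = "$2" ] && return 0
  done
  return 1
}
```

For example, `can_transition waiting running` succeeds while `can_transition done running` fails, which flags a record that moved out of a terminal state.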

## Configuration
- `RABBITMQ_URL` – connection string used by `Block.Tasks.TaskEventsConsumer` and Looper AMQP publishers.
- `COMPILE_TASK_DONE_NOTIFICATION_CALLBACK`, `AFTER_PPL_TASK_DONE_NOTIFICATION_CALLBACK` – optional MFA tuples configured in `config/*.exs` for cross-service signalling.
- Database credentials configured in `config/{dev,test,prod}.exs` under `Block.EctoRepo`.

## Operations
- Install deps & run migrations: `cd block && mix setup`.
- Start locally: `cd block && iex -S mix` (ensure Postgres & RabbitMQ are reachable).
- Run tests: `cd block && MIX_ENV=test mix test` (DB is managed by `mix test` fixtures).
- Lint: `cd block && mix credo`.

## Observability
- Metrics: most STM operations wrap `Util.Metrics.benchmark` (look for Watchman entries prefixed with `Block.*`).
- Logging: LogTee provides structured logs; search by `block_id`/`task_id` for correlation.
- RabbitMQ dead-letter queues should be monitored when state transitions stall (stuck messages indicate decoding issues).