This document specifies three interrelated capabilities for TinyCI, each derived from ideas in the "BEAM-native CI" conversation, filtered through a quality engineering lens. Each section covers motivation, detailed design, integration with the existing codebase, tradeoffs, and DX considerations.
The three capabilities are:
- Event-Sourced Execution Engine — an append-only event log that becomes the authoritative record of every pipeline run, enabling run history, live streaming to UI consumers, and step-level replay
- Adaptive Flaky Test Isolation — per-test retry and quarantine, surfacing individual test failures rather than retrying entire step commands
- Fault-Tolerant Stage Execution — a continue_on_failure: mode for serial stages that collects all failures instead of halting on the first one, and a hardened parallel mode that prevents one crashing Task from corrupting sibling execution
These three are listed in implementation priority order. Each can be shipped independently. They share a common dependency: the event log (Capability 1) is the substrate that makes Capabilities 2 and 3 observable without coupling them to any specific UI or persistence backend.
Today, TinyCI's executor is a pure function: it takes stages + context and returns {:ok, [StageResult]} or {:error, ...}. Results are printed to stdout via Reporter and then discarded. Nothing persists across runs. There is no way for an external consumer (a web UI, a TUI, a webhook) to observe execution as it happens.
To support the planned web frontend and TUI, and to make run history possible without a heavyweight persistence layer, every meaningful state transition during execution should emit a typed, serializable event to an append-only log. Pipeline state at any moment is a pure projection of that log.
This is the BEAM GPT conversation's "event-sourced execution model" applied concretely and incrementally — without rewriting the executor.
Executor (unchanged contract)
│
│ emits events as a side effect
▼
TinyCI.EventLog ←── ETS table (fast, per-run, in-memory)
│
├── Projector → current pipeline state (for live queries)
├── Disk sink → ~/.tiny_ci/runs/<run_id>.jsonl (persistence)
└── PubSub → broadcast to web/TUI subscribers
The executor's return value and calling convention are unchanged. Events are emitted via calls to TinyCI.EventLog.emit/1 at natural boundaries in the executor code. The log is ETS-backed so emission is non-blocking. Disk writes and PubSub broadcasts happen in a separate GenServer so they cannot slow down execution.
All events share a common header and are tagged structs:
# Common header embedded in every event
%{run_id: String.t(), timestamp: DateTime.t()}
# ── Pipeline ──────────────────────────────────────────────────────────────
%TinyCI.Events.PipelineStarted{
run_id: String.t(), # "20260423_143201_main_a1b2c3d"
pipeline_name: atom(),
branch: String.t(),
commit: String.t(),
timestamp: DateTime.t()
}
%TinyCI.Events.PipelineCompleted{
run_id: String.t(),
status: :passed | :failed,
duration_ms: non_neg_integer(),
timestamp: DateTime.t()
}
# ── Stage ─────────────────────────────────────────────────────────────────
%TinyCI.Events.StageStarted{
run_id: String.t(),
stage_name: atom(),
mode: :serial | :parallel,
needs: [atom()],
timestamp: DateTime.t()
}
%TinyCI.Events.StageSkipped{
run_id: String.t(),
stage_name: atom(),
reason: :condition_not_met | :dependency_failed,
timestamp: DateTime.t()
}
%TinyCI.Events.StageCompleted{
run_id: String.t(),
stage_name: atom(),
status: :passed | :failed | :skipped,
duration_ms: non_neg_integer(),
timestamp: DateTime.t()
}
# ── Step ──────────────────────────────────────────────────────────────────
%TinyCI.Events.StepStarted{
run_id: String.t(),
stage_name: atom(),
step_name: atom(),
type: :cmd | :module,
cmd: String.t() | nil,
working_dir: String.t() | nil,
env: map(),
timestamp: DateTime.t()
}
%TinyCI.Events.StepSkipped{
run_id: String.t(),
stage_name: atom(),
step_name: atom(),
timestamp: DateTime.t()
}
%TinyCI.Events.StepOutputLine{
run_id: String.t(),
stage_name: atom(),
step_name: atom(),
line: String.t(),
timestamp: DateTime.t()
}
%TinyCI.Events.StepRetrying{
run_id: String.t(),
stage_name: atom(),
step_name: atom(),
attempt: pos_integer(),
max_attempts: pos_integer(),
timestamp: DateTime.t()
}
%TinyCI.Events.StepCompleted{
run_id: String.t(),
stage_name: atom(),
step_name: atom(),
status: :passed | :failed | :skipped,
duration_ms: non_neg_integer(),
exit_code: integer() | nil, # nil for module steps
attempts: pos_integer(),
store_updates: map(),
timestamp: DateTime.t()
}
# ── Matrix ────────────────────────────────────────────────────────────────
%TinyCI.Events.MatrixRunStarted{
run_id: String.t(),
stage_name: atom(),
combination: keyword(String.t()),
combination_label: String.t(),
timestamp: DateTime.t()
}
%TinyCI.Events.MatrixRunCompleted{
run_id: String.t(),
stage_name: atom(),
combination_label: String.t(),
status: :passed | :failed,
duration_ms: non_neg_integer(),
timestamp: DateTime.t()
}
# ── Hook ──────────────────────────────────────────────────────────────────
%TinyCI.Events.HookStarted{
run_id: String.t(),
hook_name: atom(),
hook_type: :on_success | :on_failure,
timestamp: DateTime.t()
}
%TinyCI.Events.HookCompleted{
run_id: String.t(),
hook_name: atom(),
status: :passed | :failed,
duration_ms: non_neg_integer(),
timestamp: DateTime.t()
}

Design decisions:
- run_id is generated at run_pipeline/3 entry: "#{date}_#{time}_#{branch_slug}_#{commit_short}". It is stored in the pipeline context so all executor functions can access it without threading a separate parameter.
- StepOutputLine captures every line emitted from Output.collect_port/4. This is the highest-volume event. For large test suites this could be thousands of events per run. The ETS table is a bag (not a set) so that many events can share one key; ETS keeps insertion order only among objects stored under the same key, so ordering relies on a run's events sharing a key. See §1.7 for volume considerations.
- StepOutputLine events are not emitted in buffered mode (non-TTY, matrix runs). This is a deliberate tradeoff: buffered mode already captures full output in StepResult.output, so there is no need to also emit line-by-line events.
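The run_id format described above can be sketched as follows. This is a minimal illustration, not the executor's actual helper: the module name RunIdSketch, the explicit arguments, and the slug rule (lowercase, non-alphanumerics collapsed to "-") are all assumptions.

```elixir
defmodule RunIdSketch do
  # Hypothetical helper; name, signature and slug rule are assumptions.
  @spec generate(DateTime.t(), String.t(), String.t()) :: String.t()
  def generate(%DateTime{} = now, branch, commit) do
    date = Calendar.strftime(now, "%Y%m%d")
    time = Calendar.strftime(now, "%H%M%S")

    # Underscores separate the run_id fields, so the slug avoids them:
    # anything outside [a-z0-9] becomes "-".
    branch_slug =
      branch
      |> String.downcase()
      |> String.replace(~r/[^a-z0-9]+/, "-")
      |> String.trim("-")

    commit_short = String.slice(commit, 0, 7)

    "#{date}_#{time}_#{branch_slug}_#{commit_short}"
  end
end
```

Taking the clock as an argument rather than calling DateTime.utc_now/0 inside keeps the function deterministic in tests.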
defmodule TinyCI.EventLog do
  # Public API (specs only; function bodies omitted)
  @type event :: struct()

  @spec start_run(String.t()) :: :ok
  @spec emit(event()) :: :ok                         # fire-and-forget, never blocks
  @spec get_events(String.t()) :: [event()]          # all events for a run
  @spec stream_events(String.t()) :: Enumerable.t()  # lazy stream from ETS
  @spec subscribe(String.t(), pid()) :: :ok          # send each new event to pid
  @spec unsubscribe(String.t(), pid()) :: :ok
end

Internals:
- One ETS table per run: :"tiny_ci_run_#{run_id}". The table is a :bag (allows duplicate keys) owned by a GenServer (TinyCI.RunRegistry) that is started when the OTP application starts. Because emit/1 inserts from the caller's process rather than from the owner, the table has to be created as :public.
- emit/1 is a simple :ets.insert/2. O(1), never blocks the caller.
- The RunRegistry GenServer also handles the disk sink (writes JSON lines to ~/.tiny_ci/runs/<run_id>.jsonl) and subscriber notifications (sends events to all subscribe/2 callers).
- ETS tables for completed runs are garbage-collected after 1 hour by the RunRegistry via Process.send_after/3. Long-lived installations need this cleanup or memory grows without bound.
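A minimal sketch of the ETS side of this design, under stated assumptions. EventLogSketch is a hypothetical stand-in for TinyCI.EventLog, and it deliberately swaps the :bag described above for an :ordered_set keyed by a monotonic integer, which is one simple way to make emission order explicit. Ownership by RunRegistry, the disk sink and subscribers are left out.

```elixir
defmodule EventLogSketch do
  # Hypothetical stand-in for TinyCI.EventLog; not the real module.

  def start_run(run_id) do
    # :public lets any executor process insert; the creating process owns the table.
    :ets.new(table_name(run_id), [:ordered_set, :public, :named_table])
    :ok
  end

  def emit(%{run_id: run_id} = event) do
    # Unique, strictly increasing key, so key order equals emission order.
    key = :erlang.unique_integer([:monotonic])
    true = :ets.insert(table_name(run_id), {key, event})
    :ok
  end

  def get_events(run_id) do
    run_id
    |> table_name()
    |> :ets.tab2list()
    # Sort by key so the ordering does not depend on table traversal details.
    |> List.keysort(0)
    |> Enum.map(fn {_key, event} -> event end)
  end

  # Same naming scheme as the design: one named table per run.
  defp table_name(run_id), do: :"tiny_ci_run_#{run_id}"
end
```

Usage is start_run/1 once per run, then emit/1 from any process and get_events/1 to read the run back in order.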
Addition to TinyCI.Application:
children = [
{Task.Supervisor, name: TinyCI.TaskSupervisor},
TinyCI.RunRegistry # new
]

The executor already has clear natural boundaries. Event emission slots in without restructuring:
| Boundary | Event emitted |
|---|---|
| run_pipeline/3 entry | PipelineStarted |
| run_pipeline/3 return | PipelineCompleted |
| execute/3 entry — stage skipped | StageSkipped |
| execute/3 entry — stage runs | StageStarted |
| execute_regular_stage/3 return | StageCompleted |
| execute_matrix_stage/3 per-combination start | MatrixRunStarted |
| execute_matrix_stage/3 per-combination end | MatrixRunCompleted |
| Step execution — before command | StepStarted |
| Step execution — step skipped | StepSkipped |
| Step retry loop — before each retry | StepRetrying |
| Step execution — after command | StepCompleted |
| Output.collect_port/4 — each line | StepOutputLine |
| Hooks — before hook | HookStarted |
| Hooks — after hook | HookCompleted |
The run_id is added to the pipeline context at run_pipeline/3:
ctx = Map.put_new(ctx, :run_id, generate_run_id(ctx))

Every executor function already receives ctx, so ctx.run_id is available at all emission points with no signature changes.
For Output.collect_port/4, the run_id, stage_name, and step_name need to be passed in. Output.run_cmd/2 already accepts opts — extend with opts[:event_context] containing these three fields. If event_context is nil, no events are emitted (backward-compatible).
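The backward-compatible opts[:event_context] check could look roughly like this. OutputSketch and handle_line/3 are hypothetical names, the event is a plain map rather than a TinyCI.Events.StepOutputLine struct, and the emit function is passed in only to keep the sketch self-contained.

```elixir
defmodule OutputSketch do
  # emit stands in for TinyCI.EventLog.emit/1 (an assumption of this sketch).
  def handle_line(line, opts, emit) do
    case Keyword.get(opts, :event_context) do
      # No event context: behave exactly as before, emit nothing.
      nil ->
        :ok

      %{run_id: run_id, stage_name: stage, step_name: step} ->
        emit.(%{
          type: :step_output_line,
          run_id: run_id,
          stage_name: stage,
          step_name: step,
          line: line,
          timestamp: DateTime.utc_now()
        })
    end

    # The line itself is always returned for the existing output handling.
    line
  end
end
```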
The projector rebuilds the current logical state of a run from its event log. It is a pure function used for queries (web UI, mix tiny_ci.runs show):
defmodule TinyCI.RunProjector do
@spec project([event]) :: run_state()
@type run_state :: %{
run_id: String.t(),
status: :running | :passed | :failed,
started_at: DateTime.t(),
finished_at: DateTime.t() | nil,
duration_ms: non_neg_integer() | nil,
stages: %{atom() => stage_state()},
store: map()
}
end

The projector is called:
- By the web frontend when loading a completed run detail page (projects from disk log)
- By TinyCI.RunRegistry to maintain live in-memory state for the active run (projects from ETS as events arrive)
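As a rough illustration of "state is a projection of the log", the projector can be written as a fold over events. This is a sketch under assumptions: events are plain maps tagged with :type instead of the TinyCI.Events structs, only a few event kinds are handled, and a StageStarted event is assumed to precede the step events of its stage.

```elixir
defmodule ProjectorSketch do
  # Hypothetical module; the real projector would match on TinyCI.Events.* structs.
  @initial %{
    run_id: nil,
    status: :running,
    started_at: nil,
    finished_at: nil,
    duration_ms: nil,
    stages: %{},
    store: %{}
  }

  # Pure function: the same events always produce the same state.
  def project(events), do: Enum.reduce(events, @initial, &apply_event/2)

  defp apply_event(%{type: :pipeline_started} = e, state),
    do: %{state | run_id: e.run_id, started_at: e.timestamp}

  defp apply_event(%{type: :stage_started} = e, state),
    do: put_in(state, [:stages, e.stage_name], %{status: :running, steps: %{}})

  defp apply_event(%{type: :step_completed} = e, state) do
    state
    |> put_in([:stages, e.stage_name, :steps, e.step_name], e.status)
    |> Map.update!(:store, &Map.merge(&1, e.store_updates))
  end

  defp apply_event(%{type: :stage_completed} = e, state),
    do: put_in(state, [:stages, e.stage_name, :status], e.status)

  defp apply_event(%{type: :pipeline_completed} = e, state),
    do: %{state | status: e.status, finished_at: e.timestamp, duration_ms: e.duration_ms}

  # Events that do not affect the projection (e.g. output lines) are ignored.
  defp apply_event(_event, state), do: state
end
```

Because the fold is incremental, the same apply step can serve both callers: the web frontend folds a whole disk log, while RunRegistry can apply one event at a time to its current state.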
"Replay this step" means: re-run a step with the exact same inputs as a previous run.
The StepStarted event captures: cmd, working_dir, env. This is the full input required to re-execute a shell step. For module steps, capturing the config keyword list is also required — add config: keyword() to StepStarted.
A replay operation:
- Load the event log for run <run_id> from disk
- Find the StepStarted event for the target step
- Find the last PipelineStarted event (for run context)
- Reconstruct a minimal %TinyCI.Stage{} containing only the target step
- Call Executor.execute/3 with the reconstructed stage and context
- Emit events under a new run_id with parent_run_id set to the original
This gives "replay this step with the same inputs" without message-level determinism. It is practical replay, not mathematical replay.
What replay cannot guarantee: if the step's behavior depends on external state (a flaky network, a filesystem that changed), the outcome may differ. That is acceptable and expected — the point is to re-run with the same declared inputs, not the same universe.
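The lookup part of a replay (finding the recorded inputs and building a minimal stage and context) might be sketched like this. Everything here is illustrative: ReplaySketch is a hypothetical module, events are plain maps tagged with :type, the stage is a plain map rather than %TinyCI.Stage{}, and loading the .jsonl file and calling the executor are omitted.

```elixir
defmodule ReplaySketch do
  # Hypothetical helper; field names follow the StepStarted/PipelineStarted events.
  def replay_inputs(events, stage_name, step_name, new_run_id) do
    step_started =
      Enum.find(events, fn e ->
        e[:type] == :step_started and e.stage_name == stage_name and e.step_name == step_name
      end)

    # "Last" PipelineStarted: search from the end of the log.
    pipeline_started =
      events |> Enum.reverse() |> Enum.find(&(&1[:type] == :pipeline_started))

    case {step_started, pipeline_started} do
      {nil, _} ->
        {:error, :step_not_found}

      {_, nil} ->
        {:error, :pipeline_start_not_found}

      {step, pipeline} ->
        # Minimal stage containing only the target step and its declared inputs.
        stage = %{
          name: stage_name,
          mode: :serial,
          steps: [Map.take(step, [:step_name, :cmd, :working_dir, :env])]
        }

        ctx = %{
          run_id: new_run_id,
          parent_run_id: pipeline.run_id,
          branch: pipeline.branch,
          commit: pipeline.commit
        }

        {:ok, stage, ctx}
    end
  end
end
```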
mix tiny_ci.runs list # last 20 runs, status + duration
mix tiny_ci.runs show <run_id> # full stage/step tree (same format as Reporter)
mix tiny_ci.runs logs <run_id> <step> # full output for a single step
mix tiny_ci.runs replay <run_id> --stage :test --step :unit # replay a step
Pros:
- Single source of truth. The disk log is the authoritative record. The reporter, web UI, and TUI all read from the same source, so there is no divergence between what was printed to the terminal and what the UI shows.
- Run history is free. No separate database query layer needed. The log files are the history. They can be grep-ed, parsed by scripts, archived, or deleted individually.
- Decouples consumers. The web frontend, TUI, and future integrations subscribe to the event stream. The executor knows nothing about them. New consumers are added without touching the executor.
- Incremental. The executor's contract doesn't change. emit/1 is a side effect that can be added to one function at a time and tested in isolation.
- Searchable output. StepOutputLine events make step output searchable across runs without loading full output blobs. "Find all runs where step :unit emitted 'ConnectionError'" becomes a log scan.
- Debuggability. "What happened in this run?" is answered by reading the event log, not by re-running the pipeline. The timeline of events is the ground truth.
Cons:
- Volume. StepOutputLine can be high-frequency for large test suites. A test run emitting 10,000 lines creates 10,000 ETS entries and 10,000 JSON lines on disk. This is manageable (ETS is fast; JSON lines at ~200 bytes each is ~2MB per run) but needs monitoring for extreme cases. Mitigation: a max_output_lines: N config to cap captured lines, or only emit output events when the step has capture_output: true.
- Schema evolution. Event structs are serialized to disk. If a field is added or removed, old log files become incompatible. Mitigation: version the event schema. Each JSON line includes "schema_version": 1. The projector handles version-specific deserialization.
- ETS lifetime. The in-memory ETS table for a completed run holds all events until garbage-collected by the RunRegistry. For long pipelines with verbose output, this is non-trivial memory. Mitigation: stream events from disk when projecting a completed run rather than loading into ETS.
- StepOutputLine in buffered mode. We chose to skip per-line events in buffered mode (matrix runs, non-TTY). This means the web UI cannot stream output line-by-line for matrix runs; it only sees the full output after the run completes. This is a known limitation and a reasonable tradeoff for now.
Pipeline authors see nothing. Events are infrastructure, not DSL. Adding event_log: false to the mix task config disables disk persistence for users who don't want it.
Operators get a new mix tiny_ci.runs sub-task with list, show, logs, and replay commands. These are additive — the existing mix tiny_ci.run behavior is unchanged.
Web/TUI consumers subscribe to EventLog.subscribe(run_id, self()) and receive events as they arrive. The web frontend's LiveView process subscribes on page load and renders updates without polling.
Testing the event log: ExecutorTest cases can assert that specific events were emitted by subscribing to the EventLog before running a pipeline:
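# stage and context are assumed to come from the test setup
test "emits StepCompleted event", %{stage: stage, context: context} do
  # Preset the run_id so we can subscribe before the run starts;
  # run_pipeline uses Map.put_new, so a preset :run_id is kept.
  run_id = "test_run"
  EventLog.subscribe(run_id, self())
  Executor.run_pipeline([stage], Map.put(context, :run_id, run_id))
  assert_receive %Events.StepCompleted{step_name: :unit, status: :passed}
end

TinyCI currently retries at the step level. If mix test fails, the entire mix test command reruns. For a 500-test suite with 2 flaky tests, you re-run 500 tests to recover 2. This wastes time and obscures signal: developers don't know whether a failure is a real regression or a known flake until they rerun manually.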
The BEAM's lightweight processes make per-test retry natural: each re-run of an isolated failing test is a Task with no shared state with the other tests. The design below surfaces this capability through the pipeline DSL without requiring pipeline authors to understand processes.
This feature is intentionally scoped to ExUnit for the initial implementation. Other test frameworks (pytest, Jest, Vitest, etc.) are architecturally supported via a pluggable parser interface but are not shipped until there is demand. This keeps the implementation focused and avoids building parsers for frameworks that may never be used by TinyCI's audience.
step :unit,
cmd: "mix test",
test_runner: :ex_unit, # enables adaptive behavior
flaky_retries: 2, # retry failing tests up to 2 times individually
quarantine_threshold: 3, # quarantine after 3 failures across recent runs
quarantine_path: ".tiny_ci/quarantine.json" # optional, default shown

When test_runner: is absent, the step behaves exactly as today. No behavior change for existing pipelines.
1. Run cmd normally
│
▼
2. If exit 0 → StepCompleted(:passed) [same as today]
│
▼ (exit non-zero)
3. Parse stdout for failing test identifiers
│
▼
4. Check quarantine list: is this test already quarantined?
├── YES → mark this specific test as quarantined_failure, continue
└── NO → proceed to per-test retry
│
▼
5. Re-run only failing tests (one Task per test)
│
▼
6. If all retries pass → StepCompleted(:passed) + emit TestFlakyRecovered events
If any test still fails after flaky_retries:
├── Not yet at quarantine_threshold → StepCompleted(:failed)
└── At quarantine_threshold → update quarantine list, StepCompleted(:passed with quarantine warning)
ExUnit prints failing tests in a deterministic format:
1) test user login with valid credentials (MyApp.AuthTest)
test/my_app/auth_test.exs:42
...
TinyCI.TestParsers.ExUnit extracts {file, line} pairs from stdout using a regex over the numbered failure list. This is brittle only if ExUnit changes its output format — which it has not done meaningfully since ExUnit 1.0.
The re-run command is constructed as:
mix test test/my_app/auth_test.exs:42 test/other_test.exs:88
This is a documented, stable ExUnit feature.
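A minimal sketch of the parsing and re-run construction, assuming plain (uncolored) ExUnit output in the format shown above. ExUnitParserSketch is a hypothetical name and the regex is an illustration, not a tested parser for every ExUnit version.

```elixir
defmodule ExUnitParserSketch do
  # Matches a numbered failure header followed by its "file:line" on the next line.
  # Kept in a function rather than a module attribute so it is compiled at call time.
  defp failure_regex, do: ~r/^\s*\d+\) (test .+) \(.+\)\n\s+(\S+\.exs):(\d+)/m

  def parse_failures(output) do
    failure_regex()
    |> Regex.scan(output)
    |> Enum.map(fn [_all, name, file, line] ->
      %{name: name, file: file, line: String.to_integer(line)}
    end)
  end

  def build_rerun_cmd(base_cmd, test_ids) do
    targets = Enum.map_join(test_ids, " ", &"#{&1.file}:#{&1.line}")
    "#{base_cmd} #{targets}"
  end
end
```

If parse_failures/1 returns an empty list for a failing step, the caller has nothing to target and would need to fall back to the existing whole-step retry.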
defmodule TinyCI.TestParser do
@type test_id :: %{file: String.t(), line: pos_integer(), name: String.t()}
@callback parse_failures(output :: String.t()) :: [test_id()]
@callback build_rerun_cmd(base_cmd :: String.t(), test_ids :: [test_id()]) :: String.t()
end

TinyCI.TestParsers.ExUnit implements this behaviour. Future parsers for pytest, Jest, etc. implement the same interface and are registered in a lookup map keyed by the test_runner: atom.
Flakiness history is persisted to a JSON file at quarantine_path (default: .tiny_ci/quarantine.json, committed to the repo so the team shares the quarantine list).
Structure:
{
"test/my_app/auth_test.exs:42": {
"name": "test user login with valid credentials",
"failures": 5,
"passes": 12,
"last_failed": "2026-04-23T14:32:01Z",
"quarantined": false
}
}

quarantine_threshold: 3 means: if failures >= 3 in the tracking file, mark the test as quarantined on the next failure. Quarantined tests still run (their quarantine status is not a skip; teams should fix flaky tests, not ignore them). They are reported distinctly in the pipeline summary and in the web UI.
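One reading of the threshold rule, sketched as a pure update on the decoded tracking file. QuarantineSketch and record_failure/5 are hypothetical; the sketch interprets "on the next failure" as: quarantine when the failure count recorded before this failure had already reached the threshold. Reading and writing the JSON file is left out.

```elixir
defmodule QuarantineSketch do
  # history is the decoded quarantine file: a map keyed by "file:line".
  def record_failure(history, test_key, name, threshold, %DateTime{} = now) do
    default = %{"name" => name, "failures" => 0, "passes" => 0, "quarantined" => false}
    entry = Map.get(history, test_key, default)

    # Once quarantined, a test stays quarantined; otherwise compare the
    # count from before this failure against the threshold.
    quarantined = entry["quarantined"] or entry["failures"] >= threshold

    updated =
      entry
      |> Map.update!("failures", &(&1 + 1))
      |> Map.put("last_failed", DateTime.to_iso8601(now))
      |> Map.put("quarantined", quarantined)

    Map.put(history, test_key, updated)
  end
end
```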
Three new events feed into the EventLog:
%Events.TestFlakyRetried{run_id, stage_name, step_name, test_id, attempt}
%Events.TestFlakyRecovered{run_id, stage_name, step_name, test_id, attempt}
%Events.TestQuarantined{run_id, stage_name, step_name, test_id, failure_count}

The web UI can show a "Flaky Tests" tab summarizing tests that needed retries or are quarantined, across runs.
Pros:
- Dramatic reduction in re-run cost. Re-running 2 tests out of 500 executes 250x fewer tests than re-running all 500; the wall-clock saving is smaller, because each re-run still pays the fixed startup cost of mix test.
- Surfaces real signal. Teams know which tests are flaky (the quarantine list) vs which are genuine regressions (new failures on tests not in the quarantine list).
- Zero configuration for the common case. test_runner: :ex_unit auto-detects mix test and the standard ExUnit output format. No additional setup.
- The quarantine list is tracked in version control. The team owns the quarantine list. It's not a hidden CI setting; it's auditable.
- BEAM concurrency is a natural fit. Each per-test re-run is a Task with no shared mutable state. The BEAM's scheduler handles concurrency for free.

Cons:
- Brittle parsing. The ExUnit output parser depends on ExUnit's output format. Changes to ExUnit's failure summary would break it. Mitigation: pin to tested ExUnit versions; add an integration test against real ExUnit output.
- ExUnit-only at launch. Teams using pytest or Jest get no benefit. Mitigation: the parser interface is open; contributors can add parsers.
- Quarantine abuse. A team can quarantine a genuinely broken test by reaching the threshold. Mitigation: emit warnings in the reporter when a test has been quarantined for more than N days.
- Does not handle test-order-dependent failures. Some flaky tests only fail when run after a specific other test. Per-test re-run in isolation won't detect this. Mitigation: document the limitation; it's a separate problem.
- mix test path:line doesn't support all test selectors. Some ExUnit tests are generated dynamically or use :only tags. Mitigation: fall back to the full step retry if the re-run command is unparseable.
Happy path — a developer adds test_runner: :ex_unit to their test step. On the next flaky run, the reporter shows:
✓ test — passed (12.3s)
○ unit — flaky recovery (2 tests retried, passed on attempt 2)
⚠ test/auth_test.exs:42 recovered (attempt 2)
⚠ test/user_test.exs:88 recovered (attempt 2)
Quarantine notification — when a test is quarantined:
✓ test — passed with warnings (12.3s)
⚠ unit — 1 test quarantined (see .tiny_ci/quarantine.json)
○ test/auth_test.exs:42 [QUARANTINED] failed 3 times, not blocking pipeline
Dry run — shows the quarantine list:
▶ :test (parallel)
• :unit — cmd: "mix test" [test_runner: ex_unit, flaky_retries: 2]
Quarantined tests (1): test/auth_test.exs:42
TinyCI's serial stages halt on the first failing step (Enum.reduce_while with :halt). For a lint stage running format check + credo + dialyzer in series, a format failure stops credo from running. The developer fixes the format issue, reruns, and discovers a credo failure. They fix that. They rerun again and find a dialyzer failure.
This is the classic "onion peeling" CI failure experience. Teams want all failures at once. The fix is a continue_on_failure: true option for serial stages that runs all steps and reports all failures together.
For parallel stages, a secondary issue exists: an uncaught exception in a Task process (not a non-zero exit code, but an actual Elixir exception raised by a module step) currently propagates to Task.await_many/2 and crashes the executor, potentially corrupting partial results. The executor needs to handle task crashes gracefully.
stage :lint, mode: :serial, continue_on_failure: true do
step :format, cmd: "mix format --check-formatted"
step :credo, cmd: "mix credo"
step :dialyzer, cmd: "mix dialyzer"
end

When continue_on_failure: true is set, the serial execution loop uses Enum.reduce instead of Enum.reduce_while: it runs all steps and accumulates results. The stage is marked :failed if any step failed, but all steps ran.

Store semantics with continue_on_failure: true: in normal serial mode, each step's store updates propagate to later steps in the same stage (because later steps might depend on earlier store values). With continue_on_failure:, later steps still receive the store from prior steps, including whatever store data a failed step produced before it failed. A failure does not reset or roll back the store.
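The difference between the two serial modes can be sketched in a few lines. This is an illustration only: SerialSketch is a hypothetical module, steps are {name, fun} pairs returning :ok or {:error, reason}, and a single Enum.reduce_while that never halts when the flag is set stands in for switching to Enum.reduce.

```elixir
defmodule SerialSketch do
  def run(steps, continue_on_failure) do
    results =
      steps
      |> Enum.reduce_while([], fn {name, fun}, acc ->
        status = if fun.() == :ok, do: :passed, else: :failed
        acc = [{name, status} | acc]

        # Default mode halts on the first failure; with the flag set we always continue.
        if status == :failed and not continue_on_failure do
          {:halt, acc}
        else
          {:cont, acc}
        end
      end)
      |> Enum.reverse()

    # The stage fails if any step failed, regardless of how many ran.
    stage_status =
      if Enum.any?(results, &match?({_, :failed}, &1)), do: :failed, else: :passed

    {stage_status, results}
  end
end
```

With three steps where the first two fail, the flag decides whether the result contains one entry or all three; the stage status is :failed either way.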
Field addition to TinyCI.Stage:
continue_on_failure: boolean() # default: false

Currently, execute_by_mode/3 for parallel stages spawns tasks and calls Task.await_many/2. If a module step raises an uncaught exception, the Task exits abnormally and Task.await_many/2 raises in the executor process.
The fix: replace Task.await_many/2 with a manual Task.yield_many/2 loop that handles both normal returns and exits:
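# spawn_step_task/3 is assumed to start tasks with Task.Supervisor.async_nolink (argument order illustrative), so a crash is not propagated over a link
tasks = Enum.map(steps, &spawn_step_task(&1, stage, ctx))
# timeout is an assumed stage-level limit in ms; with :infinity the nil clause below is never reached
results = Task.yield_many(tasks, timeout)
step_results = Enum.map(results, fn
  {_task, {:ok, result}} -> result
  {task, {:exit, reason}} -> crashed_step_result(task, reason)
  {task, nil} ->
    Task.shutdown(task, :brutal_kill)
    timed_out_step_result(task)
end)

crashed_step_result/2 produces a %StepResult{status: :failed, output: "Step crashed: #{inspect(reason)}"}. The stage continues collecting all step results rather than raising. This brings module steps in parallel stages to parity with shell steps: they fail gracefully rather than crashing the executor.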
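A self-contained version of the same collection pattern, for illustration. ParallelSketch is a hypothetical module; steps are {name, fun} pairs, results are plain tuples instead of %StepResult{}, and the supervisor is passed in (in TinyCI it would presumably be TinyCI.TaskSupervisor).

```elixir
defmodule ParallelSketch do
  # sup is a running Task.Supervisor.
  def run(sup, steps, timeout \\ 5_000) do
    tasks =
      Enum.map(steps, fn {name, fun} ->
        # async_nolink: a crash in the task is reported as an {:exit, reason}
        # result instead of being propagated to the caller through a link.
        {name, Task.Supervisor.async_nolink(sup, fun)}
      end)

    names = Map.new(tasks, fn {name, task} -> {task.ref, name} end)

    tasks
    |> Enum.map(fn {_name, task} -> task end)
    |> Task.yield_many(timeout)
    |> Enum.map(fn {task, outcome} ->
      name = Map.fetch!(names, task.ref)

      case outcome do
        {:ok, result} ->
          {name, :passed, result}

        {:exit, reason} ->
          {name, :failed, "Step crashed: #{inspect(reason)}"}

        nil ->
          # Still running after the timeout: stop it so it cannot outlive the stage.
          Task.shutdown(task, :brutal_kill)
          {name, :failed, "Step timed out"}
      end
    end)
  end
end
```

Since the exit reason of a raising task carries the exception and its stacktrace, inspect(reason) already puts both into the output string.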
The GPT conversation describes mapping OTP supervision strategies (:one_for_one, :rest_for_one) onto CI stages. This is architecturally sound for the distributed, multi-node case — where a worker node dying should restart affected steps on another node. This is explicitly deferred until TinyCI has a distributed agent model, because it requires:
- Persistent step state (to know where to resume after restart)
- Multiple execution nodes
- A scheduler that understands topology
For the single-node (local) case, the retry mechanism combined with continue_on_failure: covers the practical need. Structural OTP supervision is over-engineering for that scope.
continue_on_failure: pros:
- Eliminates onion-peeling CI failures. All failures visible on first run.
- Extremely simple to implement. Change Enum.reduce_while to Enum.reduce in the serial execution path when the flag is set.
- Familiar semantics. GitHub Actions has continue-on-error: true per step; this is the stage-level equivalent.
- Additive. Default is false, existing pipelines unchanged.

continue_on_failure: cons:
- Semantically tricky with dependent steps. If step A writes something to the store and step B reads it, and A fails, B may behave unexpectedly. Mitigation: document that continue_on_failure: is intended for independent steps (linters, checkers); not for steps with data dependencies.
- Longer failure feedback when not wanted. A deployment stage with continue_on_failure: would attempt all deploy steps even after one fails, potentially causing partial deploy state. Mitigation: the default is false; the flag is opt-in.

Hardened parallel execution pros:
- Prevents executor corruption. A crashing module step cannot take down the executor process.
- Consistent failure semantics. All parallel steps run to completion (or crash), same as serial with continue_on_failure:.
- Zero API change. Transparent to pipeline authors.

Hardened parallel execution cons:
- Slightly more complex parallel step collection code (yield_many vs await_many).
- A crashed step's output is a generic error message, not the actual exception traceback. Mitigation: capture the exception and its stacktrace in the StepResult.output field.
Serial stage with continue_on_failure::
✗ lint — failed (3.4s)
✗ format (0.2s) ← all three ran
✗ credo (1.1s) ← visible in one run
✓ dialyzer (2.1s)
vs current behavior:
✗ lint — failed (0.2s)
✗ format (0.2s) ← halted here, credo never ran
Dry run shows the continue_on_failure: flag:
▶ :lint (serial, continue_on_failure)
• :format — cmd: "mix format --check-formatted"
• :credo — cmd: "mix credo"
• :dialyzer — cmd: "mix dialyzer"
Pipeline runs
│
├── EventLog (Capability 1)
│ ├── ETS: live state for active run
│ ├── Disk: ~/.tiny_ci/runs/<run_id>.jsonl
│ ├── PubSub: web frontend, TUI
│ └── Events include: TestFlakyRetried, TestQuarantined (Cap 2)
│ StepCompleted with all-steps data (Cap 3)
│
├── Adaptive Test Isolation (Capability 2)
│ ├── Uses EventLog to record per-test events
│ └── Uses RunRegistry to look up flakiness history across runs
│
└── Fault-Tolerant Execution (Capability 3)
└── Uses EventLog to record all step completions in continue_on_failure stages
Capability 1 is the infrastructure both Capabilities 2 and 3 rely on for observability. Capabilities 2 and 3 are independently useful and can be shipped before the web frontend is built; their events are emitted regardless of whether any subscriber is listening.
The GPT conversation contains several ideas this document intentionally excludes. These are recorded here so the decision is explicit:
"Introduce a logical scheduler layer above BEAM that intercepts all messages and controls delivery order"
This requires instrumenting BEAM's process scheduler at a level that tools like Concuerror and Mocking have spent years on. It is research-grade work, not a product feature. The "replay" capability in §1.7 provides the useful form of replay (re-run with same declared inputs) without the VM-level complexity.
"Maintain a long-lived environment across runs; apply migrations incrementally; track drift"
This is an infrastructure management product (closer to Terraform + a deployment system). It is not CI. Building it would require TinyCI to own the lifecycle of external systems (databases, containers, services), which is a fundamentally different scope and responsibility.
"inject_failure :network_partition, explore concurrency: 100 do run_distributed_test() end"
This is a specialized testing tool for teams building distributed systems — a niche within a niche. Jepsen exists for this. If TinyCI's user base grows to include distributed-systems teams, these primitives could be a TinyCI.Steps.Chaos library. They should not be in the core runtime.
Deferred until a distributed multi-node execution model exists, as described in §3.2.3.
Phase 1: EventLog infrastructure
├── TinyCI.Events (event structs)
├── TinyCI.RunRegistry (GenServer, ETS, disk sink)
├── TinyCI.EventLog (public API)
├── Emission points in Executor (run_pipeline, execute, step execution)
├── Emission in Output.collect_port (StepOutputLine)
├── mix tiny_ci.runs list/show/logs
└── Tests: assert events emitted, disk log written, subscriber receives events
Phase 2: Fault-Tolerant Execution
├── Add continue_on_failure: to Stage struct and DSL validator/interpreter
├── Modify serial execution path in Executor
├── Harden parallel Task collection (yield_many)
└── Tests: all steps run on failure, parallel crash handled gracefully
Phase 3: Adaptive Flaky Test Isolation
├── TinyCI.TestParser behaviour
├── TinyCI.TestParsers.ExUnit
├── Add test_runner:, flaky_retries:, quarantine_threshold: to Step struct
├── Modify step execution to invoke adaptive flow when test_runner: is set
├── Quarantine file read/write
├── New events: TestFlakyRetried, TestFlakyRecovered, TestQuarantined
└── Tests: parse ExUnit output, re-run failing tests, quarantine threshold behavior
Phase 4: Web/TUI (separate project)
└── Subscribe to EventLog PubSub; project run state; render
Each phase is independently releasable and independently useful. Phase 1 is the only blocking dependency.
| Phase | New files | Modified files |
|---|---|---|
| 1: EventLog | lib/tiny_ci/events.ex, lib/tiny_ci/run_registry.ex, lib/tiny_ci/event_log.ex, lib/tiny_ci/run_projector.ex, lib/mix/tasks/tiny_ci.runs.ex | lib/tiny_ci/application.ex, lib/tiny_ci/executor.ex, lib/tiny_ci/output.ex |
| 2: Fault-Tolerant | — | lib/tiny_ci/tiny_ci.ex (Stage struct), lib/tiny_ci/dsl/validator.ex, lib/tiny_ci/dsl/interpreter.ex, lib/tiny_ci/executor.ex, lib/tiny_ci/dry_run.ex |
| 3: Adaptive Tests | lib/tiny_ci/test_parser.ex, lib/tiny_ci/test_parsers/ex_unit.ex, lib/tiny_ci/flakiness_store.ex | lib/tiny_ci/tiny_ci.ex (Step struct), lib/tiny_ci/dsl/validator.ex, lib/tiny_ci/dsl/interpreter.ex, lib/tiny_ci/executor.ex, lib/tiny_ci/reporter.ex, lib/tiny_ci/dry_run.ex |
| Capability | How to verify |
|---|---|
| EventLog — events emitted | mix test with event subscriber assertions |
| EventLog — disk persistence | Assert ~/.tiny_ci/runs/<run_id>.jsonl exists and parses after a run |
| EventLog — subscriber | Assert subscriber PID receives events during execution |
| continue_on_failure: | In a lint stage with 3 failing steps, all three appear in the results |
| Parallel crash hardening | Module step that raises still produces a StepResult, not an executor crash |
| Adaptive test isolation — parsing | Unit test ExUnit parser against fixture output strings |
| Adaptive test isolation — re-run | Integration test: inject a flaky ExUnit test, assert retry occurs |
| Adaptive test isolation — quarantine | After N failures, assert quarantine.json updated and step passes |
| All capabilities | mix credo, mix format --check-formatted, mix compile --warnings-as-errors |