18 commits
b596449
test(docs): scaffold bats doc-validation harness
Tombar Jun 2, 2026
2f922c7
test(docs): markdown fenced-block extractor with heading anchors
Tombar Jun 2, 2026
8df5a4e
test(docs): harden extractor (literal fence match, count reset, statu…
Tombar Jun 2, 2026
f542c00
test(docs): sandbox HOME, tool guards, mode-file helpers
Tombar Jun 2, 2026
3132334
test(docs): harden sandbox teardown guard and mode-file CR/LF trim
Tombar Jun 2, 2026
654530c
test(docs): cross-cutting chain/canonical/alias claims
Tombar Jun 2, 2026
f4e1bfb
test(docs): reset state per ro-alias so each alias is actively verified
Tombar Jun 2, 2026
e8b6b51
test(docs): claude-statusline.sh contract + graceful degrade
Tombar Jun 2, 2026
b00b12b
test(docs): assert jq-degrade branch skips enrichment (no cwd)
Tombar Jun 2, 2026
062cae7
test(docs): live headless tmux toggle/status/keybinding validation
Tombar Jun 2, 2026
3db7845
test(docs): wezterm lua file-contract + luacheck + sudo-timeout
Tombar Jun 2, 2026
13e4b1d
test(docs): make wezterm luacheck test non-vacuous (skip if runtime b…
Tombar Jun 2, 2026
eae1a64
test(docs): widen wezterm sudo-timeout revert margin for CI
Tombar Jun 2, 2026
dc68c3c
test(docs): iterm OSC roundtrip + python read_mode/compile/import
Tombar Jun 2, 2026
083ead8
docs(iterm): drop unused async_get_app call (pyflakes: app assigned b…
Tombar Jun 2, 2026
4de93dd
ci: separate doc-validation workflow (headless bats suite)
Tombar Jun 2, 2026
a96f208
ci(docs): make tmux keybind + wezterm luacheck checks portable across…
Tombar Jun 2, 2026
cbd6411
test(docs): live WezTerm config-load check, iTerm manual steps, findi…
Tombar Jun 2, 2026
27 changes: 27 additions & 0 deletions .github/workflows/doc-validation.yml
@@ -0,0 +1,27 @@
name: doc-validation
on:
  push:
  pull_request:

jobs:
  validate-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
      - name: Install harness dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y bats tmux lua5.4 luarocks jq python3-pip
          sudo luarocks install luacheck
          # System-wide so imports/binaries are HOME-independent (bats sandboxes HOME).
          sudo pip install --break-system-packages iterm2 pyflakes
      - name: Build failsafe onto PATH
        run: |
          mkdir -p "$RUNNER_TEMP/bin"
          go build -o "$RUNNER_TEMP/bin/failsafe" ./cmd/failsafe
          echo "$RUNNER_TEMP/bin" >> "$GITHUB_PATH"
      - name: Run headless doc validation
        run: bats --print-output-on-failure test/docs/*.bats
6 changes: 6 additions & 0 deletions Makefile
@@ -0,0 +1,6 @@
# Doc validation harness.
.PHONY: validate-docs validate-docs-live
validate-docs: ## headless doc validation (CI runs this)
	bats test/docs/*.bats
validate-docs-live: ## GUI-launch checks (local only; needs WezTerm/iTerm)
	bats test/docs/live-gui/*.bats
2 changes: 0 additions & 2 deletions docs/toggle/iterm.md
@@ -57,8 +57,6 @@ def read_mode(path):
     return "read"  # missing file = safe default

 async def main(connection):
-    app = await iterm2.async_get_app(connection)
-
     # `sid_b64` is the base64 of $ITERM_SESSION_ID, published by the shell hook in step 1.
     # The trailing '?' makes the reference optional (None if a session has no hook yet).
     @iterm2.RPC
81 changes: 81 additions & 0 deletions test/docs/REPORT.md
@@ -0,0 +1,81 @@
# Doc validation findings — 2026-06-02

Validates the instructions in `docs/toggle/wezterm.md`, `docs/toggle/iterm.md`,
`docs/toggle/tmux.md`, and `docs/claude-statusline.md` against the shipped `failsafe`
binary by running the **literal fenced code blocks extracted from the docs**.

- **Headless suite** (`make validate-docs`, 34 tests): green in CI
(`.github/workflows/doc-validation.yml`, ubuntu-latest) and locally.
- **Live GUI pass** (`make validate-docs-live`): run locally against a real WezTerm install.
- Binary: `failsafe 0.0.0-dev` (local) / built from source in CI.
- Local env: Darwin arm64 · tmux 3.6b · WezTerm 20240203 · Lua 5.5 (brew).
- CI env: ubuntu-latest with the distro's tmux and lua5.4.

## Per-claim results

| Doc | Claim | Method | Result |
|---|---|---|---|
| cross-cutting | chain order WEZTERM→TMUX→ITERM | black-box `mode get` w/ competing files | **PASS** |
| cross-cutting | missing file ⇒ `read` | black-box | **PASS** |
| cross-cutting | rego matches `"read"` only | grep `internal/embed/policies/*.rego` | **PASS** |
| cross-cutting | rw/ro aliases ⇒ canonical bytes | `mode set` + read back | **PASS** |
| cross-cutting | `mode get` tab-delimited (`cut -f1`) | black-box | **PASS** |
| statusline | `🔒 read` / `🔓 write` glyphs | pipe JSON to `examples/claude-statusline.sh` | **PASS** |
| statusline | jq adds `~`-cwd + model | with jq | **PASS** |
| statusline | degrades w/o jq (guard only, no cwd) | nojq PATH shim | **PASS** |
| statusline | single-line output | byte check | **PASS** |
| tmux | toggle script flips `read` ↔ `read & write` | run extracted script (via `run-shell`) | **PASS** |
| tmux | `#{pane_id}` == `$TMUX_PANE` | live headless tmux session | **PASS** |
| tmux | `C-M-t` bound to toggle w/ `#{pane_id}` | `list-keys -T root` (registration) | **PASS** |
| tmux | status script colors (sudo/amber, read/green) | run extracted script | **PASS** |
| tmux | no-script `failsafe toggle` toggles target pane | black-box | **PASS** |
| wezterm | snippet is valid lua | `lua` loadfile | **PASS** |
| wezterm | snippet has no luacheck errors | luacheck (runs in CI; skipped local, see Notes) | **PASS** (CI) |
| wezterm | snippet's own `toggle_mode` writes canonical; failsafe agrees | lua stub + driver fires `keys[1].action` | **PASS** |
| wezterm | badge maps rw→`⚡ sudo`, read→`r` | exact-grep doc line | **PASS** |
| wezterm | sudo-timeout auto-revert mechanism | text-ties to doc + ported 1s revert | **PASS** |
| wezterm | toast / `format-tab-title` rendering | — | **STATIC** (GUI-only) |
| wezterm | config loads + `Ctrl+Alt+t` registered | `wezterm show-keys` (live, real WezTerm) | **PASS (LIVE)** |
| iterm | shell hook OSC-1337 base64 roundtrip | run extracted hook + `base64 -d` | **PASS** |
| iterm | doc's own `read_mode` canonical/default | exec the AST `FunctionDef` | **PASS** |
| iterm | script `py_compile`s + `import iterm2` | python | **PASS** |
| iterm | script passes pyflakes | `python3 -m pyflakes` | **PASS** (after fix below) |
| iterm | no-python `failsafe toggle` flips session file | black-box | **PASS** |
| iterm | Python runtime registration + keypress | manual (`live-gui/iterm-register.md`) | **LIVE-MANUAL** |

## Doc bugs found

- **`docs/toggle/iterm.md` — unused `app` (FIXED).** pyflakes flagged
`local variable 'app' is assigned to but never used`: the Python toggle called
`app = await iterm2.async_get_app(connection)` but registered its RPC off `connection`
and never used `app`. Dead API call — **removed** (commit `083ead8`). The harness's
pyflakes check now guards against regression.

## Benign findings (no change made)

- **`docs/toggle/wezterm.md` — `local act = wezterm.action` is unused.** luacheck reports
it as a *warning* (not an error). It's the conventional WezTerm boilerplate alias that
WezTerm's own docs use as an extension point, so it is intentionally retained. The
luacheck test gates on **errors only** (0 errors), so this warning does not fail
validation — correctness is enforced, style is not.

## Notes / portability

- **luacheck local skip.** Homebrew installs Lua 5.5; luacheck 1.2.0 crashes under it
(`attempt to assign to const variable`). The wezterm luacheck test therefore *skips*
locally (honestly, with a reason — never a vacuous pass) and runs for real in CI on
lua5.4, where it passes. Verified locally against a separately-built Lua 5.4 luacheck.
- **tmux keybinding can't be fired headlessly.** `send-keys` injects into the pane app
and bypasses tmux's root key table, so a `bind -n` can't be triggered from a script.
The test validates *registration* (`list-keys`) + direct script execution instead —
not a synthesized keypress.
- **tmux modifier-order portability.** `list-keys` renders the key as `C-M-t` (tmux 3.6)
or `M-C-t` (older tmux); the test accepts either and asserts the stable
toggle-path + `#{pane_id}` first.
- **`tmux-toggle.sh` outside tmux.** Under `set -euo pipefail` its trailing
`tmux display-message` exits non-zero when run from a plain shell (after the mode file
is already written). In documented use it's always invoked from a tmux keybinding, so
this is benign — but a user running the script by hand from a non-tmux shell will see a
spurious non-zero exit. Minor robustness note, not a bug.
- **GUI-only surfaces are STATIC/LIVE-MANUAL, never dressed as automated:** WezTerm
toasts + `format-tab-title` rendering, and iTerm2's Python-runtime keybinding.
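
The `tmux-toggle.sh` note above describes a non-zero exit when the script is run outside tmux. One way this could be avoided is sketched below. It is only an illustration: `notify_mode` is a hypothetical helper that does not exist in the shipped script, and the message text is made up. It relies on tmux exporting `$TMUX` inside a session.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the shipped tmux-toggle.sh): report the new
# mode through tmux when running inside a session, and fall back to stderr
# otherwise, so a manual run from a plain shell still exits 0.
notify_mode() {
  local mode="$1"
  if [ -n "${TMUX:-}" ] && command -v tmux >/dev/null 2>&1; then
    # `|| true` keeps a failed notification from failing the whole toggle.
    tmux display-message "failsafe: $mode" || true
  else
    printf 'failsafe: %s\n' "$mode" >&2
  fi
}

notify_mode "read & write"
```

Because the mode file is written before the notification, a guard like this would change only the exit status of a manual run, not the toggle itself.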
51 changes: 51 additions & 0 deletions test/docs/crosscutting.bats
@@ -0,0 +1,51 @@
load helpers

setup() { setup_sandbox; need failsafe; }
teardown() { teardown_sandbox; }

# tmux.md states the chain order WEZTERM_PANE -> TMUX_PANE -> ITERM_SESSION_ID.
@test "chain order: WEZTERM_PANE wins over TMUX_PANE (black-box)" {
  write_mode_file "%w" "read & write"
  write_mode_file "%t" "read"
  WEZTERM_PANE="%w" TMUX_PANE="%t" run failsafe mode get
  [ "$status" -eq 0 ]
  [[ "$output" == "read & write"* ]]
  [[ "$output" == *"/pane-mode/%w"* ]]
}

# All four docs: "missing file = read (safe default)".
@test "missing mode file resolves to read" {
  WEZTERM_PANE="%none" run failsafe mode get
  [[ "$output" == read* ]]
}

# All four docs: the canonical value is what the bundled Rego policies match.
@test "every rego mode comparison is exactly input.mode == \"read\"" {
  run grep -rhoE 'input\.mode == "[^"]*"' "$DOCS_REPO_ROOT"/internal/embed/policies/*.rego
  [ "$status" -eq 0 ]
  while IFS= read -r line; do
    [ "$line" = 'input.mode == "read"' ]
  done <<< "$output"
}

# Docs claim rw/ro aliases normalize to the canonical bytes written to the file.
@test "mode set aliases write canonical bytes" {
  for a in rw w "read & write"; do
    write_mode_file "%a" "read"
    WEZTERM_PANE="%a" failsafe mode set "$a"
    [ "$(read_mode_file "%a")" = "read & write" ]
  done
  for a in ro r read; do
    write_mode_file "%a" "read & write"
    WEZTERM_PANE="%a" failsafe mode set "$a"
    [ "$(read_mode_file "%a")" = "read" ]
  done
}

# claude-statusline.sh relies on `failsafe mode get | cut -f1` — assert the value
# is the first tab-delimited field.
@test "mode get output is tab-delimited (value in field 1)" {
  write_mode_file "%w" "read & write"
  WEZTERM_PANE="%w" run bash -c 'failsafe mode get | cut -f1'
  [ "$output" = "read & write" ]
}
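
The alias and `cut -f1` assertions in `crosscutting.bats` pin down an observable contract rather than an implementation. As a rough model of that contract only: `canonical_mode` and `mode_value` below are hypothetical names, and the `value<TAB>path` line shape is an assumption inferred from the tests, not taken from failsafe's source.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for failsafe's alias handling. It mirrors only the
# mapping the tests assert: rw/w -> "read & write", ro/r -> "read".
canonical_mode() {
  case "$1" in
    rw|w|"read & write") printf '%s' "read & write" ;;
    ro|r|read)           printf '%s' "read" ;;
    *) printf 'unknown mode: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Take field 1 of a tab-delimited line, the same way `cut -f1` is used by
# claude-statusline.sh. The second field here is an assumed path.
mode_value() { printf '%s\n' "$1" | cut -f1; }

canonical_mode rw; echo
mode_value "$(printf 'read & write\t/tmp/pane-mode/%%w')"
```

The value `read & write` contains spaces, which is why the consumer splits on tabs rather than whitespace.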
24 changes: 24 additions & 0 deletions test/docs/extract.bats
@@ -0,0 +1,24 @@
load helpers

EX() { bash "$DOCS_REPO_ROOT/test/docs/lib/extract.sh" "$@"; }

@test "extract.sh returns the first bash block of tmux.md (the toggle script)" {
  run EX "$DOCS_REPO_ROOT/docs/toggle/tmux.md" bash 1 "The toggle helper"
  [ "$status" -eq 0 ]
  [[ "$output" == *"#!/usr/bin/env bash"* ]]
  [[ "$output" == *'printf '"'"'%s'"'"' "$next" > "$file"'* ]]
  [[ "$output" != *"status indicator"* ]]
}

@test "extract.sh anchors to a heading so ordinals are stable" {
  run EX "$DOCS_REPO_ROOT/docs/toggle/tmux.md" bash 1 "Status indicator"
  [ "$status" -eq 0 ]
  [[ "$output" == *"🔓 sudo"* ]]
  [[ "$output" == *"🔒 read"* ]]
}

@test "extract.sh without anchor counts globally" {
  run EX "$DOCS_REPO_ROOT/docs/claude-statusline.md" json 1
  [ "$status" -eq 0 ]
  [[ "$output" == *'"statusLine"'* ]]
}
53 changes: 53 additions & 0 deletions test/docs/helpers.bash
@@ -0,0 +1,53 @@
# shellcheck shell=bash
# Shared helpers for the doc-validation bats suite.

# Repo root (this file lives at test/docs/helpers.bash).
DOCS_REPO_ROOT="${DOCS_REPO_ROOT:-$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)}"

# Snapshot the real HOME before any test sandboxes it (needed for python user-site imports).
ORIG_HOME="${ORIG_HOME:-$HOME}"

# Resolve interpreters once (names vary across distros/brew).
LUA_BIN="$(command -v lua || command -v lua5.4 || command -v luajit || true)"
PYTHON_BIN="$(command -v python3 || command -v python || true)"

EXTRACT_SH="$(dirname "${BASH_SOURCE[0]}")/lib/extract.sh"
STUB_DIR="$(dirname "${BASH_SOURCE[0]}")/stubs"

extract_block() { bash "$EXTRACT_SH" "$@"; }

setup_sandbox() {
  TEST_HOME="$(mktemp -d "${TMPDIR:-/tmp}/failsafe-doctest.XXXXXX")"
  export HOME="$TEST_HOME"
  mkdir -p "$HOME/.claude/pane-mode" "$HOME/.config/failsafe"
  unset WEZTERM_PANE TMUX_PANE ITERM_SESSION_ID KITTY_WINDOW_ID CLAUDE_SESSION_ID FAILSAFE_MODE
}

# Only ever removes dirs created by setup_sandbox (our own template prefix), so a
# stray or externally-set TEST_HOME can never be rm -rf'd.
teardown_sandbox() {
  case "${TEST_HOME:-}" in
    */failsafe-doctest.*) rm -rf "$TEST_HOME" ;;
    *) : ;;
  esac
  return 0
}

# Skip (not fail) when a required tool is missing — CI installs everything, so a
# skip in CI is itself a signal.
need() { command -v "$1" >/dev/null 2>&1 || skip "$1 not installed"; }

write_mode_file() { printf '%s' "$2" > "$HOME/.claude/pane-mode/$1"; }
read_mode_file() { tr -d '\r\n' < "$HOME/.claude/pane-mode/$1"; }

# Build a PATH dir that contains everything claude-statusline.sh needs EXCEPT jq,
# to exercise the documented graceful-degrade branch.
make_nojq_path() {
  local bin="$TEST_HOME/nojq-bin"; mkdir -p "$bin"
  local t
  for t in env bash sh cat cut sed failsafe; do
    local p; p="$(command -v "$t" || true)"
    [ -n "$p" ] && ln -sf "$p" "$bin/$t"
  done
  printf '%s' "$bin"
}
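
The PATH-shim idea behind `make_nojq_path` can be tried in isolation. The sketch below is a standalone approximation, not the helper itself: the directory name, the tool list, and the `jq_visible` function are illustrative assumptions. It links a few tools into a throwaway directory and checks that a child shell started with only that directory on `PATH` cannot find `jq`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Build a throwaway PATH directory that links only the named tools.
shim="$(mktemp -d "${TMPDIR:-/tmp}/nojq-demo.XXXXXX")"
for t in bash cat cut sed; do
  p="$(command -v "$t" || true)"
  if [ -n "$p" ]; then ln -sf "$p" "$shim/$t"; fi
done

# Hypothetical probe: start bash by its shim path (so the lookup does not
# depend on the outer PATH) and ask whether jq can still be resolved.
jq_visible() {
  PATH="$shim" "$shim/bash" -c 'command -v jq >/dev/null 2>&1 && echo yes || echo no'
}

result="$(jq_visible)"
echo "jq visible: $result"
rm -rf "$shim"
```

The restriction holds whether or not `jq` is installed on the host, which is what lets a degrade branch be exercised on a machine that has `jq`.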
26 changes: 26 additions & 0 deletions test/docs/helpers.bats
@@ -0,0 +1,26 @@
load helpers

setup() { setup_sandbox; }
teardown() { teardown_sandbox; }

@test "setup_sandbox isolates HOME and clears pane vars" {
  [ "$HOME" != "$ORIG_HOME" ]
  [ -d "$HOME/.claude/pane-mode" ]
  [ -z "${WEZTERM_PANE:-}" ] && [ -z "${TMUX_PANE:-}" ]
}

@test "mode-file helpers round-trip" {
  write_mode_file "%5" "read & write"
  [ "$(read_mode_file "%5")" = "read & write" ]
}

@test "need skips when a tool is absent" {
  need definitely-not-a-real-binary-xyz
  false # unreachable: skip above aborts the test
}

@test "lua resolver finds an interpreter or skips" {
  if [ -z "$LUA_BIN" ]; then skip "no lua interpreter"; fi
  run "$LUA_BIN" -e 'print("ok")'
  [ "$output" = "ok" ]
}
60 changes: 60 additions & 0 deletions test/docs/iterm.bats
@@ -0,0 +1,60 @@
load helpers

IT="$DOCS_REPO_ROOT/docs/toggle/iterm.md"

setup() { setup_sandbox; need failsafe; }
teardown() { teardown_sandbox; }

@test "shell hook emits OSC 1337 SetUserVar with base64 session id" {
  local hook sid out b64
  hook="$(extract_block "$IT" sh 1)"
  sid="w1t6p0:DEAD-BEEF"
  out="$(ITERM_SESSION_ID="$sid" bash -c "$hook")"
  [[ "$out" == *"1337;SetUserVar=failsafe_sid="* ]]
  b64="${out#*failsafe_sid=}"; b64="${b64%$'\a'}"
  [ "$(printf '%s' "$b64" | base64 -d)" = "$sid" ]
}

@test "doc python read_mode returns canonical and defaults to read" {
  [ -n "$PYTHON_BIN" ] || skip "no python"
  extract_block "$IT" python 1 > "$TEST_HOME/it.py"
  run "$PYTHON_BIN" - "$TEST_HOME/it.py" <<'PY'
import ast, sys, os, tempfile
src = open(sys.argv[1]).read()
mod = ast.parse(src)
fn = next(n for n in mod.body
          if isinstance(n, ast.FunctionDef) and n.name == "read_mode")
ns = {}
exec(compile(ast.Module([fn], []), "read_mode", "exec"), ns)
read_mode = ns["read_mode"]
d = tempfile.mkdtemp()
assert read_mode(os.path.join(d, "missing")) == "read", "missing file -> read"
p = os.path.join(d, "m"); open(p, "w").write("read & write\n")
assert read_mode(p) == "read & write", "canonical preserved"
print("ok")
PY
  [ "$status" -eq 0 ]
  [[ "$output" == *"ok"* ]]
}

@test "doc python script compiles and iterm2 imports" {
  [ -n "$PYTHON_BIN" ] || skip "no python"
  HOME="$ORIG_HOME" "$PYTHON_BIN" -c 'import iterm2' 2>/dev/null || skip "iterm2 not installed"
  extract_block "$IT" python 1 > "$TEST_HOME/it.py"
  run env HOME="$ORIG_HOME" "$PYTHON_BIN" -m py_compile "$TEST_HOME/it.py"
  [ "$status" -eq 0 ]
}

@test "doc python script passes pyflakes" {
  [ -n "$PYTHON_BIN" ] || skip "no python"
  HOME="$ORIG_HOME" "$PYTHON_BIN" -c 'import pyflakes' 2>/dev/null || skip "pyflakes not installed"
  extract_block "$IT" python 1 > "$TEST_HOME/it.py"
  run env HOME="$ORIG_HOME" "$PYTHON_BIN" -m pyflakes "$TEST_HOME/it.py"
  [ "$status" -eq 0 ]
}

@test "no-python alternative: failsafe toggle flips the session file" {
  local sid="w1t6p0:GUID-XYZ"
  ITERM_SESSION_ID="$sid" failsafe toggle
  [ "$(read_mode_file "$sid")" = "read & write" ]
}
27 changes: 27 additions & 0 deletions test/docs/lib/extract.sh
@@ -0,0 +1,27 @@
#!/usr/bin/env bash
# extract.sh <file.md> <lang> <n> [heading_substr]
# Print the Nth (1-based) ```<lang> fenced block. If heading_substr is given,
# only blocks appearing after the most recent heading line containing that
# substring are counted (so edits elsewhere in the doc don't shift the ordinal).
# Fence detection trims surrounding whitespace (handles fences indented inside
# markdown lists); grabbed content is printed verbatim.
set -euo pipefail
file="${1:?usage: extract.sh <file.md> <lang> <n> [heading_substr]}"
lang="${2:?lang}"
want="${3:?n}"
anchor="${4:-}"

awk -v lang="$lang" -v want="$want" -v anchor="$anchor" '
  BEGIN { armed = (anchor == "") ? 1 : 0; count = 0; grab = 0; open = "```" lang }
  {
    t = $0
    sub(/^[[:space:]]+/, "", t)
    sub(/[[:space:]]+$/, "", t)
  }
  # A matching heading (re)arms and resets the counter: ordinals are relative to
  # the MOST RECENT matching heading.
  anchor != "" && /^#/ && index($0, anchor) > 0 { armed = 1; count = 0; grab = 0 }
  armed && t == open { if (grab == 0) { count++; if (count == want) { grab = 1; next } } }
  grab && t == "```" { exit }
  grab { print }
' "$file"
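
The heading-anchored ordinal that `extract.sh` implements can be seen on a two-section document. The sketch below is a simplified re-statement for illustration, not the shipped script. `extract_nth`, the sample headings, and the sample blocks are all made up. It omits the whitespace trimming, and unlike the script it ignores `#` lines while a block is being grabbed.

```bash
#!/usr/bin/env bash
set -euo pipefail

fence='```'
doc="$(mktemp)"
trap 'rm -f "$doc"' EXIT

# Sample document: two headings, each followed by one bash block.
printf '%s\n' \
  '# Toggle' "${fence}bash" 'echo toggle' "$fence" \
  '# Status' "${fence}bash" 'echo status' "$fence" > "$doc"

# extract_nth <file> <lang> <n> [heading_substr]  (hypothetical, simplified)
extract_nth() {
  awk -v fence_open="$fence$2" -v fence_close="$fence" \
      -v want="$3" -v anchor="${4:-}" '
    BEGIN { armed = (anchor == "") }
    # A matching heading arms the counter and restarts it from zero.
    anchor != "" && !grab && /^#/ && index($0, anchor) { armed = 1; count = 0 }
    armed && !grab && $0 == fence_open { if (++count == want) { grab = 1; next } }
    grab && $0 == fence_close { exit }
    grab { print }
  ' "$1"
}

extract_nth "$doc" bash 1            # no anchor: first block in the file
extract_nth "$doc" bash 1 "Status"   # anchored: first block after "# Status"
```

With the anchor, block 1 stays the status block even if more bash blocks are later added under the first heading, which is the stability the anchored test relies on.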
20 changes: 20 additions & 0 deletions test/docs/live-gui/iterm-register.md
@@ -0,0 +1,20 @@
# iTerm2 runtime registration — manual live check

iTerm2's Python toggle binds a key to `Invoke Script Function` → `failsafe_toggle()`.
That GUI registration + keypress path cannot be scripted reliably, so it is verified by
hand. The automated proxies in `../iterm.bats` already cover the rest: the OSC base64
roundtrip, the doc's own `read_mode`, `py_compile`, `import iterm2`, and pyflakes.

## Steps

1. Add the step-1 shell hook to `~/.zshrc` (or `~/.bashrc`); open a new tab.
2. iTerm2 → **Scripts → Manage → Install Python Runtime**.
3. Save the doc's `failsafe_toggle.py` to
`~/Library/Application Support/iTerm2/Scripts/AutoLaunch/`; launch it via
**Scripts → failsafe_toggle.py**. Confirm **Scripts → Console** shows no error.
4. iTerm2 → **Settings → Keys → Key Bindings → +**: shortcut `Ctrl+Opt+T`,
action **Invoke Script Function**, function `failsafe_toggle()`.
5. Press the key at a shell prompt, then run `failsafe mode get` and confirm the mode
flipped (and the notification, if enabled, appeared).

Record the result (PASS/FAIL) and the iTerm2 version in `../REPORT.md`.
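
For orientation, the OSC base64 roundtrip that the automated proxy covers can be reproduced by hand. The sketch below is not the doc's shell hook: `emit_user_var` and `decode_user_var` are hypothetical helpers, and the session id is a sample value. It assumes only iTerm2's `SetUserVar` escape format (`ESC ] 1337 ; SetUserVar=<name>=<base64> BEL`) and a `base64` that accepts `-d`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Emit iTerm2's SetUserVar escape: ESC ] 1337 ; SetUserVar=<name>=<base64> BEL.
emit_user_var() {
  local name="$1" value="$2" b64
  b64="$(printf '%s' "$value" | base64 | tr -d '\n')"
  printf '\033]1337;SetUserVar=%s=%s\a' "$name" "$b64"
}

# Recover the value from a captured sequence: strip everything up to and
# including "<name>=", strip the trailing BEL, then base64-decode.
decode_user_var() {
  local osc="$1" b64
  b64="${osc#*SetUserVar=*=}"
  b64="${b64%$'\a'}"
  printf '%s' "$b64" | base64 -d
}

sid="w1t6p0:DEAD-BEEF"   # sample id in the same shape the bats test uses
osc="$(emit_user_var failsafe_sid "$sid")"
decode_user_var "$osc"; echo
```

Running this in a terminal other than iTerm2 is harmless for the check, since the sequence is captured into a variable rather than written to the terminal.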