Summary
All agents fail BP/Predefined and BP/Create Interactive with gpt-5.4-mini on Ubuntu. The model produces output and writes the file, but never goes through the babysitter SDK run/iterate lifecycle, so no .a5c/runs/ directory is created.
Verification Report (codex + gpt-5.4-mini BP/Predefined)
- model-response: PASS (558K chars)
- file-creation: PASS (3487 bytes)
- stop-hooks: PASS
- hooks-adapter-session: PASS
- babysitter-run-completion: FAIL — no .a5c/runs/ directory
- babysitter-completion-proof: FAIL — no .a5c/runs/ directory
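Both babysitter checks fail on the same symptom. Below is a minimal sketch of that check. It assumes the check only tests whether the workspace contains a non-empty .a5c/runs/ directory; the real verifier may inspect more, such as the completion proof itself. The function name has_babysitter_run is hypothetical:

```python
import tempfile
from pathlib import Path


def has_babysitter_run(workspace: str) -> bool:
    """Hypothetical check: True if the workspace has at least one entry under .a5c/runs/.

    Assumes every babysitter run leaves a file or directory inside .a5c/runs/;
    a missing or empty directory means the run/iterate lifecycle never started.
    """
    runs_dir = Path(workspace) / ".a5c" / "runs"
    return runs_dir.is_dir() and any(runs_dir.iterdir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ws:
        # A model that writes its output file directly never creates .a5c/runs/.
        (Path(ws) / "output.md").write_text("task output")
        print(has_babysitter_run(ws))  # False
```

Under this assumption, a model that writes its output file directly, as gpt-5.4-mini does here, passes file-creation while this check still returns False.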
Root Cause
gpt-5.4-mini ignores the $babysitter:yolo command prefix and performs the task directly instead of invoking the babysitter plugin. The same agents pass with gpt-5.5 and gemini-3.1-pro-preview, which points to a model capability issue rather than an agent-side bug.
Affected Cells
12 cells: BP/Predefined Interactive (6 agents), BP/Predefined BH (5 agents), BP/Create Interactive (2 agents) — all Ubuntu, all gpt-5.4-mini.
Possible Fixes
- Stronger babysitter invocation prompt for smaller models
- Force babysitter invocation via a pre-hook that wraps the prompt
- Mark gpt-5.4-mini as unsupported for BP mode
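The pre-hook fix could look roughly like the following sketch, which also folds in the stronger-prompt idea. It assumes the harness lets a hook rewrite the prompt before the agent sees it. The names wrap_prompt, WEAK_INVOCATION_MODELS and FORCED_INVOCATION_NOTE, and the instruction wording, are hypothetical:

```python
BABYSITTER_PREFIX = "$babysitter:yolo"

# Hypothetical set of models observed to skip the babysitter plugin.
WEAK_INVOCATION_MODELS = {"gpt-5.4-mini"}

# Hypothetical wording; the real instruction would need tuning per model.
FORCED_INVOCATION_NOTE = (
    "You MUST handle this task by invoking the babysitter plugin and "
    "following its run/iterate lifecycle. Do not perform the task directly."
)


def wrap_prompt(prompt: str, model: str) -> str:
    """Hypothetical pre-hook: return the prompt the agent should actually receive."""
    body = prompt.strip()
    # Ensure the command prefix is present exactly once, at the start.
    if not body.startswith(BABYSITTER_PREFIX):
        body = f"{BABYSITTER_PREFIX} {body}"
    if model in WEAK_INVOCATION_MODELS:
        # Place the explicit instruction right after the prefix so it is not buried.
        rest = body[len(BABYSITTER_PREFIX):].lstrip()
        body = f"{BABYSITTER_PREFIX}\n{FORCED_INVOCATION_NOTE}\n\n{rest}"
    return body


if __name__ == "__main__":
    print(wrap_prompt("$babysitter:yolo write hello.md", "gpt-5.4-mini"))
```

Keying the extra instruction on the model name leaves prompts for gpt-5.5 and gemini-3.1-pro-preview unchanged, since those models already honor the prefix. Whether this wording is enough for gpt-5.4-mini would still need to be confirmed against the babysitter-run-completion check.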