
BP mode fails with gpt-5.4-mini: model doesn't follow babysitter orchestration instructions #947

Description

@tmuskal

Summary

All agents fail the BP/Predefined and BP/Create Interactive cells with gpt-5.4-mini on Ubuntu. The model produces output and writes the file, but never goes through the babysitter SDK run/iterate lifecycle.

Verification Report (codex + gpt-5.4-mini BP/Predefined)

  • model-response: PASS (558K chars)
  • file-creation: PASS (3487 bytes)
  • stop-hooks: PASS
  • hooks-adapter-session: PASS
  • babysitter-run-completion: FAIL (no .a5c/runs/ directory)
  • babysitter-completion-proof: FAIL (no .a5c/runs/ directory)
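
The two failing checks both come down to whether a `.a5c/runs/` directory exists after the agent finishes. A minimal sketch of such a check follows. It is not the actual verifier: the function names are hypothetical, and it assumes that each run is stored as a subdirectory of `.a5c/runs/`, which the report does not confirm.

```python
from pathlib import Path


def find_babysitter_runs(workdir: str) -> list[Path]:
    """Return the run directories under <workdir>/.a5c/runs, or [] if there are none.

    Assumption: each babysitter run is a subdirectory of .a5c/runs/.
    """
    runs_dir = Path(workdir) / ".a5c" / "runs"
    if not runs_dir.is_dir():
        return []
    return sorted(p for p in runs_dir.iterdir() if p.is_dir())


def run_completion_check(workdir: str) -> tuple[str, str]:
    """Hypothetical stand-in for the babysitter-run-completion check."""
    runs_dir = Path(workdir) / ".a5c" / "runs"
    if not runs_dir.is_dir():
        # This is the failure reported for gpt-5.4-mini.
        return ("FAIL", "no .a5c/runs/ directory")
    runs = find_babysitter_runs(workdir)
    if not runs:
        return ("FAIL", ".a5c/runs/ exists but contains no runs")
    return ("PASS", f"{len(runs)} run(s) found")
```

A check like this would still fail when the model writes the expected output file, because it looks only at the run directory and not at the task result.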

Root Cause

gpt-5.4-mini ignores the $babysitter:yolo command prefix and performs the task directly instead of invoking the babysitter plugin. The same agents pass with gpt-5.5 and gemini-3.1-pro-preview, which points to a model capability issue.

Affected Cells

12 cells, all on Ubuntu with gpt-5.4-mini: BP/Predefined Interactive (6 agents), BP/Predefined BH (5 agents), BP/Create Interactive (2 agents).

Possible Fixes

  1. Stronger babysitter invocation prompt for smaller models
  2. Force babysitter invocation via a pre-hook that wraps the prompt
  3. Mark gpt-5.4-mini as unsupported for BP mode
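
Fixes 2 and 3 could be sketched roughly as below. This is an illustration only, not babysitter's actual hook API: `wrap_prompt`, `ensure_bp_supported`, `UNSUPPORTED_BP_MODELS`, and the reminder wording are all hypothetical. Only the `$babysitter:yolo` prefix and the model names come from this report.

```python
BABYSITTER_PREFIX = "$babysitter:yolo"  # command prefix named in this issue

# Hypothetical denylist for fix 3.
UNSUPPORTED_BP_MODELS = {"gpt-5.4-mini"}

# Hypothetical reminder text for fixes 1 and 2; the wording is illustrative.
REMINDER = (
    "Do not perform the task directly. Invoke the babysitter plugin and "
    "complete the work through its run/iterate lifecycle."
)


class UnsupportedModelError(ValueError):
    """Raised when BP mode is requested for a model on the denylist."""


def ensure_bp_supported(model: str) -> None:
    """Fix 3: refuse BP mode up front for models known to skip orchestration."""
    if model in UNSUPPORTED_BP_MODELS:
        raise UnsupportedModelError(f"{model} is not supported for BP mode")


def wrap_prompt(prompt: str) -> str:
    """Fix 2: pre-hook that puts the prefix first and appends the reminder."""
    body = prompt.strip()
    # Avoid doubling the prefix when the caller already supplied it.
    if body.startswith(BABYSITTER_PREFIX):
        body = body[len(BABYSITTER_PREFIX):].lstrip()
    return f"{BABYSITTER_PREFIX} {body}\n\n{REMINDER}"
```

Whether a stronger prompt is enough for gpt-5.4-mini is untested here. The denylist is the more conservative option because it fails fast instead of producing a run that skips the lifecycle.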


Labels: bug (Something isn't working)
