Summary
After #936 was fully resolved, genty vanilla NI is GREEN on gpt-5.5 across all 3 OSes (Ubuntu, macOS, Windows).
The other 5 models (gpt-5.4-mini, claude-sonnet-4-6, gemini-3.5-flash, gemini-3.1-pro-preview, DeepSeek-V4-Pro) fail only the file-creation check, on all 3 OSes (runs 27372409793 / 27372411571 / 27372413000).
For these models, all of the #936 infrastructure checks pass; only file-creation fails:
✓ model-response (agent responded, ~12k chars)
✓ proxy-communication
✗ file-creation: agent did not create .a5c-live-test/<id>-odyssey.md (output: 12092 chars)
✓ babysitter-run-completion: run exists with 6 journal events (>=5)
✓ babysitter-completion-proof: completed with processId + completionProof
Diagnosis
The model produces the full odyssey content as agent output text (~12k chars), and the run completes with a valid completion proof, but the content is never written to the expected file .a5c-live-test/<sessionId>-odyssey.md. gpt-5.5 reliably authors a process whose delegated worker writes the file; the other models author or execute a process that returns the content instead of writing it (or writes it to the wrong path).
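To tell these failure modes apart when triaging a failed run, a check could distinguish a file at the target path, a file written somewhere else in the workspace, and content that was only returned as output. A minimal Python sketch under stated assumptions: `classify_run`, `Outcome`, the `*-odyssey.md` glob used to find stray files, and the 1000-character output threshold are all hypothetical and not part of genty or the live-test harness.

```python
from enum import Enum
from pathlib import Path


class Outcome(Enum):
    # Hypothetical labels for the failure modes described in the diagnosis.
    WRITTEN = "written to target path"
    WRONG_PATH = "written, but not at target path"
    RETURNED_ONLY = "content returned as output, no file written"
    MISSING = "no file and no substantial output"


def classify_run(workspace: Path, target_rel: str, agent_output: str,
                 stray_glob: str = "*-odyssey.md",
                 min_output_chars: int = 1000) -> Outcome:
    """Classify a finished run by where (if anywhere) the content ended up."""
    target = workspace / target_rel
    if target.is_file() and target.stat().st_size > 0:
        return Outcome.WRITTEN
    # Any other non-empty file matching the glob suggests a wrong-path write.
    strays = [p for p in workspace.rglob(stray_glob)
              if p.is_file() and p.resolve() != target.resolve()
              and p.stat().st_size > 0]
    if strays:
        return Outcome.WRONG_PATH
    # Large agent output with no file matches the "returned instead of written" case.
    if len(agent_output) >= min_output_chars:
        return Outcome.RETURNED_ONLY
    return Outcome.MISSING
```

With the numbers from the failing runs (12092 chars of output, no file anywhere), this would report `RETURNED_ONLY`.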
This is a model-adherence / prompt-robustness gap, not an orchestration bug. Likely fix: strengthen the authoring and delegated-worker prompts so that writing the file to the exact target path is mandatory and verified, regardless of model strength.
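One way to make the write both mandatory and verified is to pair an explicit instruction in the worker prompt with a post-condition that fails the step, so the run cannot reach a completion proof without the file. A minimal Python sketch: `mandatory_write_instruction`, `require_target_written`, and `TargetNotWrittenError` are hypothetical names, and the instruction wording is only an example, not genty's actual prompt text.

```python
from pathlib import Path


class TargetNotWrittenError(RuntimeError):
    """Raised when the delegated worker finished without writing the target file."""


def mandatory_write_instruction(target_rel: str) -> str:
    """Example prompt suffix that makes the file write an explicit requirement."""
    return (f"You MUST write the final content to the file {target_rel} "
            "(relative to the workspace root) using a file-write tool. "
            "Returning the content as text does not complete this task. "
            "After writing, read the file back to confirm it exists.")


def require_target_written(workspace: Path, target_rel: str,
                           min_bytes: int = 1) -> Path:
    """Post-condition: fail unless the target file exists with at least min_bytes."""
    target = workspace / target_rel
    if not target.is_file():
        raise TargetNotWrittenError(f"expected file was not created: {target_rel}")
    size = target.stat().st_size
    if size < min_bytes:
        raise TargetNotWrittenError(
            f"{target_rel} has {size} bytes, expected >= {min_bytes}")
    return target
```

Failing loudly here, rather than only in the later file-creation check, would surface the problem at the step that caused it.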
Repro (local, fast)
With AZURE_OPENAI_API_KEY and AZURE_OPENAI_PROJECT_NAME set, run node packages/genty/cli/dist/cli/main.js yolo --prompt "<odyssey...save to .a5c-live-test/x.md>" --model gpt-5.4-mini --no-interactive --workspace <tmp> and check whether the file is created.
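The repro can be scripted to run once per model in a fresh temporary workspace. This Python sketch uses only the CLI path and flags from the command above; the `build_command`/`run_once` helpers, the default model list, and running the script from the repo root are illustrative assumptions. `PROMPT` is kept as the elided placeholder and must be replaced with the full odyssey prompt.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

CLI = "packages/genty/cli/dist/cli/main.js"  # assumed relative to the repo root
# Placeholder: replace with the full odyssey prompt; it must ask to save to TARGET_REL.
PROMPT = "<odyssey...save to .a5c-live-test/x.md>"
TARGET_REL = ".a5c-live-test/x.md"
REQUIRED_ENV = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_PROJECT_NAME")


def build_command(model: str, workspace: Path, prompt: str = PROMPT,
                  cli: str = CLI) -> list[str]:
    """Argument vector for the repro command from this issue."""
    return ["node", cli, "yolo", "--prompt", prompt, "--model", model,
            "--no-interactive", "--workspace", str(workspace)]


def run_once(model: str) -> bool:
    """Run the CLI in a fresh temp workspace and report whether the target file exists."""
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        subprocess.run(build_command(model, workspace), check=False)
        return (workspace / TARGET_REL).is_file()


if __name__ == "__main__":
    missing = [v for v in REQUIRED_ENV if not os.environ.get(v)]
    if missing:
        sys.exit(f"missing env vars: {', '.join(missing)}")
    for model in sys.argv[1:] or ["gpt-5.4-mini"]:
        print(f"{model}: file created = {run_once(model)}")
```

For example, saved as a hypothetical `repro.py`, `python repro.py gpt-5.4-mini gpt-5.5` would compare a failing model with the passing one, assuming your credentials can reach both. The temporary workspace is deleted after each run, so inspect it by hand if you need to find a file written to the wrong path.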