Why
During v0.8.60 live testing, the TUI still exposed several release-smoke failures that should be caught automatically before tagging/publishing:
- Shell/background work can still present as a blocking wait Bash / timeout-shaped UI state instead of a clear foreground-vs-background job model.
- codewhale exec --output-format stream-json can emit terminal OSC/title/progress escape sequences before the first JSON object, making the stream not clean machine-readable JSONL.
- codewhale exec flag order appears significant: codewhale exec --provider zai --model GLM-5.2 --output-format stream-json ... returned plain text, while codewhale exec --output-format stream-json --provider zai --model GLM-5.2 ... emitted stream-json.
- Earlier live testing also saw a successful codewhale exec route exit 0 with no visible final text, which belongs in the same smoke family as reasoning-only / empty-output turns.
This complements #3211, #3212, #3213, and #3166. Those issues define the runtime behavior; this issue makes sure the release process catches regressions across installed binaries and configured model routes.
Evidence from local v0.8.60 binary
Installed binaries under PATH:
codewhale 0.8.60 (652d3925ae05)
codew 0.8.60 (652d3925ae05)
codewhale-tui 0.8.60 (652d3925ae05)
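A version-surface check over lines of this shape could be sketched as follows. This is a minimal sketch that assumes every binary prints `<name> <version> (<git sha>)` exactly as in the listing above; the function name `check_version_surface` is hypothetical and not part of the codebase.

```python
import re

# Assumption: each --version line looks like "codewhale 0.8.60 (652d3925ae05)".
VERSION_LINE = re.compile(
    r"^(?P<name>\S+)\s+(?P<version>\S+)\s+\((?P<sha>[0-9a-f]+)\)$"
)


def check_version_surface(lines, expected_version, expected_sha=None):
    """Return a list of mismatch descriptions; an empty list means success."""
    failures = []
    for line in lines:
        match = VERSION_LINE.match(line.strip())
        if match is None:
            failures.append(f"unparseable version line: {line!r}")
            continue
        name = match["name"]
        if match["version"] != expected_version:
            failures.append(
                f"{name}: version {match['version']} != {expected_version}"
            )
        # The SHA is only compared when the caller supplies one.
        if expected_sha is not None and match["sha"] != expected_sha:
            failures.append(f"{name}: sha {match['sha']} != {expected_sha}")
    return failures
```

Feeding it the three lines above with the intended version and SHA yields no failures; any binary left over from an older install shows up by name.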
Plain one-shot smoke succeeded across routes:
default_text rc=0 out=CW_SMOKE_OK
deepseek_flash_text rc=0 out=CW_SMOKE_OK
zai_glm52_text rc=0 out=CW_SMOKE_OK
minimax_m3_text rc=0 out=CW_SMOKE_OK
Stream-json route/order problem:
codewhale exec --provider zai --model GLM-5.2 --reasoning-effort high --output-format stream-json ...
# rc=0, but output was plain text: CW_SMOKE_OK
codewhale exec --output-format stream-json --provider zai --model GLM-5.2 --reasoning-effort high ...
# rc=0, emitted stream-json
Stream-json cleanliness problem:
first output bytes included terminal OSC/title/progress sequences before JSON:
\033]9;4;1\a\033]0;CodeWhale\a...{"type":"content","content":"CW"}
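One way to detect this kind of contamination is sketched below. It assumes the harness captures stdout as raw bytes; the helper names are hypothetical. Raw ESC bytes cannot appear in valid JSON, because JSON requires control characters inside strings to be escaped, so any ESC byte on stdout counts as a failure.

```python
ESC = b"\x1b"


def split_leading_noise(stdout: bytes):
    """Split stdout into (bytes before the first '{', the remainder)."""
    start = stdout.find(b"{")
    if start == -1:
        # No JSON object at all: everything counts as noise.
        return stdout, b""
    return stdout[:start], stdout[start:]


def stdout_is_clean(stdout: bytes) -> bool:
    """True when nothing but whitespace precedes the first JSON object
    and no raw ESC byte appears anywhere on stdout."""
    noise, _ = split_leading_noise(stdout)
    return noise.strip() == b"" and ESC not in stdout
```

On a sample shaped like the capture above, `split_leading_noise` returns the OSC prefix so the failure report can show exactly which bytes leaked.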
Desired smoke harness
Add a bounded, automatable pre-release smoke script/check that can run against the installed binary and/or target/release binary without publishing anything.
It should cover:
- Version surface
  - codewhale --version, codew --version, and codewhale-tui --version all match the intended release version and git SHA.
- Exec text route matrix
  - Default route and selected configured providers return non-empty output for a tiny prompt.
  - Include at least DeepSeek Flash, the configured default route, and any authenticated non-DeepSeek routes discovered locally.
  - Hard timeout per route; failures include provider/model/exit/status/stdout/stderr preview.
- Exec stream-json route matrix
  - --output-format stream-json returns valid JSONL for default and explicit provider/model routes.
  - Every non-empty stdout line must start with { and parse as JSON.
  - No terminal title/progress/OSC sequences on stdout in machine modes.
  - Final metadata/done status must be present and must not be hidden behind reasoning-only output.
  - Flag order must not change output format semantics.
- Empty/reasoning-only turn guard
  - A route that exits 0 with no final content is a smoke failure.
  - Reasoning-only/no-action responses should be classified and surfaced as recoverable runtime errors, not success.
- Shell job UX smoke
  - A long shell command can be launched as background/detached without blocking the main turn.
  - Foreground shell remains a deliberate barrier only when the next step needs the output.
  - The UI/status surface should say running, background, waiting for output, timed out, or cancelled clearly; do not display a confusing timeout/wait Bash state as the main bottom-line UX.
  - Cancelling a background job frees the turn and updates sidebar/job state.
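The stream-json and empty-turn checks in the list above could be combined roughly as follows. This is a sketch under stated assumptions: the only event shape taken from the evidence is `{"type":"content","content":"..."}`, and the function name `check_stream_json` is hypothetical. The name of the final metadata/done event is not shown in this issue, so that check is left out rather than guessed.

```python
import json


def check_stream_json(stdout: str):
    """Return a list of failure descriptions for one stream-json capture."""
    failures = []
    content_parts = []
    # Split on "\n" only, so line separators that are legal inside JSON
    # strings (e.g. U+2028) do not produce false line breaks.
    for number, raw in enumerate(stdout.split("\n"), start=1):
        line = raw.rstrip("\r")
        if not line.strip():
            continue
        if "\x1b" in line:
            failures.append(f"line {number}: terminal escape sequence on stdout")
        if not line.startswith("{"):
            failures.append(f"line {number}: does not start with '{{'")
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError as exc:
            failures.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        # Assumption: visible text arrives in events of type "content".
        if isinstance(event, dict) and event.get("type") == "content":
            content_parts.append(str(event.get("content", "")))
    if not "".join(content_parts).strip():
        failures.append("no final content: empty or reasoning-only turn")
    return failures
```

Returning a list rather than raising on the first problem lets one run report every violation for a route, which keeps the failure preview useful when several things break at once.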
Suggested implementation seams
- Add a release smoke script under scripts/release/ or scripts/smoke/ that takes --binary <path> and --version <version>.
- Keep live-provider probes opt-in or auto-discovered from configured/authenticated routes; mock-provider tests should cover CI deterministically.
- Add unit/integration tests around codewhale exec argument forwarding so output-format/provider/model flag order is stable.
- Ensure stream-json disables terminal title/progress/OSC writes on stdout; if terminal signals are needed, send them only to an interactive TTY or stderr.
- Wire the smoke script into the pre-release verification checklist, not the publish step.
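For the flag-order tests, the argument vectors could be generated instead of hand-written. The sketch below is illustrative only: `flag_order_variants` is a hypothetical helper, and the trailing arguments (elided as ... in the commands above) are supplied by the caller rather than assumed here.

```python
from itertools import permutations


def flag_order_variants(flag_groups, tail):
    """Build one argv per ordering of the given (flag, value) groups.

    flag_groups: sequence of tuples such as ("--provider", "zai").
    tail: list of trailing arguments appended unchanged to every variant.
    """
    base = ["codewhale", "exec"]
    variants = []
    for order in permutations(flag_groups):
        # Flatten the groups so each flag stays next to its value.
        flags = [token for group in order for token in group]
        variants.append(base + flags + list(tail))
    return variants
```

With the three groups from the evidence section (--provider zai, --model GLM-5.2, --output-format stream-json) this yields six argument vectors, and a test can assert that each one produces stream-json.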
Acceptance criteria
- The smoke harness fails if stream-json stdout contains non-JSON bytes before the first JSON object.
- The smoke harness fails if --output-format stream-json returns plain text for any accepted flag order.
- The smoke harness fails on rc=0 + empty output.
- The smoke harness reports provider/model route names for every failure.
- The shell-job smoke proves background work does not freeze the TUI/turn loop and that timeout/cancel states are human-readable.
- The release checklist runs this before tag/publish approval.
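A per-route runner that satisfies the timeout, empty-output, and route-reporting criteria might look like the sketch below. It is a sketch, not the harness itself: `RouteResult` and `run_route` are hypothetical names, the route label is whatever provider/model string the caller chooses, and the default timeout and preview length are arbitrary.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class RouteResult:
    route: str                  # provider/model label chosen by the caller
    ok: bool
    reason: str                 # empty string on success
    returncode: Optional[int]   # None when the process timed out
    stdout_preview: str
    stderr_preview: str


def run_route(route, argv, timeout_s=60.0, preview_chars=200):
    """Run one smoke command under a hard timeout and classify the outcome."""
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return RouteResult(route, False, f"timed out after {timeout_s}s", None, "", "")
    if proc.returncode != 0:
        reason = f"exit status {proc.returncode}"
    elif not proc.stdout.strip():
        # rc=0 with no visible output is a failure, not a success.
        reason = "rc=0 with empty output"
    else:
        reason = ""
    return RouteResult(
        route,
        reason == "",
        reason,
        proc.returncode,
        proc.stdout[:preview_chars],
        proc.stderr[:preview_chars],
    )
```

Every result carries the route label, so a failure summary can name the provider/model without re-deriving it from the command line.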