Skip to content

v0.8.61: Add pre-release smoke harness for exec streams, model routes, and shell job UX #3223

@Hmbown

Description

@Hmbown

Why

During v0.8.60 live testing, the TUI still exposed several release-smoke failures that should be caught automatically before tagging/publishing:

  • Shell/background work can still present as a blocking wait Bash / timeout-shaped UI state instead of a clear foreground-vs-background job model.
  • codewhale exec --output-format stream-json can emit terminal OSC/title/progress escape sequences before the first JSON object, making the stream not clean machine-readable JSONL.
  • codewhale exec flag order appears significant: codewhale exec --provider zai --model GLM-5.2 --output-format stream-json ... returned plain text, while codewhale exec --output-format stream-json --provider zai --model GLM-5.2 ... emitted stream-json.
  • Earlier live testing also saw a successful codewhale exec route exit 0 with no visible final text, which belongs in the same smoke family as reasoning-only / empty-output turns.

This complements #3211, #3212, #3213, and #3166. Those issues define the runtime behavior; this issue makes sure the release process catches regressions across installed binaries and configured model routes.

Evidence from local v0.8.60 binary

Installed binaries under PATH:

codewhale 0.8.60 (652d3925ae05)
codew 0.8.60 (652d3925ae05)
codewhale-tui 0.8.60 (652d3925ae05)

Plain one-shot smoke succeeded across routes:

default_text          rc=0 out=CW_SMOKE_OK
deepseek_flash_text   rc=0 out=CW_SMOKE_OK
zai_glm52_text        rc=0 out=CW_SMOKE_OK
minimax_m3_text       rc=0 out=CW_SMOKE_OK

Stream-json route/order problem:

codewhale exec --provider zai --model GLM-5.2 --reasoning-effort high --output-format stream-json ...
# rc=0, but output was plain text: CW_SMOKE_OK

codewhale exec --output-format stream-json --provider zai --model GLM-5.2 --reasoning-effort high ...
# rc=0, emitted stream-json

Stream-json cleanliness problem:

first output bytes included terminal OSC/title/progress sequences before JSON:
\033]9;4;1\a\033]0;CodeWhale\a...{"type":"content","content":"CW"}

Desired smoke harness

Add a bounded, automatable pre-release smoke script/check that can run against the installed binary and/or target/release binary without publishing anything.

It should cover:

  1. Version surface

    • codewhale --version, codew --version, codewhale-tui --version all match the intended release version and git SHA.
  2. Exec text route matrix

    • Default route and selected configured providers return non-empty output for a tiny prompt.
    • Include at least DeepSeek Flash, the configured default route, and any authenticated non-DeepSeek routes discovered locally.
    • Hard timeout per route; failures include provider/model/exit/status/stdout/stderr preview.
  3. Exec stream-json route matrix

    • --output-format stream-json returns valid JSONL for default and explicit provider/model routes.
    • Every non-empty stdout line must start with { and parse as JSON.
    • No terminal title/progress/OSC sequences on stdout in machine modes.
    • Final metadata/done status must be present and must not be hidden behind reasoning-only output.
    • Flag order must not change output format semantics.
  4. Empty/reasoning-only turn guard

    • A route that exits 0 with no final content is a smoke failure.
    • Reasoning-only/no-action responses should be classified and surfaced as recoverable runtime errors, not success.
  5. Shell job UX smoke

    • A long shell command can be launched as background/detached without blocking the main turn.
    • Foreground shell remains a deliberate barrier only when the next step needs the output.
    • The UI/status surface should say running, background, waiting for output, timed out, or cancelled clearly; do not display a confusing timeout/wait Bash state as the main bottom-line UX.
    • Cancelling a background job frees the turn and updates sidebar/job state.

Suggested implementation seams

  • Add a release smoke script under scripts/release/ or scripts/smoke/ that takes --binary <path> and --version <version>.
  • Keep live-provider probes opt-in or auto-discovered from configured/authenticated routes; mock-provider tests should cover CI deterministically.
  • Add unit/integration tests around codewhale exec argument forwarding so output-format/provider/model flag order is stable.
  • Ensure stream-json disables terminal title/progress/OSC writes on stdout; if terminal signals are needed, send them only to an interactive TTY or stderr.
  • Wire the smoke script into the pre-release verification checklist, not the publish step.

Acceptance criteria

  • The smoke harness fails if stream-json stdout contains non-JSON bytes before the first JSON object.
  • The smoke harness fails if --output-format stream-json returns plain text for any accepted flag order.
  • The smoke harness fails on rc=0 + empty output.
  • The smoke harness reports provider/model route names for every failure.
  • The shell-job smoke proves background work does not freeze the TUI/turn loop and that timeout/cancel states are human-readable.
  • The release checklist runs this before tag/publish approval.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingenhancementNew feature or requestreliabilityReliability, flaky behavior, retries, fallbacks, and robustnessv0.8.61Targeted for CodeWhale v0.8.61workflow-runtimeWorkflow IR, executor, control flow, and replay runtime

    Projects

    Status
    Backlog

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions