feat(server): soft-close thinking termination via logit-ratio peek by easel · Pull Request #326 · Luce-Org/lucebox-hub

easel · 2026-06-01T03:02:04Z

Summary

Adds an operator-configurable soft-close dial that lets the AR
decode loop end thinking early by emitting </think> when the
close-token logit comes within a configured probability ratio of the
chosen-token logit. Default 0.0 keeps current behaviour
byte-identical; any positive value activates the third
close_kind="soft" taxonomy entry that
docs/specs/thinking-budget.md §7 has reserved since the v2 work.

Motivation. Gemma 4 26B decodes at ~30 tok/s through up to 15 488
phase-1 thinking tokens (~8 min wall-clock / case) before the hard-cap
hook fires. Spot-checks of close-token logits in the late phase-1
window show </think> riding at 10-60 % of the chosen-token
probability for thousands of steps, i.e. the model is nearly ready
to close. A soft-ratio dial in the [0.05, 0.5] range is expected to
reclaim 30-50 % of those tokens at no quality loss. Sweep methodology
and recommended dial values land in a follow-up PR (out of scope here).

What's in the PR

Plan doc (commit 1) at
docs/experiments/soft-close-thinking-termination-plan.md, with a
verbatim codex review and per-finding dispositions.
Implementation (commit 2):
- BudgetHook::soft_close_min_ratio + GenerateResult::soft_forced_close.
- Soft-close comparator + state machine in qwen35_backend.cpp::do_ar_decode.
- CLI flag --think-soft-close-min-ratio <F> + startup banner line.
- Per-request override via Anthropic envelope's
  thinking.soft_close_min_ratio, clamped to the server ceiling.
- close_kind="soft" emitted in finish_details when the soft path
  fired (precedence: soft > hard > natural on tie — see plan §12).
- Spec doc updated.
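
The tie-break precedence in the list above can be sketched as a small selector. This is a minimal illustration, not the PR's code: the function name is hypothetical, and only the "soft" label is named in this PR; "hard" and "natural" are assumed spellings of the other two taxonomy entries.

```cpp
#include <string>

// Hypothetical helper: picks the close_kind label for finish_details.
// "soft" is named in this PR; "hard" and "natural" are assumed spellings.
static std::string close_kind_sketch(bool soft_fired, bool hard_fired) {
    if (soft_fired) return "soft";   // soft wins a same-step tie with hard
    if (hard_fired) return "hard";   // hard-cap hook forced the close
    return "natural";                // neither hook fired
}
```

With both flags set on the same step, the sketch reports "soft", matching the stated soft > hard > natural order.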

Mechanism — zero-cost when disabled

The AR loop already materialises the full logits row to CPU each step
for the sampler. The comparator reads two scalars from that buffer
and runs logit[close0] - logit[chosen] >= log(min_ratio) — no graph
modification, no extra GPU work. When min_ratio == 0 the outer
guard short-circuits before any work happens. Generation determinism
is byte-identical to pre-PR with the dial off.

For the math: prob[i]/prob[j] = exp(logit[i] - logit[j]), so
comparing logit_diff >= log(min_ratio) is identical to comparing
prob_ratio >= min_ratio but skips the softmax. Numerically stable
in fp32 for typical LLM logit ranges (codex confirmed §3.4).
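
As a rough illustration of the comparison described above, here is a minimal sketch of a log-space ratio check. The function name and signature are hypothetical; the real soft_close::should_fire signature is not shown in this description.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for the comparator described above.
// Reads two scalars from a logits row that is assumed to be on the CPU.
static bool should_fire_sketch(const std::vector<float>& logits,
                               int close_id, int chosen_id,
                               float min_ratio) {
    // Dial off: return before touching the logits at all.
    if (min_ratio <= 0.0f) return false;
    // prob[close] / prob[chosen] = exp(logit[close] - logit[chosen]),
    // so comparing the difference against log(min_ratio) needs no softmax.
    const float diff = logits[close_id] - logits[chosen_id];
    return diff >= std::log(min_ratio);
}
```

For example, with a close-token logit of 2.5 and a chosen-token logit of 4.0 the probability ratio is exp(-1.5), roughly 0.22, so a dial of 0.2 would fire and a dial of 0.5 would not.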

Scope (v1)

Qwen3.5/3.6 only. Gemma 4 and Laguna's AR loops follow the same
pattern (full logits already on CPU per step) but get their own PRs
to keep the diff reviewable.
Pure AR. Spec-decode's verify/accept inner loop reads only the
argmax-of-target — soft-peek there would require graph extension.
Spec-decode tails off into AR before the budget edge, so the soft
trigger still fires correctly on the AR tail.
Single-token close peek. Multi-token </think> sequences (Laguna)
peek only the first id; the existing inject machinery handles the
rest. Codex agreed this is the right engineering trade-off (review
Q3).

Codex review

Sent to the live Gemma 4 26B service via lucebox codex; verbatim
review and per-finding dispositions are recorded in plan §11. Codex
verdict: PROCEED WITH CHANGES. The single critical finding (Q5,
per-request clamp logic broken when server_default=0) is fixed: the
operator-disabled-server case now silently ignores per-request opt-in
attempts rather than enabling them via the clamp loophole.
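
The fixed clamp behaviour can be sketched as follows. This is an assumption-laden illustration: the helper name is hypothetical, and the lower bound of 0 for negative requests is my assumption rather than something stated in the review.

```cpp
#include <algorithm>
#include <optional>

// Hypothetical helper; names are illustrative, not the server's API.
// server_ceiling is the operator's --think-soft-close-min-ratio value.
static float effective_min_ratio(float server_ceiling,
                                 std::optional<float> requested) {
    // Operator disabled the dial: per-request opt-in is ignored entirely.
    if (server_ceiling <= 0.0f) return 0.0f;
    // No per-request override: use the server value.
    if (!requested) return server_ceiling;
    // min(requested, server_ceiling); the floor at 0 is an assumption.
    return std::clamp(*requested, 0.0f, server_ceiling);
}
```

A request can therefore lower the ratio below the server value or leave it alone, but cannot raise it and cannot switch the feature on when the operator has it at 0.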

Tests

server/test/test_server_unit.cpp gains 12 new test functions
(~135 new assertions). The comparator is small and inline in
model_backend.h::soft_close::should_fire, called from a unit-test
mirror of the AR loop's close-state machine. No GPU required.

Standalone smoke-test of the comparator math (uncommitted) confirmed
135/135 assertions pass before committing.

Test plan

Unit tests pass in a fresh build: cmake --build server/build --target test_server_unit && ctest -R server_unit
Existing thinking-budget integration tests in luce-bench/tests/test_client_thinking_budget.py pass unchanged (default dial=0 → behaviour byte-identical).
Operator sets --think-soft-close-min-ratio 0.1 in a smoke deploy, confirms server banner shows the new line, runs a Qwen3.6 thinking-enabled probe, and verifies a finish_details.close_kind="soft" appears for at least one case.
Empirical sweep (follow-up PR) quantifies token savings vs quality across a sweep bracket of min_ratio ∈ {0.05, 0.1, 0.2, 0.5} on the existing coding-agent-loop probes.

Follow-ups (NOT in this PR)

Gemma 4 26B soft-close port (same mechanism, separate backend file).
Laguna soft-close port (separate backend file).
dflash.think_soft_close_min_ratio knob in the lucebox python CLI repo + autotune sweep bracket entry.
Empirical sweep + recommended dial values per model.
Spec-decode joint-peek (if Laguna multi-token false-positives warrant it).

🤖 Generated with Claude Code

…io peek

Settle the design for a configurable soft-close dial that lets the AR loop terminate `</think>` early once its close-token logit comes within a configurable probability ratio of the chosen-token logit. Default disabled (zero cost when off); operator opt-in via `--think-soft-close-min-ratio`; per-request override clamps to the server ceiling like other thinking knobs.

Key design choices documented:
- Reuse the existing per-step CPU logits read (no graph addition).
- Compare via `logit_diff >= log(min_ratio)`; no softmax required.
- Multi-token close peeks first id only; existing inject machinery drives the rest of the sequence.
- Soft wins ties against hard on same-step trigger (rebuttal in §12).
- Spec-decode boundary unchanged: pure-AR only in v1.

Next steps: codex review (§11 placeholder), implementation, tests.

Add an operator-configurable dial (`--think-soft-close-min-ratio`) that lets the AR loop terminate `</think>` early when its close-token logit comes within a configured probability ratio of the chosen-token logit. Default `0.0` (disabled) is byte-identical to pre-change behaviour.

Mechanism (Qwen3.5/3.6 AR loop only in v1):
- Comparator runs after sampling, before the existing hard-cap hook, using the logits row that's already on CPU for the sampler: no graph addition, no extra GPU work.
- Threshold check uses `logit[close] - logit[chosen] >= log(min_ratio)`, which is mathematically equivalent to a probability-ratio compare but avoids softmax / exp() cost.
- Per-request override (`thinking.soft_close_min_ratio`) clamps to `min(requested, server_default)`; ignored entirely when the operator has the dial at 0 (codex review Q5 fix).
- Multi-token close peeks first id only; existing inject machinery drives the remaining ids.
- New `close_kind="soft"` value in `finish_details`; spec §7 updated. Soft wins ties against hard on the same step (plan §4 + §12).

Plumbing:
- `BudgetHook::soft_close_min_ratio` (model_backend.h).
- `GenerateResult::soft_forced_close`.
- `ServerConfig::soft_close_min_ratio` + `--think-soft-close-min-ratio` CLI flag + startup banner line.
- `ParsedRequest::per_req_soft_close_min_ratio` parsed from `thinking.soft_close_min_ratio`.
- `do_ar_decode` / `do_spec_decode` signatures extended with a `soft_forced_close_out` pointer; existing hard-cap path untouched.

Tests (12 new, 17 RUN_TEST invocations adding ~135 assertions):
- Comparator math: disabled/strict/aggressive/below-threshold/chosen-is-close/tiny-ratio edge cases.
- State machine: single-token + multi-token inject, soft-preempts-hard, disabled-hard-still-fires, natural-at-boundary, byte-identical determinism when disabled.

Spec-decode boundary documented as v1 limitation (out of scope). Gemma4 + Laguna soft-close are follow-ups; lucebox python config and autotune sweep brackets land in the lucebox CLI repo. See docs/experiments/soft-close-thinking-termination-plan.md for the full design (with verbatim codex review + dispositions).

cubic-dev-ai

2 issues found across 9 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="docs/experiments/soft-close-thinking-termination-plan.md">

<violation number="1" location="docs/experiments/soft-close-thinking-termination-plan.md:103">
P3: Plan document contradicts itself on whether `log(min_ratio)` is precomputed once outside the AR loop or computed each step. §3.1's code snippet computes `std::log(budget_hook.soft_close_min_ratio)` inside the loop's if-block (every step the comparator runs), but the text immediately after says it is 'precomputed once outside the loop' and §3.6 repeats 'precomputed once at AR entry'. The actual implementation in `soft_close::should_fire` (model_backend.h:108) also computes `std::log(min_ratio)` on each call rather than caching it. A reader trying to implement from the plan would get contradictory guidance about where to place the `log()` call.</violation>
</file>

<file name="server/src/qwen35/qwen35_backend.cpp">

<violation number="1" location="server/src/qwen35/qwen35_backend.cpp:983">
P1: Soft-close skips the first token of multi-token close sequences because `maybe_force_close` immediately overwrites `close[0]` with `close[1]` in the same step.</violation>
</file>


cubic-dev-ai · 2026-06-01T03:05:54Z

+            tok, close0, budget_hook.close_token_ids.size());
+        tok = close0;
+        budget_close_started = true;
+        close_inject_pos = 1;


P1: Soft-close skips the first token of multi-token close sequences because maybe_force_close immediately overwrites close[0] with close[1] in the same step.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At server/src/qwen35/qwen35_backend.cpp, line 983: <comment>Soft-close skips the first token of multi-token close sequences because `maybe_force_close` immediately overwrites `close[0]` with `close[1]` in the same step.</comment> <file context> @@ -938,6 +943,47 @@ bool Qwen35Backend::do_ar_decode(int committed, int n_gen, + tok, close0, budget_hook.close_token_ids.size()); + tok = close0; + budget_close_started = true; + close_inject_pos = 1; + if (soft_forced_close_out) *soft_forced_close_out = true; + }; </file context>

Suggested change

-        close_inject_pos = 1;
+        close_inject_pos = 0;
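
To make the reported off-by-one concrete, here is a toy model of a single decode step. Everything in it is hypothetical: it assumes, as the review comment claims, that an inject hook runs in the same step after the soft-close branch and overwrites the token with the id at the current inject position. Whether the real maybe_force_close behaves this way is exactly what the review asks to be checked.

```cpp
#include <cstddef>
#include <vector>

// Toy close-state; field names only loosely mirror the diff above.
struct CloseStateSketch {
    bool started = false;
    std::size_t inject_pos = 0;
};

// Assumed inject hook: once a close has started, overwrite this step's
// token with close_ids[inject_pos] and advance the position.
static int inject_step(CloseStateSketch& st,
                       const std::vector<int>& close_ids, int tok) {
    if (st.started && st.inject_pos < close_ids.size()) {
        tok = close_ids[st.inject_pos++];
    }
    return tok;
}

// One step in which soft-close fires and the inject hook then runs.
// Returns the token that is actually emitted for that step.
static int soft_fire_then_inject(const std::vector<int>& close_ids,
                                 std::size_t start_pos) {
    CloseStateSketch st;
    int tok = close_ids[0];      // soft-close picks the first close id
    st.started = true;
    st.inject_pos = start_pos;   // 1 in the PR diff, 0 in the suggestion
    return inject_step(st, close_ids, tok);
}
```

Under this model, a three-id close sequence {10, 11, 12} emits 11 on the firing step when the start position is 1 (the first id is lost) and 10 when it is 0.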

cubic-dev-ai · 2026-06-01T03:05:54Z

+        // prob[close] / prob[chosen] = exp(l_close - l_chosen);
+        // Compare l_close - l_chosen >= log(min_ratio) — single fma,
+        // no exp() needed.
+        const float log_ratio = std::log(budget_hook.soft_close_min_ratio);


P3: Plan document contradicts itself on whether log(min_ratio) is precomputed once outside the AR loop or computed each step. §3.1's code snippet computes std::log(budget_hook.soft_close_min_ratio) inside the loop's if-block (every step the comparator runs), but the text immediately after says it is 'precomputed once outside the loop' and §3.6 repeats 'precomputed once at AR entry'. The actual implementation in soft_close::should_fire (model_backend.h:108) also computes std::log(min_ratio) on each call rather than caching it. A reader trying to implement from the plan would get contradictory guidance about where to place the log() call.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At docs/experiments/soft-close-thinking-termination-plan.md, line 103: <comment>Plan document contradicts itself on whether `log(min_ratio)` is precomputed once outside the AR loop or computed each step. §3.1's code snippet computes `std::log(budget_hook.soft_close_min_ratio)` inside the loop's if-block (every step the comparator runs), but the text immediately after says it is 'precomputed once outside the loop' and §3.6 repeats 'precomputed once at AR entry'. The actual implementation in `soft_close::should_fire` (model_backend.h:108) also computes `std::log(min_ratio)` on each call rather than caching it. A reader trying to implement from the plan would get contradictory guidance about where to place the `log()` call.</comment> <file context> @@ -0,0 +1,774 @@ + // prob[close] / prob[chosen] = exp(l_close - l_chosen); + // Compare l_close - l_chosen >= log(min_ratio) — single fma, + // no exp() needed. + const float log_ratio = std::log(budget_hook.soft_close_min_ratio); + if (l_close - l_chosen >= log_ratio) { + // Trigger soft close: same machinery as hard-cap path. </file context>
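
One way to make the plan text and the code agree is to compute the threshold once at AR entry and reuse it per step. The sketch below is an illustration under that assumption; the struct name is hypothetical and it is not the implementation in model_backend.h.

```cpp
#include <cmath>

// Hypothetical holder for a threshold computed once before the AR loop.
struct SoftCloseThresholdSketch {
    const bool enabled;      // false when the dial is 0 (or negative)
    const float log_ratio;   // log(min_ratio), computed once

    explicit SoftCloseThresholdSketch(float min_ratio)
        : enabled(min_ratio > 0.0f),
          log_ratio(enabled ? std::log(min_ratio) : 0.0f) {}

    // Per-step check: one subtraction and one compare, no log() call.
    bool should_fire(float l_close, float l_chosen) const {
        return enabled && (l_close - l_chosen >= log_ratio);
    }
};
```

Constructing the object before the loop and calling should_fire inside it keeps std::log out of the per-step path, which is what §3.1's prose and §3.6 describe.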

Integrate soft-close thinking termination while preserving the existing empty-visible-output retry path, stall guards, MoE AR dispatch path, and C2 gate tests.

Record PR Luce-Org#326 integration, current PR-head coverage, retained conflict probes, and Luce-Org#321 target-shard IPC feasibility findings.

…ebox-docker

Brings the soft-close logit-ratio peek mechanism onto feat/lucebox-docker so the cuda12 image can be rebuilt with both the call:<verb>{} parser + emitter fix (Luce-Org#329) AND the auto-thinking-cap dial available in a single sweep.

Folded:
- 1552495 docs(experiments): plan soft-close thinking termination
- d799d00 feat(server): soft-close thinking termination via logit-ratio peek

Conflicts resolved:
- server/src/qwen35/qwen35_backend.cpp: do_ar_decode signature kept HEAD's terse comment + soft-close's new bool *soft_forced_close_out parameter.
- server/test/test_server_unit.cpp: concatenated HEAD's C2-gate tests with soft-close's comparator/state-machine tests; merged both RUN_TEST blocks.

Plumbing added in this merge (not on the source branch):
- DFLASH_THINK_SOFT_CLOSE_MIN_RATIO env var in entrypoint.sh, emitted to the server CLI as --think-soft-close-min-ratio only when nonzero (preserves byte-identical-when-disabled invariant).
- DflashRuntime.think_soft_close_min_ratio (float, default 0.0) in lucebox types/config/docker_run so `lucebox config set dflash.think_soft_close_min_ratio=0.5` propagates through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ring PR Luce-Org#326 merge Cherry-pick artifact from resolving the conflict in test_server_unit.cpp during the soft-close merge — `sed -i '4155d'` deleted the closing brace of test_soft_close_natural_at_boundary instead of the leftover conflict-marker line. Compile fails with 'a function-definition is not allowed here before `{` token' at the int main() that follows. Restores the brace; no logic change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds an operator-only flag that emits one stderr line per AR step inside the thinking phase recording (committed, chosen_tok, close0_tok, logit[close], logit[chosen], diff, prob_ratio). Designed to capture real close-vs-chosen logit trajectories on qwen3.6 so a sliding-ratio soft-close curve can be fit from data rather than guessed.

The fixed-ratio soft-close (PR Luce-Org#326) terminates thinking when logit[close]-logit[chosen] >= log(ratio). A single ratio is the wrong tool for both "step 1 reasoning" and "5K-token reasoning"; what we want is a ratio that slides from strict at the start to permissive at the cap. Curve shape (linear / exponential / piecewise) depends on how the logit gap evolves through thinking, which this flag now exposes empirically.

Plumbing:
- BudgetHook::debug_thinking_logits (model_backend.h)
- qwen35_backend.cpp maybe_soft_close lambda: emits [soft-trace] every step when flag set, regardless of soft_close_min_ratio. Also enables the prefill-last-logits read on the first AR token so step 0 participates.
- ServerConfig::debug_thinking_logits + --debug-thinking-logits CLI + startup banner line.
- http_server.cpp threads config_.debug_thinking_logits into the per-request BudgetHook.
- DFLASH_DEBUG_THINKING_LOGITS env in entrypoint.sh (default 0; forwarded to --debug-thinking-logits when "1").
- lucebox: DflashRuntime.debug_thinking_logits (bool, default False) + config.py setter + docker_run.py env emission.

Zero GPU cost (logits already on CPU for sampling); ~1 stderr line per thinking token across in-flight requests when on. Off by default. No behavior change when DFLASH_DEBUG_THINKING_LOGITS=0.

test_server_unit: 1973 assertions, 0 failures. lucebox tests: 114/114 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…floor

Empirical motivation
====================
Soft-close (PR Luce-Org#326 mainline) was effectively inert on qwen3.6-27b. A trajectory probe across 5 diverse prompts (1085-5771 thinking tokens each) showed `prob_ratio < 1e-8` every step, meaning no sampled ratio in {0.1, 0.3, 0.5, 0.7, 0.9} would ever fire.

Root cause: `BudgetHook::close_token_ids` was used for both:
(a) the peek-token id read by `soft_close::should_fire(..., close0)`
(b) the inject sequence written when the hook fires.

For qwen3.6-27b the model card's `thinking_terminator_hint` is a 16+ token English directive starting with "Considering the limited time by the user, ...". So `close_token_ids[0]` tokenized to ~79939 ("Considering"), a mid-sentence content token whose logit sits 19-35 nats below the chosen token at every thinking step.

Fix (path α): split probe-vs-inject in BudgetHook
==================================================
* `close_token_ids`: unchanged role. Full inject sequence written on hard close or when soft-close fires.
* `soft_close_probe_ids`: NEW. Short sequence (typically one token) used only for the comparator peek. server_main detects the close marker substring inside the hint and tokenizes it in isolation; on miss it leaves the probe field empty (legacy fallback peek path in force).

`BudgetHook::soft_close_probe_token()` returns the probe when set, else falls back to close_token_ids.front().

Validation: re-probed with image built from this branch. `</think>` (token 248069) reliably becomes argmax-competitive at 66-94% of natural reasoning length across all 5 prompts. `max_diff` reaches 0.000 (`prob_ratio = 1.0`) on every prompt vs prior `max_diff = -9.69` on token 79939. 9.7 nat improvement, restoring the mechanism to its designed regime.

False-positive guard: min_thinking_tokens floor
================================================
The peek runs every AR step but the fire decision can be gated by a new `BudgetHook::soft_close_min_tokens` (server CLI: `--think-soft-close-min-tokens N`). When set, suppress fire until `committed_now - committed_at_entry >= soft_close_min_tokens`. Protects against a rare early `</think>` logit spike on prompts where the model briefly considers concluding mid-thought. Default 0 = floor disabled (no behavior change from prior). Empirical 66-94% fire window puts typical operating point at floor=128 for qwen3.6-27b. Per-request override not exposed (server-policy gate).

Diagnostic: --debug-thinking-logits
====================================
Adds `BudgetHook::debug_thinking_logits` + server CLI flag. When on, emits one stderr line per AR step recording committed, chosen, probe0, logit[close], logit[chosen], diff, prob_ratio. Used to capture full close-vs-chosen trajectories so a sliding-ratio curve can be designed from data rather than guessed. Zero GPU cost (logits already on CPU for sampling); stderr-heavy, operator-only.

Tests
=====
5 new unit tests:
- test_soft_close_probe_uses_probe_ids_not_inject_ids
- test_soft_close_probe_ids_empty_falls_back_to_close_token_ids
- test_soft_close_inject_sequence_unchanged_when_fires
- test_soft_close_min_tokens_blocks_early_fire
- test_soft_close_min_tokens_default_zero_unchanged_behavior

Also fixes a pre-existing OOB write in test_soft_close_determinism_when_disabled (vocab=1000 row indexed at 248069). UB-silent in Release before the new tests perturbed heap layout enough to crash; widened to vocab=250000 in place.

test_server_unit: 1780 assertions, 0 failures.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
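
The probe fallback and the token floor described in this commit message can be sketched together. The field names and soft_close_probe_token() come from the message; the struct name, the -1 sentinel for an empty hook, and the floor_allows_fire helper are assumptions made for illustration.

```cpp
#include <cstdint>
#include <vector>

// Illustrative subset of the hook; not the real BudgetHook definition.
struct BudgetHookSketch {
    std::vector<int32_t> close_token_ids;       // full inject sequence
    std::vector<int32_t> soft_close_probe_ids;  // peek-only; empty = legacy
    int soft_close_min_tokens = 0;              // 0 = floor disabled

    // Probe id used by the comparator: the dedicated probe when set,
    // otherwise the first inject id. -1 for an empty hook is an assumption.
    int32_t soft_close_probe_token() const {
        if (!soft_close_probe_ids.empty()) return soft_close_probe_ids.front();
        return close_token_ids.empty() ? -1 : close_token_ids.front();
    }

    // Hypothetical helper for the floor: suppress fire until enough
    // thinking tokens have been committed since AR entry.
    bool floor_allows_fire(int committed_now, int committed_at_entry) const {
        return committed_now - committed_at_entry >= soft_close_min_tokens;
    }
};
```

With the qwen3.6-27b ids quoted above, a hook whose inject sequence starts at 79939 and whose probe is 248069 peeks at 248069, while an empty probe falls back to 79939, which is the legacy behaviour the commit describes.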

easel · 2026-06-03T13:03:02Z

Follow-up commit `f7e8d6f8`: probe/inject split + min_thinking_tokens floor

Empirical validation on qwen3.6-27b showed the dial inert at every sampled ratio (prob_ratio < 1e-8 across 12,888 thinking-token steps). Root cause: BudgetHook::close_token_ids was used for both the soft-close peek probe AND the inject sequence — for qwen3.6's trained-hint sidecar that meant peeking the "Considering" lead-in (id 79939) instead of the </think> marker (id 248069).

This commit:

Splits probe-vs-inject in BudgetHook. New soft_close_probe_ids field (empty = legacy fallback). server_main detects the marker substring inside the hint and tokenizes it in isolation.
Adds --think-soft-close-min-tokens N false-positive floor (default 0 = disabled).
Adds --debug-thinking-logits trajectory diagnostic for tuning future curves.

After the fix, re-probe shows </think> reaches argmax (max_diff = 0.000, prob_ratio = 1.0) at 66-94% of natural reasoning length across 5 diverse prompts. 9.7 nat improvement restores the mechanism to its designed regime.

5 new unit tests (test_server_unit: 1780 assertions, 0 failures). PR #331 (which had this fix on a bad base) is closed in favor of consolidation here.

🤖 Generated with Claude Code


easel added 2 commits May 31, 2026 22:49

cubic-dev-ai Bot reviewed Jun 1, 2026

easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 1, 2026

docs: refresh auto-integration checkpoint

545048f

Record PR Luce-Org#326 integration, current PR-head coverage, retained conflict probes, and Luce-Org#321 target-shard IPC feasibility findings.

easel mentioned this pull request Jun 1, 2026

fix(server): support gemma-4's plain-text call:<verb>{} tool-call format #329

Draft

easel mentioned this pull request Jun 3, 2026

fix(server): split soft-close probe ids from inject ids #331

Closed

4 tasks

easel mentioned this pull request Jun 3, 2026

fix(server): plain-text call:verb spans must survive emit_finish malformed-parse + responses .done easel/lucebox-hub#1

Merged

6 tasks

easel wants to merge 3 commits into Luce-Org:main from easel:feat/soft-close-thinking-termination
