feat(pflash): adaptive keep_ratio bandit MVP #264

Merged: davide221 merged 2 commits into Luce-Org:main from dusterbloom:feat/pflash-mvp-adaptive-keep on May 27, 2026

@dusterbloom dusterbloom commented May 23, 2026

Summary

Replaces the fixed keep_ratio PFlash knob with a per-session ε-greedy bandit that adjusts compression based on observed accept_rate. Opt-in — clients without session_id get the existing fixed-keep path, byte-identical to main.
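
To make the summary concrete, here is a minimal sketch of what one ε-greedy step over keep_ratio could look like. It is not the PR's step_adaptive_keep_ratio: the type and function names, the ε, α, target and clamp values are all hypothetical, and the 0.01 step only mirrors the kBanditStepLarge value listed under the open questions. The direction (low accept_rate raises keep_ratio) is inferred from the 0.10 → 0.15 trajectory reported under Evidence. The random draws are passed in by the caller so the step stays deterministic and testable.

```cpp
#include <algorithm>

// Hypothetical per-session state: current keep_ratio and a smoothed accept_rate.
struct KeepStateSketch {
    double keep_ratio;
    double ema_accept;
};

// Hypothetical tuning knobs; none of these values come from the PR except
// step, which borrows the 0.01 quoted for kBanditStepLarge.
struct BanditConfigSketch {
    double epsilon = 0.10;        // probability of an exploratory move
    double step = 0.01;           // size of one keep_ratio adjustment
    double alpha = 0.20;          // EMA weight on the newest accept_rate sample
    double target_accept = 0.30;  // assumed accept_rate the controller aims for
    double min_keep = 0.05;       // assumed lower clamp
    double max_keep = 0.50;       // assumed upper clamp
};

// One bandit step. u_explore and u_dir are uniform draws in [0, 1) supplied
// by the caller.
KeepStateSketch step_keep_ratio_sketch(KeepStateSketch s, double accept_rate,
                                       const BanditConfigSketch& c,
                                       double u_explore, double u_dir) {
    // Smooth the observed accept_rate.
    s.ema_accept = (1.0 - c.alpha) * s.ema_accept + c.alpha * accept_rate;

    double dir;
    if (u_explore < c.epsilon) {
        // Explore: move in a random direction.
        dir = (u_dir < 0.5) ? -1.0 : 1.0;
    } else {
        // Exploit: low acceptance suggests over-compression, so keep more.
        dir = (s.ema_accept < c.target_accept) ? 1.0 : -1.0;
    }

    // Apply the step and clamp to the allowed range.
    double next = s.keep_ratio + dir * c.step;
    s.keep_ratio = std::max(c.min_keep, std::min(c.max_keep, next));
    return s;
}
```

Under these assumptions, a session at keep_ratio 0.10 that observes a 6.2% accept rate on an exploit step moves to roughly 0.11, and a session already at the upper clamp stays there.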

Source changes (1,043 LoC, 17 files, all under server/ + harness/ + thoughts/)

| File | LoC | What |
| --- | --- | --- |
| server/src/server/adaptive_keep_ratio.h | +116 | Bandit state machine (step_adaptive_keep_ratio + LRU-bounded HttpServerSessions, cap 1024) |
| server/src/server/http_server.{cpp,h} | +54 | Parse session_id, call bandit, inject accept_rate into chat/anthropic/responses usage objects |
| server/src/qwen35/qwen35_backend.{cpp,h} | +13 | Plumb accept_rate + spec_decode_ran out of do_spec_decode |
| server/src/common/model_backend.h | +5 | accept_rate / spec_decode_ran fields on GenerateResult |
| server/test/test_adaptive_keep_ratio.cpp | +239 | State machine: convergence, clamping, EMA, LRU eviction |
| server/test/test_bandit_integration.cpp | +200 | HTTP-shaped lifecycle, non-string session_id guards |
| server/test/test_server_unit.cpp | +82 | ParsedRequest::session_id + accept_rate usage-object contract |
| server/CMakeLists.txt | +12 | Register the 3 new test targets |
| .github/workflows/ci.yml | +1 | Submodule PAT fix for fork PRs |
| harness/clients/run_claude_code.sh | +41/-2 | Wire PFLASH_SESSION_ID env into request body |
| harness/clients/session_inject_proxy.py | +144 | Local proxy that injects extra_body.session_id into requests |
| harness/clients/prompts/{logic,math}_check.txt | +10 | Multi-turn test prompts |
| thoughts/2026-05-21_pflash_mvp_plan.md | +129 | Design doc (mrciffa-scope MVP plan) |

All bandit logic lives in the C++ HTTP server. The Python server is deprecated and untouched.
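
The LRU-bounded session map is the part most likely to surprise a reader, so here is a rough sketch of how such a bound is commonly built. It is not the PR's HttpServerSessions: the class name, the method names, and the choice to store a bare double per session are assumptions made for illustration. Only the idea of an LRU cap (1024 in the PR) comes from the description above.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical LRU-bounded map from session_id to a per-session keep_ratio.
// The capacity is assumed to be at least 1.
class LruSessionsSketch {
public:
    explicit LruSessionsSketch(std::size_t cap) : cap_(cap) {}

    // Returns the state for `id`, inserting `initial` if the session is new.
    // Either way the session becomes the most recently used entry.
    double& touch(const std::string& id, double initial) {
        auto it = index_.find(id);
        if (it != index_.end()) {
            // Known session: move its node to the front without reallocating.
            order_.splice(order_.begin(), order_, it->second);
            return it->second->second;
        }
        if (!order_.empty() && index_.size() >= cap_) {
            // Full: evict the least recently used session (back of the list).
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(id, initial);
        index_[id] = order_.begin();
        return order_.front().second;
    }

    bool contains(const std::string& id) const { return index_.count(id) != 0; }
    std::size_t size() const { return index_.size(); }

private:
    using Entry = std::pair<std::string, double>;
    std::size_t cap_;
    std::list<Entry> order_;  // front = most recent, back = least recent
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

With a capacity of 2, touching sessions a, b, a, c in that order evicts b, because a was refreshed before c arrived. A bound like this keeps memory flat when clients generate a fresh session_id per conversation.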

Behavior matrix

| Request | main | this PR |
| --- | --- | --- |
| no session_id | fixed keep_ratio | identical |
| extra_body.session_id="…" | unknown field, ignored | per-session bandit keep_ratio |
| pflash_mode=off | no compression | no compression (bandit not consulted) |
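
The "this PR" column of the matrix can be read as a three-way dispatch. The sketch below encodes that reading. The enum and function names are hypothetical, and an absent session_id is modelled as a null pointer, which is an assumption about representation rather than how ParsedRequest stores it.

```cpp
#include <string>

// Hypothetical labels for where keep_ratio comes from on a given request.
enum class KeepSourceSketch { NoCompression, FixedKeep, SessionBandit };

// Mirrors the "this PR" column: pflash_mode=off wins first, then the
// presence of a session_id selects the bandit, otherwise the fixed path.
// session_id == nullptr models a request that carried no session_id.
KeepSourceSketch choose_keep_source(bool pflash_off,
                                    const std::string* session_id) {
    if (pflash_off) {
        return KeepSourceSketch::NoCompression;  // bandit not consulted
    }
    if (session_id != nullptr) {
        return KeepSourceSketch::SessionBandit;  // per-session keep_ratio
    }
    return KeepSourceSketch::FixedKeep;          // same as main
}
```

Checking pflash_mode before session_id is what keeps the third row true even for clients that send a session_id.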

Evidence

- 5-turn adaptive trajectory: keep moves 0.10 → 0.15 as the bandit responds to a 6.2% accept rate.
- NIAH @ 16K/32K: 5/5, with ~1.9× faster prefill at keep=0.10.
- Day-5 A/B/C (claude_code single-shot): the bandit beats both fixed arms (16s wall, 31.9% accept, OK_DONE).

Bench scripts + result artifacts moved to a follow-up PR to keep this one focused on the C++ feature.

Open questions for reviewers

  1. extra_body.session_id vs top-level session_id — kept both, prefer one?
  2. kBanditStepLarge = 0.01 — too slow for bad priors?
  3. EMA α direction (slow-prior weighting) — confirm intent.
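
Question 3 is easier to answer with the two possible conventions side by side. The sketch below is illustrative only: the function names are hypothetical, and it does not claim which form the PR implements.

```cpp
// Convention A: alpha weights the prior. A large alpha means a slow-moving
// average that mostly trusts history.
double ema_weight_on_prior(double prior, double sample, double alpha) {
    return alpha * prior + (1.0 - alpha) * sample;
}

// Convention B: alpha weights the newest sample. A large alpha means a
// fast-moving average that mostly trusts the latest observation.
double ema_weight_on_sample(double prior, double sample, double alpha) {
    return (1.0 - alpha) * prior + alpha * sample;
}
```

With a prior of 0.30, a new accept_rate of 0.062 and α = 0.9, convention A gives about 0.276 while convention B gives about 0.086. The same α therefore produces nearly opposite behavior, which is why the intended direction is worth confirming.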

History

This PR was rebased onto the dflash → server rename on main and squashed from 19 commits into one feature commit. Dropped:

Backup of the pre-rebase head: 861c5e2 (saved locally for rollback if needed).

CI note

Both CI jobs on this PR fail at the final uv sync --frozen step with Distribution not found at .../dflash (or .../pflash). Root cause is on main: the uv.lock was not regenerated after the rename commit. C++ build, pytest server suite (67/67), and ctest all pass. Not a PR-side issue.

@dusterbloom dusterbloom force-pushed the feat/pflash-mvp-adaptive-keep branch from 7edccd1 to 7b9397c Compare May 23, 2026 10:52
@dusterbloom dusterbloom marked this pull request as draft May 23, 2026 17:05
@cubic-dev-ai cubic-dev-ai Bot left a comment

5 issues found across 27 files

Comment thread dflash/src/server/adaptive_keep_ratio.h Outdated
Comment thread harness/clients/run_claude_code.sh
Comment thread dflash/bench/run_day5_abc.sh Outdated
Comment thread dflash/bench/run_day5_abc.sh Outdated
Comment thread dflash/bench/run_day5_abc.sh Outdated
dusterbloom added a commit to dusterbloom/lucebox-hub that referenced this pull request May 23, 2026
- NIAH 16K: 5/5 baseline (keep=0.20) and 5/5 bandit (keep=0.10); no retrieval degradation
- NIAH 32K: 5/5 baseline and 5/5 bandit; compression 5x->10x halves target prefill time
- 3-seed Day-5 A/B/C: decode_check / logic_check / math_check prompts, all ok_done=YES
- Pareto: C (bandit) wall=16.3±3.4s vs B wall=24.7±3.1s (1.52x); ar=34.6% vs 32.8%
- Bandit fired in all 3 sessions; per-session state isolation confirmed
@dusterbloom dusterbloom marked this pull request as ready for review May 23, 2026 19:37
@cubic-dev-ai cubic-dev-ai Bot left a comment

2 issues found across 52 files

Comment thread dflash/src/server/http_server.cpp Outdated
Comment thread .github/workflows/ci.yml Outdated
Per-session ε-greedy bandit that adjusts compression based on observed
accept_rate. Opt-in via session_id; clients without it get the existing
fixed-keep path, byte-identical to main.

Includes:
- Bandit state machine (LRU-bounded session map, cap 1024)
- HTTP server session_id parsing + bandit hook
- accept_rate plumbing from DFlash GenerateResult
- CI submodule PAT fix for fork PRs
- Harness session_id env-var wiring

5-turn trajectory + NIAH @16K/32K + 3-seed A/B/C evidence
(reproducible via the follow-up bench PR; not committed here).

Bench scripts + result artifacts split to follow-up PR.
Bug Luce-Org#42 tail-capture fix moved to PR Luce-Org#274.
@dusterbloom dusterbloom force-pushed the feat/pflash-mvp-adaptive-keep branch from 861c5e2 to 3cdcf7b Compare May 27, 2026 07:42
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 27, 2026
Record the current squashed PR Luce-Org#274 head as integrated; resolved the CMake conflict by retaining the existing adaptive bandit tests from PR Luce-Org#264 while keeping the already-integrated early-exit drafter files.
dusterbloom added a commit to dusterbloom/lucebox-hub that referenced this pull request May 27, 2026
dusterbloom added a commit to dusterbloom/lucebox-hub that referenced this pull request May 27, 2026
…i-client support

- run_bandit_abc.sh: A (keep=0.05) / B (keep=0.20) / C (bandit) comparison
  with CLIENT env var selecting from primary-5 (claude_code/codex/pi/hermes/opencode)
- run_bandit_abc_seeds.sh: same shape with seed_label + prompt_basename args for
  variance runs across prompts
- Port from dflash/bench/run_day5_{abc,seeds_abc}.sh (PR Luce-Org#264 OID 861c5e2):
  updated worktree paths -> repo-relative, dflash/build/dflash_server -> C++ bin,
  port 18080 -> 19099, lock /tmp/dflash_gpu.lock -> /tmp/lucebox-bench.lock,
  added PFLASH_DRAFTER_EARLY_EXIT_N=7 + SCORE_LAYERS=7 to inner subshell
- Condition C injects PFLASH_SESSION_ID via session_inject_proxy.py on port 19082
PR Luce-Org#282 renamed dflash/ -> server/ but README still referenced the old
path in 7 quickstart commands (cmake -S dflash, --directory dflash,
cd lucebox-hub/dflash). Users following the README would hit
'No such file or directory'. Sweep path-shaped references; leave
binary names (test_dflash, dflash_server), submodule branch
(luce-dflash), and prose mentions of the dflash algorithm as-is.