feat(pflash): adaptive keep_ratio bandit MVP #264
Merged
davide221 merged 2 commits on May 27, 2026
7edccd1 to 7b9397c
5 issues found across 27 files
dusterbloom added a commit to dusterbloom/lucebox-hub that referenced this pull request on May 23, 2026
- NIAH 16K: 5/5 baseline (keep=0.20) and 5/5 bandit (keep=0.10); no retrieval degradation
- NIAH 32K: 5/5 baseline and 5/5 bandit; compression 5x->10x halves target prefill time
- 3-seed Day-5 A/B/C: decode_check / logic_check / math_check prompts, all ok_done=YES
- Pareto: C (bandit) wall=16.3±3.4s vs B wall=24.7±3.1s (1.52x); ar=34.6% vs 32.8%
- Bandit fired in all 3 sessions; per-session state isolation confirmed
2 issues found across 52 files
Per-session ε-greedy bandit that adjusts compression based on observed accept_rate. Opt-in via session_id; clients without it get the existing fixed-keep path, byte-identical to main.
Includes:
- Bandit state machine (LRU-bounded session map, cap 1024)
- HTTP server session_id parsing + bandit hook
- accept_rate plumbing from DFlash GenerateResult
- CI submodule PAT fix for fork PRs
- Harness session_id env-var wiring
5-turn trajectory + NIAH @16K/32K + 3-seed A/B/C evidence (reproducible via the follow-up bench PR; not committed here). Bench scripts + result artifacts split to follow-up PR. Bug Luce-Org#42 tail-capture fix moved to PR Luce-Org#274.
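The LRU-bounded session map mentioned in this commit message could be sketched roughly as follows. This is a minimal illustration, not the PR's code: `SketchSessionState`, `SketchSessionMap`, and `touch` are hypothetical names, and the default keep_ratio of 0.20 is only borrowed from the baseline figure quoted in the benchmark notes. Only the cap of 1024 comes from the source.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical per-session state; the PR's real struct is not shown on this page.
struct SketchSessionState {
    double keep_ratio = 0.20;  // assumed starting value
};

// Hypothetical LRU-bounded map from session_id to bandit state.
class SketchSessionMap {
public:
    explicit SketchSessionMap(std::size_t cap = 1024) : cap_(cap == 0 ? 1 : cap) {}

    // Return the state for session_id, creating it if absent, and mark it
    // most recently used. The reference is invalidated if the entry is evicted.
    SketchSessionState& touch(const std::string& session_id) {
        auto it = index_.find(session_id);
        if (it != index_.end()) {
            // splice moves the node to the front without invalidating iterators.
            order_.splice(order_.begin(), order_, it->second);
            return it->second->second;
        }
        if (index_.size() >= cap_) {
            assert(!order_.empty());
            index_.erase(order_.back().first);  // evict the least recently used session
            order_.pop_back();
        }
        order_.emplace_front(session_id, SketchSessionState{});
        index_[session_id] = order_.begin();
        return order_.front().second;
    }

    bool contains(const std::string& session_id) const {
        return index_.count(session_id) != 0;
    }
    std::size_t size() const { return index_.size(); }

private:
    using Entry = std::pair<std::string, SketchSessionState>;
    std::size_t cap_;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

Pairing a list with a hash index keeps lookup and eviction at constant time, which is one common way to bound memory when session ids are client-supplied.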
861c5e2 to 3cdcf7b
easel pushed a commit to easel/lucebox-hub that referenced this pull request on May 27, 2026
Record the current squashed PR Luce-Org#274 head as integrated; resolved the CMake conflict by retaining the existing adaptive bandit tests from PR Luce-Org#264 while keeping the already-integrated early-exit drafter files.
dusterbloom added a commit to dusterbloom/lucebox-hub that referenced this pull request on May 27, 2026
dusterbloom added a commit to dusterbloom/lucebox-hub that referenced this pull request on May 27, 2026
…i-client support
- run_bandit_abc.sh: A (keep=0.05) / B (keep=0.20) / C (bandit) comparison
with CLIENT env var selecting from primary-5 (claude_code/codex/pi/hermes/opencode)
- run_bandit_abc_seeds.sh: same shape with seed_label + prompt_basename args for
variance runs across prompts
- Port from dflash/bench/run_day5_{abc,seeds_abc}.sh (PR Luce-Org#264 OID 861c5e2):
updated worktree paths -> repo-relative, dflash/build/dflash_server -> C++ bin,
port 18080 -> 19099, lock /tmp/dflash_gpu.lock -> /tmp/lucebox-bench.lock,
added PFLASH_DRAFTER_EARLY_EXIT_N=7 + SCORE_LAYERS=7 to inner subshell
- Condition C injects PFLASH_SESSION_ID via session_inject_proxy.py on port 19082
PR Luce-Org#282 renamed dflash/ -> server/ but README still referenced the old path in 7 quickstart commands (cmake -S dflash, --directory dflash, cd lucebox-hub/dflash). Users following the README would hit 'No such file or directory'. Sweep path-shaped references; leave binary names (test_dflash, dflash_server), submodule branch (luce-dflash), and prose mentions of the dflash algorithm as-is.
Summary
Replaces the fixed keep_ratio PFlash knob with a per-session ε-greedy bandit that adjusts compression based on observed accept_rate. Opt-in: clients without session_id get the existing fixed-keep path, byte-identical to main.
Source changes (1,043 LoC, 17 files, all under server/ + harness/ + thoughts/)
| File | Change |
| --- | --- |
| server/src/server/adaptive_keep_ratio.h | … (step_adaptive_keep_ratio + LRU-bounded HttpServerSessions, cap 1024) |
| server/src/server/http_server.{cpp,h} | … session_id, call bandit, inject accept_rate into chat/anthropic/responses usage objects |
| server/src/qwen35/qwen35_backend.{cpp,h} | … accept_rate + spec_decode_ran out of do_spec_decode |
| server/src/common/model_backend.h | … accept_rate / spec_decode_ran fields on GenerateResult |
| server/test/test_adaptive_keep_ratio.cpp | |
| server/test/test_bandit_integration.cpp | |
| server/test/test_server_unit.cpp | … ParsedRequest::session_id + accept_rate usage-object contract |
| server/CMakeLists.txt | |
| .github/workflows/ci.yml | |
| harness/clients/run_claude_code.sh | … PFLASH_SESSION_ID env into request body |
| harness/clients/session_inject_proxy.py | … extra_body.session_id into requests |
| harness/clients/prompts/{logic,math}_check.txt | |
| thoughts/2026-05-21_pflash_mvp_plan.md | |
All bandit logic lives in the C++ HTTP server. The Python server is deprecated and untouched.
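As a rough illustration of the ε-greedy idea, the sketch below implements the textbook form over a small set of discrete keep ratios. It is not the PR's step_adaptive_keep_ratio, which is not shown on this page and may instead nudge keep_ratio by a fixed step. The arm values, the penalty constant, the reward shape, and every `sketch_*` name are assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical arm set and penalty; the PR's real constants are not shown here.
constexpr std::array<double, 4> kSketchKeepArms = {0.05, 0.10, 0.15, 0.20};
constexpr double kSketchKeepPenalty = 0.5;

struct SketchBandit {
    std::array<double, kSketchKeepArms.size()> mean_reward{};  // running mean per arm
    std::array<std::size_t, kSketchKeepArms.size()> pulls{};   // observations per arm
};

// Assumed reward: acceptance minus a cost for keeping more of the prompt.
inline double sketch_reward(double accept_rate, double keep_ratio) {
    return accept_rate - kSketchKeepPenalty * keep_ratio;
}

// Fold one observed accept_rate into the running mean of the arm that was used.
inline void sketch_update(SketchBandit& b, std::size_t arm, double accept_rate) {
    assert(arm < kSketchKeepArms.size());
    const double r = sketch_reward(accept_rate, kSketchKeepArms[arm]);
    b.pulls[arm] += 1;
    b.mean_reward[arm] += (r - b.mean_reward[arm]) / static_cast<double>(b.pulls[arm]);
}

// ε-greedy choice: with probability epsilon pick a uniformly random arm,
// otherwise pick the arm with the best running mean (ties go to the lowest index).
template <class Rng>
std::size_t sketch_select(const SketchBandit& b, double epsilon, Rng& rng) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    if (coin(rng) < epsilon) {
        std::uniform_int_distribution<std::size_t> pick(0, kSketchKeepArms.size() - 1);
        return pick(rng);
    }
    std::size_t best = 0;
    for (std::size_t i = 1; i < kSketchKeepArms.size(); ++i) {
        if (b.mean_reward[i] > b.mean_reward[best]) best = i;
    }
    return best;
}
```

In this sketch a request without a session_id would simply skip `sketch_select` and use the fixed keep ratio, which matches the opt-in behavior described above. Passing the random generator in as a parameter keeps the selection deterministic under test.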
Behavior matrix
[Table not recoverable; surviving cell labels: main, session_id, extra_body.session_id="…", pflash_mode=off]
Evidence
5-turn adaptive trajectory (keep moves 0.10 → 0.15 as bandit responds to 6.2% accept), NIAH @ 16K/32K at 5/5 with ~1.9× faster prefill at keep=0.10, and Day-5 A/B/C (claude_code single-shot) showing the bandit beating both fixed arms (16s wall, 31.9% accept, OK_DONE). Bench scripts + result artifacts moved to a follow-up PR to keep this one focused on the C++ feature.
Open questions for reviewers
- extra_body.session_id vs top-level session_id: kept both, prefer one?
- kBanditStepLarge = 0.01: too slow for bad priors?
History
This PR was rebased onto the dflash → server rename on main and squashed from 19 commits into one feature commit. Dropped:
- Bug Luce-Org#42 tail-capture fix (58a9bef): moved to PR #274 (feat(pflash): prefill compress up to 128k -> 2-12× prefill (content-dependent), decode at parity)
- … (861c5e2): separate concern
- … (bench/results/...) and bench scripts: split to a follow-up PR
Backup of the pre-rebase head: 861c5e2 (saved locally for rollback if needed).
CI note
Both CI jobs on this PR fail at the final uv sync --frozen step with "Distribution not found at .../dflash" (or .../pflash). Root cause is on main: the uv.lock was not regenerated after the rename commit. C++ build, pytest server suite (67/67), and ctest all pass. Not a PR-side issue.