Skip to content

[codex] fix prefix cache for user-first prompts#387

Draft
easel wants to merge 1 commit into
Luce-Org:mainfrom
easel:codex/prefix-cache-boundary-telemetry
Draft

[codex] fix prefix cache for user-first prompts#387
easel wants to merge 1 commit into
Luce-Org:mainfrom
easel:codex/prefix-cache-boundary-telemetry

Conversation

@easel

@easel easel commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

What changed

  • Allows prefix-cache boundary detection to start from the first chat role marker when a prompt has no system message.
  • Emits usage.timings.prompt_n_cached when the server restores a cached prefix.
  • Adds model-free unit coverage for Qwen-style system-first and user-first prompts plus timing telemetry.

Why

Agentic multi-turn prompts commonly start with a user message. The prefix cache previously required the system-role marker before it would identify boundaries, so user-first sessions never activated the cache even when subsequent turns shared a long prefix.

Validation

  • cmake --build build --target test_server_unit dflash_server -j 8
  • ./build/test_server_unit

Review in cubic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant