fix: attribute transparent-proxy traffic to run_id#26
Conversation
Catch-all transparent proxy was bypassing per-run usage counters, so renderer rollouts hitting vLLM /v1/generate (no explicit route on the router) appeared as 0 inference cost on training-run dashboards. Thread run_id through route_transparent and call the existing record_run_request + extract_and_record_usage_buffered helpers.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 18a3742734
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
ApprovabilityVerdict: Needs human review This PR involves billing attribution changes (explicitly mentioned as preventing 'silently bypassing billing') and has unresolved review comments identifying a streaming regression that breaks real-time response delivery for authenticated requests. You can customize Macroscope's approvability policy. Learn more. |
Initial transparent-proxy attribution buffered the entire upstream body when run_id was set, regressing /v1/generate (and any other catch-all SSE endpoint) from incremental token delivery to end-of-request delivery and inflating memory for long generations. Branch on Content-Type: text/event-stream → spawn forwarder + tee SseUsageExtractor (mirrors route_typed_request's streaming path). Non-SSE responses still buffer once for the JSON usage extraction.
|
@codex review |
|
Codex Review: Didn't find any major issues. What shall we delve into next? ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
#23 removed the retry-cleanup decrement in route_typed_request, expecting send_typed_request to own load lifecycle. The early-return 500 path (input-validation rewrite branch) only decrements when status is rewritten to 400 — for actual 500s it skips, so each retry attempt's increment leaks into phantom load. Always decrement on this path when load_incremented; fixes the two pre-existing load_tracking_test failures. Also drops a few 'billing' references from the new comments.
|
lgtm, i can roll this out to the develop environment first to do a quick test once a new agent tag is built, and then do the same for the rest of prod |
Catch-all transparent proxy was bypassing per-run usage counters.
/v1/generate(renderer rollouts) is not an explicit route on the router, so requests fall through totransparent_proxy_handler→route_transparent, which never calledrecord_run_request/extract_and_record_usage.o2eh6s2nqx41l4pq5qfft8jk: 0 entries invllm_router_run_prompt_tokens_total{run_id=...}despite millions of tokens generated and 200 OK responses on/v1/generateat the inference pods.Threads
run_idthroughRouterTrait::route_transparentand reuses the existingrecord_run_request+extract_and_record_usage_bufferedhelpers. Streaming pass-through preserved when norun_idis present.Note
Medium Risk
Touches request/response streaming paths and load accounting in the HTTP routers; mistakes could impact billing metrics, memory use (buffering), or worker load tracking under retries/streaming.
Overview
Adds
run_id: Option<&str>plumbing throughRouterTrait::route_transparentand the server/router-manager call chain so catch-all transparent-proxy requests can be attributed to a run.Updates transparent-proxy forwarding in the standard router and PD router to detect SSE (
text/event-stream) vs non-SSE responses: SSE stays streaming while teeing chunks throughusage_metrics::SseUsageExtractor, and non-SSE buffers the body once toextract_and_record_usagebefore returning.Fixes an early-return path in
send_typed_requestto always decrement worker load on rewritten vLLM 400s and genuine 500s to prevent load leaks during retries.Reviewed by Cursor Bugbot for commit 68fe703. Bugbot is set up for automated code reviews on this repo. Configure here.
Note
Attribute transparent-proxy traffic to
run_idfor per-run usage recordingrun_idfrom JWT claims intransparent_proxy_handlerandv1_responsesin server.rs and passes it down throughRouterManagerand router implementations toroute_transparent.SseUsageExtractor; for non-SSE responses, buffers the full body before callingextract_and_record_usage.route_transparentnow returns 502 Bad Gateway when reading the upstream body fails in the non-streaming path, instead of attempting to stream.Macroscope summarized 68fe703.