Skip to content

fix: attribute transparent-proxy traffic to run_id#26

Merged
JannikSt merged 3 commits into
mainfrom
fix/transparent-proxy-usage-attribution
May 4, 2026
Merged

fix: attribute transparent-proxy traffic to run_id#26
JannikSt merged 3 commits into
mainfrom
fix/transparent-proxy-usage-attribution

Conversation

@JannikSt

@JannikSt JannikSt commented May 3, 2026

Copy link
Copy Markdown
Member

Catch-all transparent proxy was bypassing per-run usage counters.

  • /v1/generate (renderer rollouts) is not an explicit route on the router, so requests fall through to transparent_proxy_handlerroute_transparent, which never called record_run_request / extract_and_record_usage.
  • Result on Johannes' run o2eh6s2nqx41l4pq5qfft8jk: 0 entries in vllm_router_run_prompt_tokens_total{run_id=...} despite millions of tokens generated and 200 OK responses on /v1/generate at the inference pods.

Threads run_id through RouterTrait::route_transparent and reuses the existing record_run_request + extract_and_record_usage_buffered helpers. Streaming pass-through preserved when no run_id is present.


Note

Medium Risk
Touches request/response streaming paths and load accounting in the HTTP routers; mistakes could impact billing metrics, memory use (buffering), or worker load tracking under retries/streaming.

Overview
Adds run_id: Option<&str> plumbing through RouterTrait::route_transparent and the server/router-manager call chain so catch-all transparent-proxy requests can be attributed to a run.

Updates transparent-proxy forwarding in the standard router and PD router to detect SSE (text/event-stream) vs non-SSE responses: SSE stays streaming while teeing chunks through usage_metrics::SseUsageExtractor, and non-SSE buffers the body once to extract_and_record_usage before returning.

Fixes an early-return path in send_typed_request to always decrement worker load on rewritten vLLM 400s and genuine 500s to prevent load leaks during retries.

Reviewed by Cursor Bugbot for commit 68fe703. Bugbot is set up for automated code reviews on this repo. Configure here.

Note

Attribute transparent-proxy traffic to run_id for per-run usage recording

  • Extracts run_id from JWT claims in transparent_proxy_handler and v1_responses in server.rs and passes it down through RouterManager and router implementations to route_transparent.
  • For SSE responses, tees the upstream byte stream through an unbounded mpsc channel while feeding chunks into SseUsageExtractor; for non-SSE responses, buffers the full body before calling extract_and_record_usage.
  • Fixes a worker load leak in router.rs where the load counter was not decremented on 500 early-return paths, only on rewritten 400s.
  • Behavioral Change: route_transparent now returns 502 Bad Gateway when reading the upstream body fails in the non-streaming path, instead of attempting to stream.

Macroscope summarized 68fe703.

Catch-all transparent proxy was bypassing per-run usage counters,
so renderer rollouts hitting vLLM /v1/generate (no explicit route on
the router) appeared as 0 inference cost on training-run dashboards.
Thread run_id through route_transparent and call the existing
record_run_request + extract_and_record_usage_buffered helpers.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 18a3742734

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/routers/http/router.rs Outdated
Comment thread src/routers/http/pd_router.rs Outdated
@macroscopeapp

macroscopeapp Bot commented May 3, 2026

Copy link
Copy Markdown

Approvability

Verdict: Needs human review

This PR involves billing attribution changes (explicitly mentioned as preventing 'silently bypassing billing') and has unresolved review comments identifying a streaming regression that breaks real-time response delivery for authenticated requests.

You can customize Macroscope's approvability policy. Learn more.

Initial transparent-proxy attribution buffered the entire upstream body
when run_id was set, regressing /v1/generate (and any other catch-all
SSE endpoint) from incremental token delivery to end-of-request
delivery and inflating memory for long generations.

Branch on Content-Type: text/event-stream → spawn forwarder + tee
SseUsageExtractor (mirrors route_typed_request's streaming path).
Non-SSE responses still buffer once for the JSON usage extraction.
@JannikSt

JannikSt commented May 3, 2026

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. What shall we delve into next?

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

#23 removed the retry-cleanup decrement in route_typed_request,
expecting send_typed_request to own load lifecycle. The early-return
500 path (input-validation rewrite branch) only decrements when
status is rewritten to 400 — for actual 500s it skips, so each
retry attempt's increment leaks into phantom load. Always decrement
on this path when load_incremented; fixes the two pre-existing
load_tracking_test failures.

Also drops a few 'billing' references from the new comments.
@eexwhyzee

Copy link
Copy Markdown

lgtm, i can roll this out to the develop environment first to do a quick test once a new agent tag is built, and then do the same for the rest of prod

@JannikSt JannikSt merged commit 7ccad78 into main May 4, 2026
7 checks passed
@JannikSt JannikSt mentioned this pull request May 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants