Skip to content

feat(llm): return chat completion token ids via nvext#9509

Open
AmeenP wants to merge 4 commits into
ai-dynamo:mainfrom
AmeenP:codex/renderers-nvext-contract
Open

feat(llm): return chat completion token ids via nvext#9509
AmeenP wants to merge 4 commits into
ai-dynamo:mainfrom
AmeenP:codex/renderers-nvext-contract

Conversation

@AmeenP
Copy link
Copy Markdown
Contributor

@AmeenP AmeenP commented May 13, 2026

Summary

This is the Dynamo-side token-in/nvext contract needed by PrimeIntellect-ai/renderers#11, split out from the larger draft RL/admin PR.

  • adds nvext.extra_fields: ["completion_token_ids"] for chat completions
  • accumulates generated token IDs across chat completion chunks and returns them under response.nvext.completion_token_ids on the final chunk/non-streaming aggregate response
  • rejects nvext.extra_fields: ["completion_token_ids"] when n > 1 until Dynamo has an indexed per-choice shape
  • accepts rl-sdk compatible root stop_token_ids and plumbs them into token stop conditions
  • accepts renderer-compatible top-level cache_salt, supports canonical nvext.cache_salt, and forwards the salt through backend extra_args["nvext"]
  • supports nvext.extra_fields: ["engine_data"] by returning generated completion_token_ids and available completion_logprobs under response.nvext.engine_data
  • documents that nvext.token_data skips tokenization independently of backend_instance_id

Intentionally not included: /v1/rl/*, LoRA admin, pause/resume, weight update, or worker RL dispatch changes from #9382.

Validation

  • cargo fmt --check
  • git diff --check
  • python -m py_compile components/src/dynamo/vllm/handlers.py

Attempted targeted dynamo-llm Rust tests for test_cache_salt, test_completion_token_ids_rejects_multiple_choices, and test_engine_data_accumulates_completion_token_ids_and_logprobs, but this macOS checkout is blocked by existing Linux-only dynamo-llm compile paths (dynamo_memory::numa, DiskStorage, fallocate, O_DIRECT).

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions
Copy link
Copy Markdown
Contributor

👋 Hi AmeenP! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors.Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions Bot added external-contribution Pull request is from an external contributor documentation Improvements or additions to documentation frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels May 13, 2026
Signed-off-by: AmeenP <ameenp360@gmail.com>
@AmeenP AmeenP force-pushed the codex/renderers-nvext-contract branch from 568042b to 426ed44 Compare May 13, 2026 22:40
@AmeenP AmeenP marked this pull request as ready for review May 13, 2026 22:44
@AmeenP AmeenP requested a review from a team as a code owner May 13, 2026 22:44
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 13, 2026

Walkthrough

This PR extends the nvext response protocol with a new completion_token_ids field that accumulates backend-generated token IDs across streaming chunks and emits the concatenated list on the final response chunk, with corresponding schema, streaming logic, and documentation updates.

Changes

Completion Token IDs Support

Layer / File(s) Summary
NvExt Protocol Extension
lib/llm/src/protocols/openai/nvext.rs
NvExtResponse and NvExtResponseFieldSelection structs gain completion_token_ids fields; field selection logic parses "completion_token_ids" from extra_fields; documentation and unit tests updated to reflect the new optional field.
DeltaGenerator Streaming Implementation
lib/llm/src/protocols/openai/chat_completions/delta.rs
DeltaGenerator adds internal state to accumulate token IDs across chunks; choice_from_postprocessor conditionally appends backend token IDs when the field is enabled; nvext response construction injects the accumulated list into the payload on the final chunk only; unit test verifies correct chunk-level emission.
User-Facing Documentation
docs/components/frontend/nvext.md
Field reference table clarified and extra_fields support expanded; response extensions table documents the new completion_token_ids field and its final-chunk emission semantics.

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(llm): return chat completion token ids via nvext' clearly and specifically summarizes the main change—adding support for returning completion token IDs through the nvext response field.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The pull request description is comprehensive and well-structured, covering the purpose, implementation details, validation steps, and intentional exclusions.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread lib/llm/src/protocols/openai/chat_completions/delta.rs Outdated
Signed-off-by: AmeenP <ameenp360@gmail.com>
@AmeenP AmeenP requested review from a team as code owners May 15, 2026 09:20
@github-actions github-actions Bot added the backend::vllm Relates to the vllm backend label May 15, 2026
AmeenP added 2 commits May 15, 2026 04:29
Signed-off-by: AmeenP <ameenp360@gmail.com>
Signed-off-by: AmeenP <ameenp360@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend::vllm Relates to the vllm backend documentation Improvements or additions to documentation external-contribution Pull request is from an external contributor feat frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants