feat(llm): return chat completion token ids via nvext#9509
Conversation
|
👋 Hi AmeenP! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀 |
Signed-off-by: AmeenP <ameenp360@gmail.com>
568042b to
426ed44
Compare
WalkthroughThis PR extends the nvext response protocol with a new ChangesCompletion Token IDs Support
🎯 3 (Moderate) | ⏱️ ~22 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Signed-off-by: AmeenP <ameenp360@gmail.com>
Signed-off-by: AmeenP <ameenp360@gmail.com>
Signed-off-by: AmeenP <ameenp360@gmail.com>
Summary
This is the Dynamo-side token-in/nvext contract needed by PrimeIntellect-ai/renderers#11, split out from the larger draft RL/admin PR.
nvext.extra_fields: ["completion_token_ids"]for chat completionsresponse.nvext.completion_token_idson the final chunk/non-streaming aggregate responsenvext.extra_fields: ["completion_token_ids"]whenn > 1until Dynamo has an indexed per-choice shapestop_token_idsand plumbs them into token stop conditionscache_salt, supports canonicalnvext.cache_salt, and forwards the salt through backendextra_args["nvext"]nvext.extra_fields: ["engine_data"]by returning generatedcompletion_token_idsand availablecompletion_logprobsunderresponse.nvext.engine_datanvext.token_dataskips tokenization independently ofbackend_instance_idIntentionally not included:
/v1/rl/*, LoRA admin, pause/resume, weight update, or worker RL dispatch changes from #9382.Validation
cargo fmt --checkgit diff --checkpython -m py_compile components/src/dynamo/vllm/handlers.pyAttempted targeted
dynamo-llmRust tests fortest_cache_salt,test_completion_token_ids_rejects_multiple_choices, andtest_engine_data_accumulates_completion_token_ids_and_logprobs, but this macOS checkout is blocked by existing Linux-onlydynamo-llmcompile paths (dynamo_memory::numa,DiskStorage,fallocate,O_DIRECT).