feat: add reasoning in model eval and generate config to use the best reasoning options #138
base: main
Conversation
Pull Request Overview
This PR introduces a comprehensive statistical reasoning evaluation framework that integrates model evaluation and configuration generation into a single consolidated script. The framework uses advanced statistical methods to determine optimal reasoning mode configurations for LLM routers based on empirical evidence.
- Implements multi-criteria statistical analysis using McNemar's test, Fisher's exact test, Cohen's h effect size, and Bayesian inference
- Consolidates the previously separate MMLU pro eval and result_to_config.py processes into one unified workflow
- Generates production-ready YAML configurations with data-driven reasoning decisions for each category
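The effect-size test mentioned above is easy to illustrate. As a sketch (not the PR's actual implementation), Cohen's h compares two accuracy proportions via an arcsine transform, and a threshold on |h| can gate whether reasoning mode is worth enabling for a category:

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h effect size between two proportions (arcsine transform)."""
    phi1 = 2 * math.asin(math.sqrt(p1))
    phi2 = 2 * math.asin(math.sqrt(p2))
    return phi1 - phi2

# Illustrative numbers: reasoning mode scores 0.78 vs 0.70 without it.
h = cohens_h(0.78, 0.70)
# Conventionally |h| >= 0.2 is a "small" effect, 0.5 "medium", 0.8 "large".
print(f"h = {h:.3f}")
```

A decision rule would combine this with the significance tests (McNemar's, Fisher's exact) so that reasoning mode is only enabled when the improvement is both statistically significant and practically meaningful.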
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
src/training/model_eval/reasoning_eval_consolidated.py | Main evaluation script implementing the statistical framework and configuration generation |
src/training/model_eval/config_template.yaml | YAML template defining the router configuration structure with reasoning family mappings |
print(f"⚠️ Warning: Template file not found at {template_path}")
# exit with error
The comment indicates an exit should occur but no actual exit or return statement is present. This will cause the function to continue and return None, potentially causing errors downstream when the config is expected to be a dictionary.
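A minimal fix, assuming the snippet lives in a loader that is expected to return a config dict (function and variable names here are hypothetical, not taken from the PR):

```python
import sys
from pathlib import Path

def load_template(template_path: str) -> dict:
    """Load the YAML config template, failing fast if it is missing."""
    if not Path(template_path).exists():
        print(f"⚠️ Warning: Template file not found at {template_path}")
        sys.exit(1)  # actually exit, instead of falling through and returning None
    import yaml  # third-party PyYAML; assumed available in the eval environment
    with open(template_path) as f:
        return yaml.safe_load(f)
```

Raising `SystemExit` (or returning early with an explicit error) prevents downstream code from receiving `None` where a dictionary is expected.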
params = {
    "model": model,
    "messages": [{"role": "user", "content": prompt}],
    "max_tokens": 6144,  # Increased for reasoning mode support (allows full reasoning chains)
Magic number 6144 should be defined as a named constant at the module level for better maintainability and configurability.
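A sketch of the suggested refactor (the constant name and helper are illustrative, not from the PR):

```python
# Module-level constant: generous token budget so full reasoning
# chains are not truncated mid-thought.
MAX_COMPLETION_TOKENS = 6144

def build_params(model: str, prompt: str) -> dict:
    """Assemble request parameters for a single eval prompt."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": MAX_COMPLETION_TOKENS,
    }
```

A named constant makes the budget discoverable and easy to override in one place if a model family needs longer reasoning chains.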
if "deepseek" in model_lower and ("v31" in model_lower):
    # DeepSeek v3.1 reasoning via chat template kwargs
The version check 'v31' is fragile and may not match future versioning schemes (e.g., 'v3.1', 'v31.1'). Consider using a more robust pattern matching approach.
Suggested change:
- if "deepseek" in model_lower and ("v31" in model_lower):
-     # DeepSeek v3.1 reasoning via chat template kwargs
+ if "deepseek" in model_lower and re.search(r"v3(?:[._]?1(?:\.\d+)?)?", model_lower):
+     # DeepSeek v3.x reasoning via chat template kwargs
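Copilot's suggested pattern can be sanity-checked against plausible model-name variants (a quick sketch; note that because the version group is optional, the pattern also matches a bare "v3"):

```python
import re

# Pattern from Copilot's suggestion above.
pattern = re.compile(r"v3(?:[._]?1(?:\.\d+)?)?")

candidates = ["deepseek-v31", "deepseek-v3.1", "deepseek-v3_1",
              "deepseek-v3", "deepseek-r1"]
for name in candidates:
    matched = bool(pattern.search(name.lower()))
    print(f"{name}: {matched}")
```

Only "deepseek-r1" fails to match; whether matching bare "v3" is desirable depends on whether v3.0 models support the same chat-template reasoning kwargs.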
Signed-off-by: Huamin Chen <[email protected]>
What type of PR is this?
Feature
What this PR does / why we need it:
Currently the MMLU Pro eval and result_to_config.py are a two-step process that is not well integrated. Additionally, the reasoning eval is not implemented yet. This consolidated script uses a config template together with a more comprehensive model eval to generate the router config.
Which issue(s) this PR fixes:
Fixes #
Release Notes: Yes/No