
Conversation

@rootfs rootfs commented Sep 15, 2025

What type of PR is this?

Currently, the MMLU-Pro eval and result_to_config.py form a two-step process that is not well integrated. Additionally, the reasoning eval is not implemented yet.

This PR consolidates them into a single script that uses a config template, together with a more comprehensive model eval, to generate the router config.

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Release Notes: Yes/No


github-actions bot commented Sep 15, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/training/model_eval/config_template.yaml
  • src/training/model_eval/reasoning_eval_consolidated.py


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@rootfs rootfs marked this pull request as draft September 15, 2025 14:28

netlify bot commented Sep 15, 2025

Deploy Preview for vllm-semantic-router ready!

  • 🔨 Latest commit: 9c934b6
  • 🔍 Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/68c8253bdc100c00089ff73a
  • 😎 Deploy Preview: https://deploy-preview-138--vllm-semantic-router.netlify.app


@Copilot Copilot AI left a comment


Pull Request Overview

This PR introduces a comprehensive statistical reasoning evaluation framework that integrates model evaluation and configuration generation into a single consolidated script. The framework uses advanced statistical methods to determine optimal reasoning mode configurations for LLM routers based on empirical evidence.

  • Implements multi-criteria statistical analysis using McNemar's test, Fisher's exact test, Cohen's h effect size, and Bayesian inference
  • Consolidates the previously separate MMLU pro eval and result_to_config.py processes into one unified workflow
  • Generates production-ready YAML configurations with data-driven reasoning decisions for each category
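To illustrate the kind of statistical criteria involved, here is a minimal, self-contained sketch of two of them — Cohen's h effect size and an exact McNemar test on discordant pairs. The function names and the example figures are illustrative only and are not taken from the PR:

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: effect size for the difference between two proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

def mcnemar_exact_p(b: int, c: int) -> float:
    """Exact (binomial) two-sided McNemar test on discordant pairs.

    b: questions correct only with reasoning ON
    c: questions correct only with reasoning OFF
    Under H0 the discordant pairs split 50/50.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)

# Example: accuracy 0.82 with reasoning vs. 0.70 without;
# reasoning flips 15 questions to correct but breaks 3.
h = cohens_h(0.82, 0.70)
p = mcnemar_exact_p(15, 3)
```

By the usual conventions, a small h (below roughly 0.2) or a non-significant McNemar p-value would argue against enabling reasoning mode for that category.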

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Files reviewed:

  • src/training/model_eval/reasoning_eval_consolidated.py — Main evaluation script implementing the statistical framework and configuration generation
  • src/training/model_eval/config_template.yaml — YAML template defining the router configuration structure with reasoning family mappings


Comment on lines 696 to 697
print(f"⚠️ Warning: Template file not found at {template_path}")
# exit with error

Copilot AI Sep 15, 2025


The comment indicates an exit should occur but no actual exit or return statement is present. This will cause the function to continue and return None, potentially causing errors downstream when the config is expected to be a dictionary.
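A minimal sketch of the fix the comment is asking for — fail fast when the template is missing rather than falling through and returning None. The helper name is hypothetical, not the PR's actual function:

```python
import os
import sys

def load_config_template(template_path: str) -> str:
    # Hypothetical helper illustrating the fix: exit with an error
    # instead of silently continuing and returning None.
    if not os.path.exists(template_path):
        print(f"⚠️ Warning: Template file not found at {template_path}")
        sys.exit(1)
    with open(template_path) as f:
        return f.read()  # the real script would parse this as YAML
```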


params = {
"model": model,
"messages": [{"role": "user", "content": prompt}],
"max_tokens": 6144, # Increased for reasoning mode support (allows full reasoning chains)

Copilot AI Sep 15, 2025


Magic number 6144 should be defined as a named constant at the module level for better maintainability and configurability.
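A sketch of the suggested refactor, using an illustrative constant and wrapper name (neither is from the PR):

```python
# Module-level constant replacing the inline literal; sized to allow
# full reasoning chains in reasoning mode.
MAX_COMPLETION_TOKENS = 6144

def build_request_params(model: str, prompt: str) -> dict:
    # Hypothetical wrapper showing the constant in use.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": MAX_COMPLETION_TOKENS,
    }
```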


Comment on lines 224 to 225
if "deepseek" in model_lower and ("v31" in model_lower):
# DeepSeek v3.1 reasoning via chat template kwargs

Copilot AI Sep 15, 2025


The version check 'v31' is fragile and may not match future versioning schemes (e.g., 'v3.1', 'v31.1'). Consider using a more robust pattern matching approach.

Suggested change:

  - if "deepseek" in model_lower and ("v31" in model_lower):
  -     # DeepSeek v3.1 reasoning via chat template kwargs
  + if "deepseek" in model_lower and re.search(r"v3(?:[._]?1(?:\.\d+)?)?", model_lower):
  +     # DeepSeek v3.x reasoning via chat template kwargs
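The suggested pattern can be sanity-checked against a few plausible model-name spellings. Note that because every group after "v3" is optional, it also matches plain "v3", which is consistent with the broadened "v3.x" comment:

```python
import re

DEEPSEEK_V3_PATTERN = re.compile(r"v3(?:[._]?1(?:\.\d+)?)?")

# Variants the original "v31" substring check would have missed:
for name in ("deepseek-v31", "deepseek-v3.1", "deepseek-v3_1", "deepseek-v3"):
    assert DEEPSEEK_V3_PATTERN.search(name.lower())

# Unrelated models do not match:
assert not DEEPSEEK_V3_PATTERN.search("llama-3.1-70b")
```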


Signed-off-by: Huamin Chen <[email protected]>