Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -86,7 +86,7 @@ The AgentScope Ecosystem
- **[2025-12] `FEAT`:** TTS (Text-to-Speech) support. [Example](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/tts) | [Tutorial](https://doc.agentscope.io/tutorial/task_tts.html)
- **[2025-11] `INTG`:** Anthropic Agent Skill support. [Example](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/agent_skill) | [Tutorial](https://doc.agentscope.io/tutorial/task_agent_skill.html)
- **[2025-11] `RELS`:** Alias-Agent for diverse real-world tasks and Data-Juicer Agent for data processing open-sourced. [Alias-Agent](https://github.com/agentscope-ai/agentscope-samples/tree/main/alias) | [Data-Juicer Agent](https://github.com/agentscope-ai/agentscope-samples/tree/main/data_juicer_agent)
- **[2025-11] `INTG`:** Agentic RL via Trinity-RFT library. [Example](https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/react_agent) | [Trinity-RFT](https://github.com/agentscope-ai/Trinity-RFT)
- **[2025-11] `INTG`:** Agentic RL via Trinity-RFT library. [Example](https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/model_tuning) | [Trinity-RFT](https://github.com/agentscope-ai/Trinity-RFT)
- **[2025-11] `INTG`:** ReMe for enhanced long-term memory. [Example](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/long_term_memory/reme)
- **[2025-11] `RELS`:** agentscope-samples repository launched and agentscope-runtime upgraded with Docker/K8s deployment and VNC-powered GUI sandboxes. [Samples](https://github.com/agentscope-ai/agentscope-samples) | [Runtime](https://github.com/agentscope-ai/agentscope-runtime)
<!-- END NEWS -->
Expand Down Expand Up @@ -361,7 +361,7 @@ asyncio.run(multi_agent_conversation())

### Tuner

- [Tune ReAct Agent](https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/react_agent)
- [Tune ReAct Agent](https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/model_tuning)


## Contributing
Expand Down
4 changes: 2 additions & 2 deletions README_zh.md
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,7 @@ AgentScope 生态
- **[2025-12] `功能`:** TTS(文本转语音)支持。[样例](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/tts) | [教程](https://doc.agentscope.io/zh_CN/tutorial/task_tts.html)
- **[2025-11] `集成`:** Anthropic Agent Skill 支持。[样例](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/agent_skill) | [教程](https://doc.agentscope.io/zh_CN/tutorial/task_agent_skill.html)
- **[2025-11] `发布`:** 面向多样化真实任务的 Alias-Agent 和数据处理的 Data-Juicer Agent 开源。[Alias-Agent](https://github.com/agentscope-ai/agentscope-samples/tree/main/alias) | [Data-Juicer Agent](https://github.com/agentscope-ai/agentscope-samples/tree/main/data_juicer_agent)
- **[2025-11] `集成`:** 通过 Trinity-RFT 库实现智能体强化学习。[样例](https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/react_agent) | [Trinity-RFT](https://github.com/agentscope-ai/Trinity-RFT)
- **[2025-11] `集成`:** 通过 Trinity-RFT 库实现智能体强化学习。[样例](https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/model_tuning) | [Trinity-RFT](https://github.com/agentscope-ai/Trinity-RFT)
- **[2025-11] `集成`:** ReMe 增强长期记忆。[样例](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/long_term_memory/reme)
- **[2025-11] `发布`:** agentscope-samples 样例库上线,agentscope-runtime 升级支持 Docker/K8s 部署和 VNC 图形沙盒。[样例库](https://github.com/agentscope-ai/agentscope-samples) | [Runtime](https://github.com/agentscope-ai/agentscope-runtime)
<!-- END NEWS -->
Expand Down Expand Up @@ -356,7 +356,7 @@ asyncio.run(multi_agent_conversation())

### 微调

- [调优 ReAct 智能体](https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/react_agent)
- [调优 ReAct 智能体](https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/model_tuning)


## 贡献
Expand Down
2 changes: 1 addition & 1 deletion docs/NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
- **[2025-12] `FEAT`:** TTS (Text-to-Speech) support. [Example](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/tts) | [Tutorial](https://doc.agentscope.io/tutorial/task_tts.html)
- **[2025-11] `INTG`:** Anthropic Agent Skill support. [Example](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/agent_skill) | [Tutorial](https://doc.agentscope.io/tutorial/task_agent_skill.html)
- **[2025-11] `RELS`:** Alias-Agent for diverse real-world tasks and Data-Juicer Agent for data processing open-sourced. [Alias-Agent](https://github.com/agentscope-ai/agentscope-samples/tree/main/alias) | [Data-Juicer Agent](https://github.com/agentscope-ai/agentscope-samples/tree/main/data_juicer_agent)
- **[2025-11] `INTG`:** Agentic RL via Trinity-RFT library. [Example](https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/react_agent) | [Trinity-RFT](https://github.com/agentscope-ai/Trinity-RFT)
- **[2025-11] `INTG`:** Agentic RL via Trinity-RFT library. [Example](https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/model_tuning) | [Trinity-RFT](https://github.com/agentscope-ai/Trinity-RFT)
- **[2025-11] `INTG`:** ReMe for enhanced long-term memory. [Example](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/long_term_memory/reme)
- **[2025-11] `RELS`:** agentscope-samples repository launched and agentscope-runtime upgraded with Docker/K8s deployment and VNC-powered GUI sandboxes. [Samples](https://github.com/agentscope-ai/agentscope-samples) | [Runtime](https://github.com/agentscope-ai/agentscope-runtime)
- **[2025-11] `DOCS`:** Contributing Guide is online - welcome to contribute! [Guide](./CONTRIBUTING.md)
Expand Down
2 changes: 1 addition & 1 deletion docs/NEWS_zh.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
- **[2025-12] `功能`:** TTS(文本转语音)支持。[样例](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/tts) | [教程](https://doc.agentscope.io/zh_CN/tutorial/task_tts.html)
- **[2025-11] `集成`:** Anthropic Agent Skill 支持。[样例](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/agent_skill) | [教程](https://doc.agentscope.io/zh_CN/tutorial/task_agent_skill.html)
- **[2025-11] `发布`:** 面向多样化真实任务的 Alias-Agent 和数据处理的 Data-Juicer Agent 开源。[Alias-Agent](https://github.com/agentscope-ai/agentscope-samples/tree/main/alias) | [Data-Juicer Agent](https://github.com/agentscope-ai/agentscope-samples/tree/main/data_juicer_agent)
- **[2025-11] `集成`:** 通过 Trinity-RFT 库实现智能体强化学习。[样例](https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/react_agent) | [Trinity-RFT](https://github.com/agentscope-ai/Trinity-RFT)
- **[2025-11] `集成`:** 通过 Trinity-RFT 库实现智能体强化学习。[样例](https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/model_tuning) | [Trinity-RFT](https://github.com/agentscope-ai/Trinity-RFT)
- **[2025-11] `集成`:** ReMe 增强长期记忆。[样例](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/long_term_memory/reme)
- **[2025-11] `发布`:** agentscope-samples 样例库上线,agentscope-runtime 升级支持 Docker/K8s 部署和 VNC 图形沙盒。[样例库](https://github.com/agentscope-ai/agentscope-samples) | [Runtime](https://github.com/agentscope-ai/agentscope-runtime)
- **[2025-11] `文档`:** 贡献指南上线 - 欢迎参与贡献![指南](./CONTRIBUTING_zh.md)
Expand Down
2 changes: 1 addition & 1 deletion docs/tutorial/en/src/task_tuner.py
Original file line number Diff line number Diff line change
Expand Up @@ -193,7 +193,7 @@ async def example_judge_function(
# Below is an example of configuring and starting the tuning process:
#
# .. note::
# This example is for demonstration only. For a complete runnable example, see `Tune ReActAgent <https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/react_agent>`_
# This example is for demonstration only. For a complete runnable example, see `Tune ReActAgent <https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/model_tuning>`_
#
# .. code-block:: python
#
Expand Down
2 changes: 1 addition & 1 deletion docs/tutorial/zh_CN/src/task_tuner.py
Original file line number Diff line number Diff line change
Expand Up @@ -193,7 +193,7 @@ async def example_judge_function(
# 下面是调优流程的配置与启动示例:
#
# .. note::
# 此示例仅供演示。完整可运行示例请参考 `Tune ReActAgent <https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/react_agent>`_
# 此示例仅供演示。完整可运行示例请参考 `Tune ReActAgent <https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/model_tuning>`_
#
# .. code-block:: python
#
Expand Down
294 changes: 294 additions & 0 deletions examples/tuner/model_selection/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,294 @@
# Model Selection Guide

AgentScope provides a `model_selection` sub-module in tuner module to automatically select the best performing model from a set of candidates based on evaluation metrics. This guide walks you through the steps to evaluate and select the optimal model for your agent workflow.

## Overview

Model selection is the process of choosing the best performing model from a set of candidate models based on their performance on a dataset. To use model selection, you need to understand three components:

1. **Workflow function**: An async function that takes a task and model, executes the task with the model, and returns a workflow output.
2. **Judge function**: A function that evaluates the workflow's output and returns a reward indicating performance.
3. **Task dataset**: A dataset containing samples for evaluation.

The following diagram illustrates the relationship between these components:

```mermaid
flowchart TD
CandidateModels[Candidate Models] --> WorkflowFunction[Workflow Function]
Task[Task] --> WorkflowFunction
WorkflowFunction --> JudgeFunction[Judge Function]
Task --> JudgeFunction
JudgeFunction --> Reward[Reward]
Reward --> ModelSelector[Model Selector]
ModelSelector --> BestModel[Best Performing Model]
```

## How to implement

Here we use a translation task scenario as an example to illustrate how to implement the above three components.

Suppose you have an agent workflow that performs translation using the `ReActAgent`.

```python
from agentscope.agent import ReActAgent
from agentscope.model import ChatModelBase

async def run_translation_agent(text: str, model: ChatModelBase):
agent = ReActAgent(
name="translator",
sys_prompt="You are a helpful translation agent. Translate the given text accurately, and only output the translated text.",
model=model,
formatter=OpenAIChatFormatter(),
)

response = await agent.reply(
msg=Msg("user", f"Translate the following text between English and Chinese: {text}", role="user"),
)

print(response)
```

### Step 1: Prepare task dataset

To evaluate models for translation tasks, you need a dataset that contains samples of source texts and their corresponding reference translations.

The dataset should be organized in a format that can be loaded using the `datasets.load_dataset` function (e.g., JSONL, Parquet, CSV) or from huggingface online datasets. For translation tasks, your data file (like `translate_data/test.json`) might contain samples like:

```json
{
"question": "量子退相干是限制量子计算机可扩展性的主要障碍之一。",
"answer": "Quantum decoherence is one of the primary obstacles limiting the scalability of quantum computers."
}
```


### Step 2: Define a workflow function

The workflow function takes a task dictionary and model as input, and returns a `WorkflowOutput`. The model selector will call this function with different models during evaluation.

```python
async def translation_workflow(
task: Dict,
model: ChatModelBase,
) -> WorkflowOutput:
"""Run the translation workflow on a single task with the given model."""
...
```

- Inputs:
- `task`: A dictionary representing a single training task from the dataset.
- `model`: The model to be used in the workflow. This will be evaluated by the selector.

- Returns:
- `WorkflowOutput`: An object containing the agent's response.

Below is a refactored version of the original `run_translation_agent` function to fit the workflow function pattern.

**Key changes from the original function**:

1. Add `model` as a parameter to the workflow function.
2. Use the input `model` to initialize the agent.
3. Use the `question` field from the `task` dictionary as the source text for translation.
4. Return a `WorkflowOutput` object containing the agent's response.

```python
from agentscope.agent import ReActAgent
from agentscope.formatter import OpenAIChatFormatter
from agentscope.tuner import WorkflowOutput
from agentscope.message import Msg

async def translation_workflow(
task: Dict,
model: ChatModelBase,
) -> WorkflowOutput:
agent = ReActAgent(
name="translator",
sys_prompt="You are a helpful translation agent. Translate the given text accurately, and only output the translated text.",
model=model,
formatter=OpenAIChatFormatter(),
)

# Extract source text from task
source_text = task.get("question", "") if isinstance(task, dict) else str(task)

# Create a message with the translation request
prompt = f"Translate the following text between English and Chinese: {source_text}"
msg = Msg(name="user", content=prompt, role="user")

# Get response from the agent
response = await agent.reply(msg=msg)

return WorkflowOutput(
response=response,
)
```

### Step 3: Implement the judge function

The judge function evaluates the workflow's response and returns a reward. Higher reward values indicate better performance.

```python
async def judge_function(
task: Dict,
response: Any,
) -> JudgeOutput:
"""Calculate reward based on the input task and workflow's response."""
```

- Inputs:
- `task`: A dictionary representing a single training task.
- `response`: A composite dict containing:
- `"response"`: The actual response from the workflow function.
- `"metrics"`: Workflow metrics including execution_time and token usage.

- Outputs:
- `JudgeOutput`: An object containing:
- `reward`: A scalar float representing the reward (higher is better).
- `metrics`: Optional dictionary of additional metrics.

Here is an example implementation for translation tasks using BLEU score (please pip install the `sacrebleu` package first):

```python
from agentscope.tuner import JudgeOutput

async def bleu_judge(
task: Dict,
response: Any,
) -> JudgeOutput:
"""Calculate BLEU score for translation quality."""
# Lazy import to follow the requirement
import sacrebleu

# Extract response text from the composite dict
response_content = response["response"]
response_str = response_content.get_text_content()

# Extract reference translation
reference_translation = task.get("answer", "") if isinstance(task, dict) else ""

# Calculate BLEU score
ref = reference_translation.strip()
pred = response_str.strip()
bleu_score = sacrebleu.sentence_bleu(pred, [ref])

return JudgeOutput(
reward=bleu_score.score,
metrics={
"bleu": bleu_score.score/100,
"brevity_penalty": bleu_score.bp,
"ratio": bleu_score.ratio
}
)
```

AgentScope.tuner also provides built-in judge functions for common workflow conducting efficiency metrics, such as execution time and token usage in example_token_usage.py:

```python
from agentscope.tuner.model_selection import avg_time_judge, avg_token_consumption_judge

# For selecting based on fastest execution time
judge_function = avg_time_judge

# For selecting based on lowest token consumption
judge_function = avg_token_consumption_judge
```

### Step 4: Start model selection

Use the `select_model` interface to find the best performing model.

```python
from agentscope.tuner import DatasetConfig
from agentscope.tuner.model_selection import select_model
from agentscope.model import DashScopeChatModel
import os

# your workflow / judge function and candidate models here...

if __name__ == "__main__":
# Define your candidate models
model1 = DashScopeChatModel(
"qwen3-max-2025-09-23",
api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
max_tokens=1024,
)
model2 = DashScopeChatModel(
"deepseek-r1",
api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
max_tokens=1024,
)

best_model, metrics = select_model(
workflow_func=translation_workflow,
judge_func=bleu_judge,
train_dataset=DatasetConfig(path="examples/tuner/model_selection/translate_data.json"),
candidate_models=[model1, model2],
)

print(f"Best model: {best_model.model_name}")
print(f"Performance metrics: {metrics}")
```

---

> **Note**: Besides the BLEU score judge function shown in this example, you can also implement custom judge functions for your specific use case. Alternatively, you can use built-in functions for optimizing workflow efficiency such as time and token usage judges, which can be referenced in `example_token_usage.py`.

---

## How to run

After implementing the workflow and judge function, follow these steps to run model selection:

1. Prerequisites

- Set up your API key as an environment variable:

```bash
export DASHSCOPE_API_KEY="your_api_key_here"
```

- Prepare your dataset in a supported format (JSONL, Parquet, CSV, etc.).

- Install required dependencies if not already installed:

```bash
pip install datasets
```

2. Run the selection script

```bash
python example_token_usage.py # or other example files in this directory
```

3. The best performing model will be returned along with performance metrics.

## Output

```
Evaluating 3 candidate models: ['qwen3-max', 'deepseek-r1', 'glm-4.7']

INFO:agentscope.tuner.model_selection._model_selection:Model evaluation results:
INFO:agentscope.tuner.model_selection._model_selection: qwen3-max: 61.8407
INFO:agentscope.tuner.model_selection._model_selection: deepseek-r1: 43.5547
INFO:agentscope.tuner.model_selection._model_selection: glm-4.7: 48.8801

Selected best model: qwen3-max-2025-09-23
Metrics: {'bleu_avg': 0.6184069765855449, 'brevity_penalty_avg': 0.9900344064325004, 'ratio_avg': 1.070816065067906}
```

---

## Use Cases

Model selection is particularly useful for:

| Scenario | Benefit |
|----------|---------|
| **Performance optimization** | Identify the model that achieves the highest accuracy/reward on your specific task |
| **Cost efficiency** | Select models that achieve desired performance with lower computational costs |
| **Latency requirements** | Choose models that meet your speed/latency constraints |
| **Resource constraints** | Find the best model that fits within your hardware limitations |

> [!TIP]
> Model selection is ideal when you have multiple models available and want to systematically identify which performs best for your specific use case.
Loading
Loading