Empower Small Language Models (SLMs) to act as active controllers, invoking Large Language Models (LLMs) only for critical tokens. Achieve expert-level reasoning with minimal cost.
Check out our paper and models for details.
- [2026-01-08] We released our paper and code. RelayLLM achieves 98.2% cost reduction compared to random routers while bridging the performance gap between SLMs and LLMs!
Deploying Large Language Models (LLMs) for complex reasoning is often hindered by high computational costs and latency. Existing "routing" approaches operate at a coarse granularity (offloading entire queries), leading to significant waste when the small model could have handled most of the steps.
RelayLLM is a novel framework for token-level collaborative decoding. Unlike passive routers, RelayLLM empowers the SLM to act as an active controller. It dynamically invokes the LLM only for critical tokens via a special `<call>` command, effectively "relaying" the generation process to the expert when necessary.
Our approach utilizes a two-stage training framework combining Supervised Warm-up and Group Relative Policy Optimization (GRPO) to teach the model to balance independence with strategic help-seeking.
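The relay mechanism can be sketched as an interleaved decoding loop: the SLM generates until it emits the call token, at which point the LLM takes over until it yields control back. This is a minimal illustration with hypothetical names (`slm_step`, `llm_step`, the `</call>` return token), not the repo's actual implementation.

```python
CALL_TOKEN = "<call>"
RETURN_TOKEN = "</call>"
EOS_TOKEN = "<eos>"

def relay_decode(slm_step, llm_step, prompt, max_tokens=64):
    """Interleaved decoding: the SLM drives generation; emitting CALL_TOKEN
    hands control to the expert LLM until the LLM yields it back."""
    context, output = prompt, []
    while len(output) < max_tokens:
        token = slm_step(context)
        if token == EOS_TOKEN:
            break
        if token == CALL_TOKEN:
            # Relay the critical span to the expert LLM.
            while len(output) < max_tokens:
                expert = llm_step(context)
                if expert == RETURN_TOKEN:
                    break
                output.append(expert)
                context += expert
        else:
            output.append(token)
            context += token
    return "".join(output)
```

In practice `slm_step` and `llm_step` would be batched model forward passes (e.g. served via vLLM); the sketch only shows the control flow.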
- Token-Level Granularity: Collaboration happens within the generation stream via interleaved decoding, not just at the query level.
- Active Control: The SLM autonomously decides when and for how long to call the LLM using a learned `<call>` token.
- Extreme Efficiency: Reduces token costs by 98.2% compared to performance-matched routers, invoking the LLM for only ~1% of total generated tokens.
- Difficulty-Aware Reward: A specialized RL reward system designed to encourage independence on easy tasks (Student-Solvable) and help-seeking only on hard ones (Teacher-Dependent).
- Bridged Performance: Recovers ~60% of the performance gap between the SLM and LLM on challenging math benchmarks.
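The difficulty-aware reward idea above can be sketched as follows. This is an illustrative simplification, not the paper's exact formulation: the `call_penalty` coefficient and the `student_solvable` flag are assumptions for this example.

```python
def difficulty_aware_reward(correct, call_ratio, student_solvable,
                            call_penalty=0.5):
    """Illustrative difficulty-aware reward: reward correctness, but
    penalize LLM calls on problems the SLM can solve alone
    (Student-Solvable); on Teacher-Dependent problems, calls are free."""
    reward = 1.0 if correct else 0.0
    if student_solvable:
        # Discourage unnecessary help-seeking on easy problems.
        reward -= call_penalty * call_ratio
    return reward
```

Under this shaping, a correct answer with zero calls always scores highest on easy problems, while on hard problems the model is not punished for relaying to the teacher.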
Getting started with RelayLLM is straightforward.
git clone https://github.com/Chengsong-Huang/RelayLLM.git
# Navigate into the new directory
cd RelayLLM
# Install the required packages
pip install -r requirements.txt
# We use vLLM for efficient teacher model serving
pip install vllm
# Create storage directories
export STORAGE_PATH="/path/to/your/storage"

If you run into any problems, please refer to the installation guide for verl.
# Run the example script
sh example.bash

The table below compares RelayLLM against the base SLM, a GRPO baseline, and other routing methods (CITER). Results are averaged across six benchmarks (Minerva, MATH-500, GSM8K, Olympiad-Bench, AIME-2024, AIME-2025).
| Model Family | Method | Avg. Accuracy (%) | Avg. Call Ratio (%) |
|---|---|---|---|
| Qwen3-0.6B | Base Model | 27.17 | - |
| | GRPO Baseline | 29.91 | - |
| | CITER (Token-Level) | 30.77 | 0.98 |
| | RelayLLM (Ours) | 33.04 | 0.77 |
| Qwen3-1.7B | Base Model | 42.50 | - |
| | GRPO Baseline | 44.06 | - |
| | CITER (Token-Level) | 46.81 | 1.34 |
| | RelayLLM (Ours) | 49.52 | 1.07 |
| Qwen3-8B | Teacher LLM | 54.12 | 100 |
Note: RelayLLM (Difficulty-Aware) achieves the best trade-off, recovering significant performance with negligible token overhead (~1%).
Our framework builds directly on the great work of EasyR1 and inherits its core functionality. Our evaluation pipeline also draws on General-Reasoner. We are grateful for their excellent work.
If our work is useful for you, please consider citing our paper:
@misc{huang2026relayllmefficientreasoningcollaborative,
title={RelayLLM: Efficient Reasoning via Collaborative Decoding},
author={Chengsong Huang and Tong Zheng and Langlin Huang and Jinyuan Li and Haolin Liu and Jiaxin Huang},
year={2026},
eprint={2601.05167},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2601.05167},
}
