- [2025/05/27] 🎉 We release ConciseR-Zero-7B and ConciseR-Zero-7B-Preview. ConciseR-Zero-7B-Preview is the model produced by the first training stage of ConciseR, and ConciseR-Zero-7B is the model produced by the second stage.
In this work, we propose ConciseR, a simple yet effective two-stage reinforcement learning framework for concise reasoning in LLMs. The first stage, run for more training steps, incentivizes the model's reasoning capabilities via Group Relative Policy Optimization with clip-higher and dynamic sampling (GRPO++); the second stage, run for fewer training steps, explicitly enforces conciseness and improves efficiency via Length-aware Group Relative Policy Optimization (L-GRPO). Notably, ConciseR optimizes response length only once all rollouts of a sample are correct, following the "walk before you run" principle (see the sketch below). Extensive experiments demonstrate that ConciseR, while generating more concise CoT reasoning responses, outperforms recent state-of-the-art reasoning models on the AIME 2024, AMC 2023, MATH-500, Minerva, and Olympiad benchmarks.
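For intuition, here is a minimal NumPy sketch of these ingredients. The function names, the specific length-reward form (`1 - length / max_len`), and the clip ranges are illustrative assumptions, not the exact formulation from the paper:

```python
import numpy as np

def group_advantages(rewards: np.ndarray) -> np.ndarray:
    # GRPO-style advantage: normalize rewards within one rollout group.
    return (rewards - rewards.mean()) / (rewards.std() + 1e-6)

def clip_higher_surrogate(ratio: np.ndarray, adv: np.ndarray,
                          eps_low: float = 0.2, eps_high: float = 0.28) -> np.ndarray:
    # PPO-style clipped surrogate with a larger upper clip range
    # ("clip-higher"); the eps values here are assumed defaults.
    return np.minimum(ratio * adv,
                      np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high) * adv)

def conciser_reward(correct: np.ndarray, lengths: np.ndarray,
                    max_len: int = 8192) -> np.ndarray:
    # "Walk before you run": a length-aware bonus is applied only when
    # *every* rollout in the group is correct; otherwise the reward is
    # plain correctness, as in Stage-1.
    rewards = correct.astype(float)
    if correct.all():
        rewards += 1.0 - lengths / max_len  # shorter responses earn more
    return rewards

# Example: a group of 4 rollouts for one prompt, all correct.
correct = np.array([True, True, True, True])
lengths = np.array([1200.0, 800.0, 2400.0, 600.0])
adv = group_advantages(conciser_reward(correct, lengths))
print(adv)  # the shortest rollout gets the largest advantage
```

Note that once all rollouts are correct, the constant correctness term cancels under group normalization, so the advantage signal is driven entirely by relative response length.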
We report Pass@1 accuracy averaged over 32 samples for each problem (a computation sketch follows the table).
Model | AIME 2024 | MATH-500 | AMC 2023 | Minerva | Olympiad | Avg. Score |
---|---|---|---|---|---|---|
Qwen2.5-1.5B-Base | 0.0 | 3.3 | 2.5 | 1.8 | 1.5 | 1.8 |
Qwen2.5-1.5B-Instruct | 1.3 | 57.5 | 26.2 | 19.4 | 20.3 | 24.9 |
Qwen2.5-Math-1.5B-Base | 11.3 | 51.7 | 44.0 | 11.3 | 26.0 | 28.9 |
Qwen2.5-Math-1.5B-Instruct | 12.0 | 74.7 | 26.7 | 35.0 | 37.9 | 37.3 |
DeepSeek-R1-Distill-Qwen-1.5B | 28.8 | 82.8 | 62.9 | 26.5 | 43.3 | 48.9 |
DeepScaleR-1.5B-Preview | 43.1 | 87.8 | 73.6 | 30.2 | 50.0 | 56.9 |
FastCuRL-1.5B-Preview | 43.1 | 88.0 | 74.2 | 31.6 | 50.4 | 57.5 |
FastCuRL-1.5B-V3 | 49.6 | 90.5 | 78.5 | 34.7 | 54.5 | 61.6 |
Qwen2.5-7B-Base | 3.3 | 64.6 | 30.0 | 25.7 | 29.0 | 30.5 |
Qwen2.5-7B-Instruct | 12.3 | 77.1 | 52.8 | 34.9 | 38.7 | 43.2 |
Qwen2.5-Math-7B-Base | 20.7 | 64.3 | 56.2 | 17.3 | 29.0 | 37.5 |
Qwen2.5-Math-7B-Instruct | 15.7 | 82.9 | 67.0 | 35.0 | 41.3 | 48.4 |
Eurus-2-7B-PRIME | 17.8 | 80.1 | 63.0 | 37.5 | 43.9 | 48.5 |
Open-Reasoner-Zero-7B | 19.7 | 83.9 | 59.5 | 31.6 | 47.6 | 48.5 |
SimpleRL-Zero-7B | 14.0 | 77.9 | 58.0 | 33.0 | 39.0 | 44.4 |
SimpleRL-Zero-Math-7B | 22.7 | 76.9 | 62.2 | 30.1 | 39.3 | 46.2 |
Oat-Zero-7B | 28.0 | 79.4 | 66.2 | 34.4 | 43.8 | 50.4 |
ConciseR-Zero-7B-Preview (Stage-1) | 42.8 | 83.0 | 73.9 | 31.8 | 45.1 | 55.3 |
ConciseR-Zero-7B (Stage-2) | 43.3 | 83.0 | 76.7 | 31.5 | 46.0 | 56.1 |
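For reference, a minimal sketch of how the reported metric reduces to per-problem mean accuracy averaged over problems (the helper name and matrix layout are assumptions):

```python
import numpy as np

def avg_pass_at_1(correct: np.ndarray) -> float:
    # correct: (num_problems, 32) boolean matrix; correct[i, j] is True
    # when the j-th sampled response to problem i is correct.
    # Pass@1 averaged over 32 samples = mean per-problem accuracy.
    return float(correct.mean(axis=1).mean() * 100)
```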
If you find our work helpful, please consider citing it:
@misc{conciser,
      title={Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning},
      author={Mingyang Song and Mao Zheng},
      year={2025},
      eprint={2505.21178},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.21178},
}
- Our model is trained on top of Qwen2.5-Math-7B-Base.
- Our training experiments are powered by our heavily modified fork of verl.