Commit 5f6f6f8: Update README.md
1 parent cf96ce3
1 file changed (README.md): 41 additions, 4 deletions
</div>

## 🔔 News

- **[2025-10-10]** ✨ Code is now available.
- **[2025-09-30]** 📄 Our paper is released on [arXiv](https://arxiv.org/abs/2509.26628).
## 🚀 Getting Started

### Installation

Clone the repository:

```bash
git clone https://github.com/RyanLiu112/AttnRL.git
cd AttnRL
```
Create a new conda environment and install the dependencies:

```bash
conda create -n attnrl python=3.10
conda activate attnrl
bash scripts/install_vllm_sglang_mcore.sh
```
### Data Preparation
The training dataset ([DeepScaleR-Preview-Dataset](https://huggingface.co/datasets/agentica-org/DeepScaleR-Preview-Dataset)) is located at `data/train/deepscaler_train.parquet` and contains 40.3k mathematical reasoning problems.
The evaluation datasets are in `data/eval/`; the suffix `_${K}` in a filename indicates the number of duplicate samples for each question.
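As a small sketch of the naming convention, the duplicate count `K` can be read directly from an eval filename (the filenames and the helper below are illustrative, not taken from the repository):

```python
import re

def dup_count(filename: str) -> int:
    """Parse the `_${K}` suffix from an eval dataset filename.

    Returns 1 when no suffix is present. Filenames here are
    hypothetical examples, not files shipped with the repo.
    """
    m = re.search(r"_(\d+)\.parquet$", filename)
    return int(m.group(1)) if m else 1

print(dup_count("aime24_32.parquet"))  # suffix _32 -> 32 duplicate samples
print(dup_count("math500.parquet"))    # no suffix -> 1
```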
### Training
To train AttnRL with the [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) backbone on 8 H100 GPUs, run:

```bash
bash recipe/attnrl/run_attnrl_r1_distill_1.5b_8k.sh
```
### Evaluation
Evaluation scripts are the same as the training scripts; add `+trainer.val_only=True` to run evaluation only. We recommend setting `data.max_prompt_length=2048` and `data.max_response_length=32768`.
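As an illustrative sketch, an evaluation-only run might append the overrides to the training recipe (Hydra-style flags as used by verl; whether the script forwards extra overrides unchanged is an assumption):

```shell
# Sketch: reuse the training recipe with evaluation-only overrides.
# Flag forwarding by the recipe script is assumed, not confirmed.
bash recipe/attnrl/run_attnrl_r1_distill_1.5b_8k.sh \
    +trainer.val_only=True \
    data.max_prompt_length=2048 \
    data.max_response_length=32768
```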
## 📝 Citation
If you find this work helpful, please kindly cite our paper:

```bibtex
@article{AttnRL,
  title   = {Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models},
  author  = {Liu, Runze and Wang, Jiakang and Shi, Yuling and Xie, Zhihui and An, Chenxin and Zhang, Kaiyan and Zhao, Jian and Gu, Xiaodong and Lin, Lei and Hu, Wenping and Li, Xiu and Zhang, Fuzheng and Zhou, Guorui and Gai, Kun},
  journal = {arXiv preprint arXiv:2509.26628},
  year    = {2025}
}
```
## 💡 Acknowledgements

Our code is based on [verl](https://github.com/volcengine/verl) ([commit](https://github.com/volcengine/verl/commit/83ebd007e01de29bbe353de112d04245b4820b47)) and [TreeRL](https://github.com/THUDM/TreeRL).
Our training dataset is from [DeepScaleR-Preview-Dataset](https://huggingface.co/datasets/agentica-org/DeepScaleR-Preview-Dataset), and the rule-based verifier is based on [Skywork-OR1](https://github.com/SkyworkAI/Skywork-OR1) and [Archer](https://github.com/wizard-III/ArcherCodeR).
