README.md: 14 additions & 9 deletions
@@ -12,9 +12,10 @@
 ---
 
 ## Updates
+* 21/03/2025: We incorporate [Dr. GRPO](https://github.com/sail-sg/understand-r1-zero), which fixes the optimization bias in GRPO.
 * 26/01/2025: We support reinforcement learning with verifiable rewards (RLVR) for math reasoning.
 * A quick [example](https://github.com/sail-sg/oat/blob/main/docs/reasoning_examples.md#deepseek-r1-zero-like-training) of R1-Zero-like training with GRPO.
-
+* 20/10/2024: We open source Oat, an online LLM alignment framework developed during a research project on online LLM exploration ([sample-efficient alignment](https://arxiv.org/pdf/2411.01493)).
 ## Introduction
 
 Oat 🌾 is a simple yet efficient framework for running **online** LLM alignment algorithms. Its key features include:
@@ -31,12 +32,12 @@ Oat 🌾 is a simple yet efficient framework for running **online** LLM alignmen
 * LLM-as-a-judge is supported via querying OpenAI API for model-based pairwise ranking.
 * **Ease of Use**: Oat's modular structure allows researchers to easily inherit and modify existing classes, enabling rapid prototyping and experimentation with new algorithms.
 * Online DPO/SimPO/IPO for online preference learning.
 * Online exploration (active alignment) algorithms, including [SEA](https://arxiv.org/abs/2411.01493), APL and XPO.
 
 ## Installation
-In a python environment with supported versions (`>=3.8, <=3.10`), you could install oat via PyPI:
+In a python environment with supported versions (we recommend `3.10`), you could install oat via PyPI:
 ```shell
 pip install vllm==0.7.2 && pip install oat-llm
 ```
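
The version guidance in the installation note can be checked before running `pip`. The following is a minimal sketch, assuming the range `>=3.8, <=3.10` from the earlier wording still bounds the supported interpreters and that `3.10` is the recommended one. The helper name `check_python_version` and the three status labels are hypothetical and are not part of oat.

```python
import sys

# Assumed bounds, taken from the README wording above;
# they are not read from oat's package metadata.
MIN_SUPPORTED = (3, 8)
MAX_SUPPORTED = (3, 10)
RECOMMENDED = (3, 10)


def check_python_version(version=None):
    """Classify a (major, minor) version as 'recommended', 'supported', or 'unsupported'.

    When no version is given, the running interpreter's version is used.
    """
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    if major_minor == RECOMMENDED:
        return "recommended"
    if MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED:
        return "supported"
    return "unsupported"


if __name__ == "__main__":
    status = check_python_version()
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```

Running the script inside the target environment before installing reports whether the interpreter falls in the assumed range; the authoritative constraint remains whatever the published package declares.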
@@ -65,16 +66,20 @@ The benchmarking compares oat with the online DPO implementation from [huggingfa
 Please refer to [Appendix C of our paper](https://arxiv.org/pdf/2411.01493#page=17.64) for a detailed discussion of the benchmarking methods and results.
 
 ## Citation
-If you find this codebase useful for your research, please consider citing
+If you find this codebase useful for your research, please consider citing:
+
+LLM online alignment framework:
 ```
-@misc{liu2025oat,
-author = {Zichen Liu and Changyu Chen and Chao Du and Wee Sun Lee and Min Lin},
-title = {OAT: A research-friendly framework for LLM online alignment},