Commit 810daa2

Updated OpenEnv docs (#4418)
1 parent e39d852 commit 810daa2

4 files changed: +411, -271 lines

docs/source/openenv.md

Lines changed: 64 additions & 2 deletions
@@ -11,7 +11,7 @@ In this guide, we’ll focus on **how to integrate OpenEnv with TRL**, but feel
To use OpenEnv with TRL, install the framework:

```bash
- pip install openenv-core
+ pip install git+https://github.com/meta-pytorch/OpenEnv.git
```

## Using `rollout_func` with OpenEnv environments
@@ -65,6 +65,33 @@ By using OpenEnv in this loop, you can:
* Plug in custom simulators, web APIs, or evaluators as environments.
* Pass structured reward signals back into RL training seamlessly.

## Running the Environments

You can run OpenEnv environments in three different ways:

1. **Local Docker container** *(recommended)*

   To start a Docker container:

   * Open the environment on the Hugging Face Hub.
   * Click the **⋮ (three dots)** menu.
   * Select **“Run locally.”**
   * Copy and execute the provided command in your terminal.

   Example:

   ```bash
   docker run -d -p 8001:8001 registry.hf.space/openenv-echo-env:latest
   ```

   ![open_env_launch_docker](https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/open_env_launch_docker.png)

2. **Local Python process**: Launch the environment directly using Uvicorn. You can start the server manually as a local process. For more details about the available environments, refer to the [OpenEnv repository](https://github.com/meta-pytorch/OpenEnv/tree/main/src/envs).

   ```bash
   python -m uvicorn envs.echo_env.server.app:app --host 0.0.0.0 --port 8001
   ```

3. **Hugging Face Spaces**: Connect to a hosted environment running on the Hugging Face Hub. To find the connection URL, open the Space page, click the **⋮ (three dots)** menu, and select **“Embed this Space.”** You can then use that URL to connect directly from your client, as shown in the sketch after this list. Keep in mind that public Spaces may have rate limits or temporarily go offline if inactive.
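Whichever option you choose, the client connects to the environment over HTTP. The snippet below is a minimal sketch of pointing a client at each kind of endpoint; the Space URL is a placeholder, and the `envs.echo_env` import path is assumed from the OpenEnv repository layout.

```python
from envs.echo_env import EchoEnv  # import path assumed from the OpenEnv repo layout

# Option 1 or 2: environment served locally (Docker or Uvicorn) on port 8001
local_client = EchoEnv(base_url="http://0.0.0.0:8001")

# Option 3: environment hosted on a Hugging Face Space, using the URL shown
# under "Embed this Space" (placeholder below, not a real Space)
space_client = EchoEnv(base_url="https://your-username-echo-env.hf.space")
```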
## A simple example

The [echo.py](https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/echo.py) script demonstrates a minimal, end-to-end integration between TRL and OpenEnv. In this example, the Echo environment rewards completions based on their text length, encouraging the model to generate longer outputs. This pattern can be extended to any custom environment that provides structured feedback or task-based rewards:
@@ -75,6 +102,15 @@ from trl import GRPOConfig, GRPOTrainer

# Create HTTP client for Echo Environment
client = EchoEnv.from_docker_image("echo-env:latest")
"""
Alternatively, you can start the environment manually with Docker and connect to it:

# Step 1: Start the Echo environment
docker run -d -p 8001:8001 registry.hf.space/openenv-echo-env:latest

# Step 2: Connect the client to the running container
client = EchoEnv(base_url="http://0.0.0.0:8001")
"""

def rollout_func(prompts, args, processing_class):
    # 1. Generate completions via vLLM inference server (running on port 8000)
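To make the reward flow concrete, here is a hedged sketch of how a rollout might query the environment once per completion. The `EchoAction` class, its `message` field, and the Gym-style `reset()`/`step()` calls are assumptions based on the OpenEnv Echo environment; the `reward_from_env` key simply mirrors the metric name in the reward curve below and is not a verbatim quote of the script.

```python
from envs.echo_env import EchoAction  # assumed import path for the Echo action type

def env_rewards_for(client, completions):
    """Step the Echo environment once per completion and collect its scalar rewards."""
    client.reset()  # assumed Gym-style reset before stepping
    rewards = []
    for text in completions:
        result = client.step(EchoAction(message=text))  # 'message' field assumed
        rewards.append(result.reward)                   # Echo rewards longer texts
    return rewards

# Inside rollout_func, these values would be returned alongside prompt_ids,
# completion_ids, and logprobs (e.g. under a "reward_from_env" key) so that a
# matching reward function can pass them to GRPOTrainer.
```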
@@ -151,6 +187,21 @@ CUDA_VISIBLE_DEVICES=0 trl vllm-serve --model Qwen/Qwen2.5-0.5B-Instruct --host
CUDA_VISIBLE_DEVICES=1 python examples/scripts/openenv/echo.py
```

Alternatively, you can manually start the Echo environment in a Docker container before running the training:

```bash
# Launch the Echo environment
docker run -d -p 8001:8001 registry.hf.space/openenv-echo-env:latest
```

Then, initialize the client using `client = EchoEnv(base_url="http://0.0.0.0:8001")` instead of `client = EchoEnv.from_docker_image("echo-env:latest")`.
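Before starting training against a manually launched container, a quick connectivity check can catch a wrong port or a stopped container early. This is only a sketch: the `reset()`/`step()` calls and the `EchoAction` message field are assumed from the OpenEnv client API.

```python
from envs.echo_env import EchoAction, EchoEnv  # assumed import paths

# Point the client at the container started above
client = EchoEnv(base_url="http://0.0.0.0:8001")

# One reset/step round trip to confirm the environment is reachable
client.reset()
print("env reward:", client.step(EchoAction(message="ping")).reward)
```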
Below is the reward curve from training:
<iframe src="https://trl-lib-trackio.hf.space?project=openenv&metrics=train/rewards/reward_from_env/mean&runs=qgallouedec-1761202871&sidebar=hidden&navbar=hidden" style="width:600px; height:500px; border:0;"></iframe>
@@ -352,7 +403,7 @@ trainer = GRPOTrainer(
trainer.train()
```

- ### Running the Example
+ ### Running the Advanced Example
356407
The example requires two GPUs:

@@ -364,6 +415,17 @@ CUDA_VISIBLE_DEVICES=0 trl vllm-serve --model Qwen/Qwen3-1.7B --host 0.0.0.0 --p
CUDA_VISIBLE_DEVICES=1 python examples/scripts/openenv/wordle.py
```

Again, you can manually start the TextArena environment in a Docker container before running the training. In this case, initialize the client with `client = TextArenaEnv(base_url="http://0.0.0.0:8001")` instead of `client = TextArenaEnv.from_docker_image("registry.hf.space/burtenshaw-textarena:latest")`:

```bash
# Launch the TextArena environment
docker run -d -p 8001:8001 registry.hf.space/burtenshaw-textarena:latest
```
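As in the Echo example, pointing the client at the running container is then a one-line change; the `envs.textarena_env` import path is an assumption based on the OpenEnv repository layout.

```python
from envs.textarena_env import TextArenaEnv  # assumed import path

# Connect to the TextArena container launched above instead of letting
# from_docker_image() start a new one
client = TextArenaEnv(base_url="http://0.0.0.0:8001")
```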
### Results
The resulting model improves its performance on the game, both by reducing the number of repetitions and by increasing the number of correct guesses. However, the Qwen3-1.7B model we trained is not able to consistently win the game. The following reward curve shows the coverage of the model's guesses and the coverage of correct Y and G letters.
