2. **Local Python process**: Launch the environment directly using Uvicorn.

   You can start the server manually as a local process (see the first sketch after this list). For more details about the available environments, refer to the [OpenEnv repository](https://github.com/meta-pytorch/OpenEnv/tree/main/src/envs).
3. **Hugging Face Spaces**: Connect to a hosted environment running on the Hugging Face Hub.

   To find the connection URL, open the Space page, click the **⋮ (three dots)** menu, and select **“Embed this Space.”**

   You can then use that URL to connect directly from your client (see the second sketch after this list).

   Keep in mind that public Spaces may be rate-limited or go offline temporarily when inactive.
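For option 2, serving an environment with Uvicorn might look like the following minimal sketch. The module path `envs.echo_env.server.app:app` is an assumption about the OpenEnv package layout; check the repository for the actual server entry point of the environment you want to run.

```python
# Minimal sketch: serve an OpenEnv environment locally with Uvicorn.
# The module path below is an assumption about the OpenEnv layout;
# look up the real server entry point in the OpenEnv repository.
import uvicorn

if __name__ == "__main__":
    # Serve the environment's ASGI app on port 8001 (hypothetical path).
    uvicorn.run("envs.echo_env.server.app:app", host="0.0.0.0", port=8001)
```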
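For option 3, connecting from a client might look like the sketch below. The `EchoEnv` client class, its `base_url` parameter, and the Space URL are illustrative assumptions; verify the exact client API in the OpenEnv repository.

```python
# Minimal sketch: connect to a hosted OpenEnv environment by URL.
# EchoEnv, EchoAction, and the base_url argument are assumptions for
# illustration; check the actual client API in the OpenEnv repository.
from envs.echo_env import EchoEnv, EchoAction

# URL taken from the Space's "Embed this Space" dialog.
env = EchoEnv(base_url="https://your-username-echo-env.hf.space")

result = env.reset()
result = env.step(EchoAction(message="hello"))
print(result.reward)
```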
## A simple example
The [echo.py](https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/echo.py) script demonstrates a minimal, end-to-end integration between TRL and OpenEnv. In this example, the Echo environment rewards completions based on their text length, encouraging the model to generate longer outputs. This pattern can be extended to any custom environment that provides structured feedback or task-based rewards:
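The core of this pattern is sketched below, with the environment round-trip replaced by an inline length-based reward for brevity. The model name and dataset are illustrative placeholders, not the exact settings used in echo.py.

```python
# Sketch of the echo.py pattern: a reward that grows with completion length,
# plugged into GRPOTrainer. Model and dataset are illustrative placeholders.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def reward_length(completions, **kwargs):
    # Longer completions earn larger rewards, mirroring the Echo
    # environment's length-based feedback.
    return [float(len(completion)) for completion in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=reward_length,
    args=GRPOConfig(output_dir="echo-grpo"),
    train_dataset=load_dataset("trl-lib/tldr", split="train"),
)
trainer.train()
```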
You can start the TextArena environment from its Docker image on the Hugging Face Hub:

```bash
docker run -d -p 8001:8001 registry.hf.space/burtenshaw-textarena:latest
```
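Once the container is running, the environment is reachable at `http://localhost:8001`, and the client can connect to it with that base URL just as it would to a hosted Space.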
### Results
The resulting model improves its performance on the game, both by reducing the number of repetitions and by increasing the number of correct guesses. However, the Qwen3-1.7B model we trained is not able to win the game consistently. The following reward curve shows the coverage of the model's guesses and the coverage of correct Y and G letters.