
Commit e39d852

authored
[fix] wordle model_id updates (#4453)
1 parent 0d57110 commit e39d852

2 files changed: +4 −4 lines changed


docs/source/openenv.md

Lines changed: 2 additions & 2 deletions
@@ -358,7 +358,7 @@ The example requires two GPUs:
 
 ```bash
 # Terminal 1: Start vLLM inference server
-CUDA_VISIBLE_DEVICES=0 trl vllm-serve --model Qwen/Qwen2.5-0.5B-Instruct --host 0.0.0.0 --port 8000
+CUDA_VISIBLE_DEVICES=0 trl vllm-serve --model Qwen/Qwen3-1.7B --host 0.0.0.0 --port 8000
 
 # Terminal 2: Run GRPO training with OpenEnv
 CUDA_VISIBLE_DEVICES=1 python examples/scripts/openenv/wordle.py
@@ -368,6 +368,6 @@ CUDA_VISIBLE_DEVICES=1 python examples/scripts/openenv/wordle.py
 
 The resulting model improves its performance on the game, both by reducing the number of repetitions and by increasing the number of correct guesses. However, the Qwen3-1.7B model we trained is not able to consistently win the game. The following reward curve shows the coverage of the model's guesses and the coverage of correct Y and G letters.
 
-<iframe src="https://burtenshaw-wordle-grpo.hf.space/?project=group-Qwen-Qwen3-17B&metrics=train/rewards/reward_coverage/mean&runs=run-2025-10-26_09-39-49&sidebar=hidden&navbar=hidden" style="width:600px; height:500px; border:0;"></iframe>
+<iframe src="https://burtenshaw-wordle-grpo.hf.space?project=group-Qwen-Qwen3-17B&metrics=reward&runs=run-2025-10-26_09-39-49,run-2025-10-26_08-04-49&sidebar=hidden&navbar=hidden" style="width:1600px; height:500px; border:0;"></iframe>
 
 We experimented with larger models like `gpt-oss-20b` and found that the model was able to consistently win the game. However, training it requires a lot of compute. Why not try this out yourself?
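The coverage-style reward mentioned in the prose above can be illustrated with a toy sketch. This is a hypothetical reconstruction, not code from `wordle.py`: it assumes `G` marks a letter correct and in position, `Y` marks a letter present elsewhere, and it ignores Wordle's duplicate-letter rules for brevity.

```python
# Hypothetical sketch of a Wordle coverage reward; names and logic are
# illustrative assumptions, not taken from examples/scripts/openenv/wordle.py.

def feedback(guess: str, target: str) -> str:
    """Return G/Y/- marks per position (simplified: ignores duplicate-letter rules)."""
    marks = []
    for g, t in zip(guess, target):
        if g == t:
            marks.append("G")       # correct letter, correct position
        elif g in target:
            marks.append("Y")       # letter present elsewhere in the target
        else:
            marks.append("-")       # letter absent
    return "".join(marks)

def reward_coverage(guesses: list[str], target: str) -> float:
    """Fraction of distinct target letters that any guess has hit with G or Y."""
    covered = set()
    for guess in guesses:
        for g, mark in zip(guess, feedback(guess, target)):
            if mark in "GY":
                covered.add(g)
    return len(covered) / len(set(target))
```

For example, a single guess of `"crane"` against the target `"crate"` covers four of the five distinct target letters, giving a coverage of 0.8.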

examples/scripts/openenv/wordle.py

Lines changed: 2 additions & 2 deletions
@@ -21,7 +21,7 @@
 python -m src.envs.textarena_env.server.app
 
 # Start the vLLM server with your model
-CUDA_VISIBLE_DEVICES=0 trl vllm-serve --model Qwen/Qwen2.5-0.5B-Instruct --host 0.0.0.0 --port 8000
+CUDA_VISIBLE_DEVICES=0 trl vllm-serve --model Qwen/Qwen3-1.7B --host 0.0.0.0 --port 8000
 
 # Then run this training script:
 CUDA_VISIBLE_DEVICES=1 python examples/scripts/openenv/wordle.py
@@ -66,7 +66,7 @@ def parse_args() -> argparse.Namespace:
     )
     parser.add_argument(
         "--model-id",
-        default="willcb/Qwen3-1.7B-Wordle",
+        default="Qwen/Qwen3-1.7B",
         help="Model identifier passed to GRPOTrainer for fine-tuning.",
     )
     parser.add_argument(

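The default-flag change in the second hunk can be exercised in isolation. Below is a minimal sketch of the relevant `argparse` wiring; the surrounding options from `wordle.py` are omitted, and the overall structure is an assumption, not a copy of the script.

```python
import argparse

def parse_args(argv=None) -> argparse.Namespace:
    # Sketch of the --model-id option only; other wordle.py flags are omitted.
    parser = argparse.ArgumentParser(description="GRPO Wordle training (sketch)")
    parser.add_argument(
        "--model-id",
        default="Qwen/Qwen3-1.7B",  # new default introduced by this commit
        help="Model identifier passed to GRPOTrainer for fine-tuning.",
    )
    return parser.parse_args(argv)
```

Note that `argparse` exposes the flag as `args.model_id` (hyphens become underscores), so callers can still override the default on the command line, e.g. `--model-id willcb/Qwen3-1.7B-Wordle` to use the previous checkpoint.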
0 commit comments
