Still exploring this, but it looks like the default use_renderer = true in the orchestrator causes silent failures for any environment using vf.ToolEnv (e.g. wiki_search, wordle).
The TITO rollout path returns ParsedToolCall dataclass instances, but ToolEnv expects dict-shaped tool calls and subscripts them (tool_call["function"]["name"]), which raises TypeError on every tool-using rollout. The orchestrator catches the error and retries, so training continues, but only on trajectories where the model didn't use tools.
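A minimal standalone sketch of the mismatch, not the actual code from either side. The `ParsedToolCall` below is a hypothetical stand-in (I'm assuming `id`, `name`, and `arguments` fields; the real dataclass may differ), and `to_dict_tool_call` is a hypothetical shim, just to show what a normalization step could look like:

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class ParsedToolCall:
    # Hypothetical stand-in for the real dataclass; field names are assumed.
    id: str
    name: str
    arguments: str  # JSON-encoded arguments (assumed representation)


def tool_name_dict_style(tool_call: Any) -> str:
    # Mirrors the dict-style subscript access described above.
    return tool_call["function"]["name"]


def to_dict_tool_call(tool_call: Any) -> dict:
    # Hypothetical shim: pass dicts through, convert dataclass instances
    # into the dict shape that the subscript access expects.
    if isinstance(tool_call, dict):
        return tool_call
    return {
        "id": tool_call.id,
        "type": "function",
        "function": {"name": tool_call.name, "arguments": tool_call.arguments},
    }


call = ParsedToolCall(id="call_0", name="search", arguments=json.dumps({"query": "x"}))

# Subscripting a plain dataclass instance raises TypeError.
try:
    tool_name_dict_style(call)
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)
print(tool_name_dict_style(to_dict_tool_call(call)))
```

The first call fails because a dataclass instance does not implement `__getitem__`; after conversion the same access works. Whether the right place for a conversion like this is the rollout path or ToolEnv is an open question.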
vf-eval doesn't surface the problem because eval rollouts always go through openai_chat_completions regardless of the flag.
Setting use_renderer = false in rl.toml fixes it.
Possibly related: #1196 hit a similar object-vs-dict mismatch in tool calls, though that one crashed hard rather than failing silently.