
feat(ci): add wiki-search integration test #2536

Draft
mikasenghaas wants to merge 5 commits into main from feat/wiki-search-integration-test

mikasenghaas commented on May 18, 2026

Summary

  • Adds a minimal wiki-search integration test to the CI matrix to cover tool-call environments, which are currently untested
  • New CI config at configs/ci/integration/wiki_search.toml: 5 steps, batch size 32, model PrimeIntellect/Qwen3-0.6B, with tool_call_parser = "hermes"
  • The test checks that the run completes without error and that the mismatch KL stays below 0.15
  • Added as a vm runner job in .github/workflows/gpu_tests.yaml alongside the existing alphabet_sort job
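The pass criterion in the third bullet can be sketched as a small check. This is a hypothetical illustration only: `RunResult`, `integration_test_passes`, and the assumption that mismatch KL is logged once per step and must stay below the threshold at every step are not taken from the PR or the repository.

```python
from dataclasses import dataclass
from typing import Sequence

# Threshold from the PR summary; the constant name itself is hypothetical.
MISMATCH_KL_THRESHOLD = 0.15


@dataclass
class RunResult:
    """Hypothetical summary of a finished CI run."""
    exit_code: int
    mismatch_kl: Sequence[float]  # assumed: one logged value per training step


def integration_test_passes(result: RunResult, threshold: float = MISMATCH_KL_THRESHOLD) -> bool:
    """Pass only if the process exited cleanly and every logged mismatch KL is below threshold."""
    if result.exit_code != 0:
        # Any non-zero exit counts as "did not complete without error".
        return False
    if not result.mismatch_kl:
        # No metrics logged: treat as a failure rather than a vacuous pass.
        return False
    return max(result.mismatch_kl) < threshold
```

Under these assumptions, a five-step run with values such as [0.01, 0.02, 0.03, 0.02, 0.04] passes, while a single step at 0.2 fails the whole run.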

Notes

PrimeIntellect/Qwen3-0.6B is not in MODEL_RENDERER_MAP in the renderers submodule, so it falls back to DefaultRenderer, which has supports_tools=False unless tool_parser is explicitly set. The CI config therefore sets a tool parser on both sides: [orchestrator.renderer] tool_parser = "qwen3" (renderer-side) and [inference.model] tool_call_parser = "hermes" (vLLM server-side). hermes is the vLLM server-side parser name; the renderer's equivalent is qwen3.
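A minimal sketch of the fallback described in the note, assuming the behavior is exactly as stated: a model missing from MODEL_RENDERER_MAP gets DefaultRenderer, whose supports_tools is false unless tool_parser is set. The map contents, the `DefaultRenderer` dataclass, and `uses_default_renderer` are hypothetical stand-ins, not the renderers submodule's actual API.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical stand-in for the renderers submodule's MODEL_RENDERER_MAP.
# The PR only states that PrimeIntellect/Qwen3-0.6B is NOT a key in it.
MODEL_RENDERER_MAP: Dict[str, str] = {"some-org/some-model": "some_renderer"}


@dataclass
class DefaultRenderer:
    """Hypothetical fallback renderer: no tool support unless tool_parser is set."""
    tool_parser: Optional[str] = None

    @property
    def supports_tools(self) -> bool:
        return self.tool_parser is not None


def uses_default_renderer(model_name: str) -> bool:
    """True if the model has no registered renderer and would fall back to DefaultRenderer."""
    return model_name not in MODEL_RENDERER_MAP


model = "PrimeIntellect/Qwen3-0.6B"
print(uses_default_renderer(model))                 # unmapped, so the fallback applies
print(DefaultRenderer().supports_tools)             # no parser set: tools unsupported
print(DefaultRenderer(tool_parser="qwen3").supports_tools)  # renderer-side parser set
```

The point of the sketch is that the renderer-side setting and the vLLM-side tool_call_parser are independent: setting only the server-side parser would leave the fallback renderer reporting no tool support.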

mikasenghaas and others added 5 commits May 18, 2026 03:20
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PrimeIntellect/Qwen3-0.6B is not in MODEL_RENDERER_MAP so it falls back
to DefaultRenderer, which has supports_tools=False unless tool_parser is
explicitly set.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
hermes is the vLLM server-side parser name; the renderer uses qwen3.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ero-adv filter

- max_turns=2: caps sequential API calls per rollout (embedding + judge)
- rollouts_per_example=4: 8 unique examples per batch of 32, better reward variance
- zero_advantage enforce=false: avoids retry cycles when all rewards tie
- max_completion_tokens=512: caps completion length

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
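The batch arithmetic behind rollouts_per_example=4 and the "all rewards tie" case from the last commit can be sketched as follows. The mean-baseline advantage is an assumption (a common group-relative formulation); the PR does not state the trainer's exact formula, and what enforce=false does with such groups beyond avoiding retries is not modelled. All function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence

BATCH_SIZE = 32           # batch size from the CI config (assumed to count rollouts)
ROLLOUTS_PER_EXAMPLE = 4  # rollouts sampled per prompt, from the commit message


def unique_examples_per_batch(batch_size: int, rollouts_per_example: int) -> int:
    """Distinct prompts per batch when every prompt gets a fixed-size group of rollouts."""
    if batch_size % rollouts_per_example != 0:
        raise ValueError("batch_size must be a multiple of rollouts_per_example")
    return batch_size // rollouts_per_example


def group_advantages(rewards: Sequence[float]) -> List[float]:
    """Assumed group-relative advantage: each rollout's reward minus the group mean."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


def is_zero_advantage(rewards: Sequence[float]) -> bool:
    """A group carries no learning signal when all of its advantages are zero (rewards tie)."""
    return all(a == 0 for a in group_advantages(rewards))
```

With 32 rollouts per batch and 4 per example, each batch covers 8 prompts; under the mean-baseline assumption, a group whose 4 rewards are identical yields all-zero advantages, which is the case the zero-advantage filter targets.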