Add Strands SDK integration for RAG agent training #359

Open
JunjieAraoXiong wants to merge 2 commits into rllm-org:main from
Aim
This PR adds Strands SDK support to rLLM as an alternative to LangGraph for training RAG agents. Strands uses a simpler `@tool` decorator model rather than LangGraph's graph-based orchestration, which makes the agent definition more lightweight and easier to reason about. The implementation is based on the existing LangGraph RAG example in `examples/sdk/langgraph/` and the documentation at https://rllm-project.readthedocs.io/en/latest/examples/sdk_langgraph_rag/. It also references the Strands tools repository (https://github.com/strands-agents/tools) for compatibility and design alignment.

Changes
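For orientation, the `@tool` decorator model works roughly like this. The sketch below shows only the pattern, not the actual Strands API: a decorator registers a plain function and derives the tool spec the model sees from its name, docstring, and signature. All names here are illustrative.

```python
# Minimal sketch of the @tool decorator *pattern* (not the real Strands
# API): register a plain function; its name, docstring, and signature
# become the tool spec exposed to the model.
import inspect

TOOL_REGISTRY = {}

def tool(fn):
    """Register fn as a tool, using its metadata as the tool spec."""
    TOOL_REGISTRY[fn.__name__] = {
        "description": (fn.__doc__ or "").strip(),
        "params": list(inspect.signature(fn).parameters),
        "fn": fn,
    }
    return fn

@tool
def retrieve(query: str, k: int = 3):
    """Return the top-k passages for a query (stub corpus here)."""
    corpus = ["passage about A", "passage about B", "passage about C"]
    return corpus[:k]

# An agent loop would look tools up by name when the model emits a call:
spec = TOOL_REGISTRY["retrieve"]
print(spec["params"])          # ['query', 'k']
print(spec["fn"]("demo", k=2))
```

Compared with wiring retrieval into a graph node, the decorator style keeps the tool definition next to the function it wraps, which is the lightness the Aim section refers to.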
This PR introduces a new `examples/strands/` directory containing a full Strands-based RAG training pipeline. The main agent implementation (`search_agent_strands.py`) uses a custom `NonStreamingOpenAIModel` wrapper to ensure LiteLLM trace capture works correctly, enforces tool-turn budgeting, and handles streaming event conversion. The retrieval layer (`retrieve_tool.py`) implements async RAG with `httpx` connection pooling. The training entry point (`train_strands_agent.py`) integrates HotpotQA with `RewardSearchFn`, and `train_strands_agent.sh` provides a Hydra config for 8-GPU RLOO training in fp16. The RAG backend includes an auto-batching FastAPI server (`rag/rag_server.py`) with multi-GPU FAISS support and a launch script (`rag/launch_rag.sh`). A README is included for setup and usage.

Legacy Strands files (`run_strands.py`, `strands_workflow.py`, `gsearch_tool_wrapped.py`, `.env.example`, and the `eval/` directory) were removed and replaced by this cleaner implementation.

Bug Fixes
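The kwarg-filtering fix described below amounts to an allow-list applied before a request is forwarded to the OpenAI-compatible client. A hedged sketch (the allow-list here is illustrative, not rLLM's exact set, and `num_tool_turns` stands in for a Strands-side key):

```python
# Hedged sketch of the kwarg-filtering fix: drop Strands-side kwargs
# (e.g. a tool-turn budget) before forwarding to the OpenAI-compatible
# client. The allow-list below is illustrative, not rLLM's exact set.
OPENAI_CHAT_KWARGS = {
    "model", "messages", "temperature", "top_p", "max_tokens",
    "stream", "tools", "tool_choice", "stop", "n",
}

def filter_openai_kwargs(kwargs):
    """Keep only kwargs the chat-completions endpoint understands."""
    return {k: v for k, v in kwargs.items() if k in OPENAI_CHAT_KWARGS}

raw = {"model": "Qwen3-4B", "temperature": 0.7, "num_tool_turns": 4}
print(filter_openai_kwargs(raw))  # num_tool_turns is stripped
```

An allow-list fails closed: any new framework-internal kwarg is dropped by default instead of leaking into the HTTP request.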
This PR also fixes a Qwen3 tool-calling issue in `rllm/integrations/strands.py` and filters out Strands-specific kwargs inside `rllm/engine/rollout/openai_engine.py` to prevent unintended argument propagation.

Design Decisions
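The core idea of the non-streaming wrapper discussed in this section can be sketched as follows: issue the request with `stream=False` so LiteLLM's success hook fires, then re-emit the complete `ChatCompletion` as the stream events the Strands event loop expects. The event shapes below are illustrative, not the exact Strands types:

```python
# Hedged sketch of the NonStreamingOpenAIModel idea: take a complete
# (non-streaming) ChatCompletion and replay it as a short sequence of
# stream events. Event names/shapes here are illustrative only.
def completion_to_stream_events(completion):
    choice = completion["choices"][0]
    msg = choice["message"]
    yield {"messageStart": {"role": msg["role"]}}
    yield {"contentBlockDelta": {"delta": {"text": msg["content"]}}}
    yield {"messageStop": {"stopReason": choice["finish_reason"]}}

fake = {"choices": [{"message": {"role": "assistant", "content": "Paris"},
                     "finish_reason": "stop"}]}
events = list(completion_to_stream_events(fake))
print([next(iter(e)) for e in events])  # event order, start to stop
```

Because the whole completion is already in hand, the "stream" collapses to one start, one content delta, and one stop event, which is enough for the downstream event loop while keeping the request itself non-streaming for tracing.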
Strands hardcodes `stream=True`, but LiteLLM's `async_post_call_success_hook` only fires for non-streaming requests. To preserve tracing compatibility, a `NonStreamingOpenAIModel` subclass forces `stream=False` and converts `ChatCompletion` responses into Strands `StreamEvent` format. Tool-turn budgeting is enforced using `num_tool_turns` because Strands consumes `messageStop` events internally before they reach the event loop. The RAG server uses request auto-batching for embeddings and FAISS search to improve throughput during concurrent rollouts.

Results
Training was conducted using RLOO on Qwen3-4B with 8x H100 GPUs on the HotpotQA dataset. The Strands implementation achieved a +15pp improvement in pass@1 compared to the LangGraph baseline.
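For reference, RLOO's leave-one-out baseline works as follows (a sketch of the estimator, not rLLM's trainer code): with n rollouts per prompt, each rollout's advantage is its reward minus the mean reward of the other n−1 rollouts.

```python
# RLOO leave-one-out baseline (estimator sketch, not rLLM's trainer code):
# advantage_i = r_i - mean(rewards of the other n-1 rollouts for the
# same prompt).
def rloo_advantages(rewards):
    n = len(rewards)
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]

adv = rloo_advantages([1.0, 0.0, 0.0, 0.0])
print(adv[0])         # 1.0: the sole successful rollout is fully credited
print(abs(sum(adv)))  # advantages sum to zero by construction
```

This per-prompt baseline needs no learned critic, which is why RLOO pairs naturally with multi-rollout agent training like the HotpotQA setup above.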
Testing
Local testing with the RAG server works as expected. Trajectories are saved correctly, and multi-GPU training completed successfully with positive results.
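The auto-batching behavior mentioned under Design Decisions can also be exercised locally with a toy version of the pattern (illustrative only, not the actual `rag/rag_server.py` code): concurrent retrieve requests queue up briefly and are served by a single batched embed/FAISS call.

```python
# Toy sketch of request auto-batching (illustrative, not the real server):
# a worker drains concurrently queued requests into one batched search.
import asyncio

async def batcher(queue, search_batch, max_wait=0.005):
    while True:
        first = await queue.get()
        await asyncio.sleep(max_wait)        # let concurrent requests pile up
        batch = [first]
        while not queue.empty():
            batch.append(queue.get_nowait())
        results = search_batch([q for q, _ in batch])  # one batched call
        for (_, fut), res in zip(batch, results):
            fut.set_result(res)

async def retrieve(queue, query):
    fut = asyncio.get_running_loop().create_future()
    await queue.put((query, fut))
    return await fut

async def main():
    queue = asyncio.Queue()
    batch_sizes = []
    def fake_search(queries):                # stands in for embed + FAISS
        batch_sizes.append(len(queries))
        return [f"hits for {q}" for q in queries]
    worker = asyncio.create_task(batcher(queue, fake_search))
    out = await asyncio.gather(*(retrieve(queue, f"q{i}") for i in range(4)))
    worker.cancel()
    return out, batch_sizes

out, batch_sizes = asyncio.run(main())
print(batch_sizes)  # all four requests served by a single batched call
```

The short wait trades a few milliseconds of latency for much larger embedding/FAISS batches, which is where the throughput gain during concurrent rollouts comes from.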