Add Strands SDK integration for RAG agent training #359

Open
JunjieAraoXiong wants to merge 2 commits into rllm-org:main from
Aim
This PR adds Strands SDK support to rLLM as an alternative to LangGraph for training RAG agents. Strands uses a simpler `@tool` decorator model rather than LangGraph's graph-based orchestration, which makes the agent definition more lightweight and easier to reason about. The implementation is based on the existing LangGraph RAG example in `examples/sdk/langgraph/` and the documentation at https://rllm-project.readthedocs.io/en/latest/examples/sdk_langgraph_rag/. It also references the Strands tools repository (https://github.com/strands-agents/tools) for compatibility and design alignment.

Changes
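For orientation, the `@tool` decorator model works roughly like this. The sketch below shows only the pattern, not the actual Strands API: a decorator registers a plain function and derives the tool spec the model sees from its name, docstring, and signature. All names here are illustrative.

```python
# Minimal sketch of the @tool decorator *pattern* (not the real Strands
# API): register a plain function; its name, docstring, and signature
# become the tool spec exposed to the model.
import inspect

TOOL_REGISTRY = {}

def tool(fn):
    """Register fn as a tool, using its metadata as the tool spec."""
    TOOL_REGISTRY[fn.__name__] = {
        "description": (fn.__doc__ or "").strip(),
        "params": list(inspect.signature(fn).parameters),
        "fn": fn,
    }
    return fn

@tool
def retrieve(query: str, k: int = 3):
    """Return the top-k passages for a query (stub corpus here)."""
    corpus = ["passage about A", "passage about B", "passage about C"]
    return corpus[:k]

# An agent loop would look tools up by name when the model emits a call:
spec = TOOL_REGISTRY["retrieve"]
print(spec["params"])          # ['query', 'k']
print(spec["fn"]("demo", k=2))
```

Compared with wiring retrieval into a graph node, the decorator style keeps the tool definition next to the function it wraps, which is the lightness the Aim section refers to.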
This PR introduces a new `examples/strands/` directory containing a full Strands-based RAG training pipeline. The main agent implementation (`search_agent_strands.py`) uses a custom `NonStreamingOpenAIModel` wrapper to ensure LiteLLM trace capture works correctly, enforces tool-turn budgeting, and handles streaming event conversion. The retrieval layer (`retrieve_tool.py`) implements async RAG with `httpx` connection pooling. The training entry point (`train_strands_agent.py`) integrates HotpotQA with `RewardSearchFn`, and `train_strands_agent.sh` provides a Hydra config for 8-GPU RLOO training in fp16. The RAG backend includes an auto-batching FastAPI server (`rag/rag_server.py`) with multi-GPU FAISS support and a launch script (`rag/launch_rag.sh`). A README is included for setup and usage.

Legacy Strands files (`run_strands.py`, `strands_workflow.py`, `gsearch_tool_wrapped.py`, `.env.example`, and the `eval/` directory) were removed and replaced by this cleaner implementation.

Bug Fixes
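The kwarg-filtering fix described below amounts to an allow-list applied before a request is forwarded to the OpenAI-compatible client. A hedged sketch (the allow-list here is illustrative, not rLLM's exact set, and `num_tool_turns` stands in for a Strands-side key):

```python
# Hedged sketch of the kwarg-filtering fix: drop Strands-side kwargs
# (e.g. a tool-turn budget) before forwarding to the OpenAI-compatible
# client. The allow-list below is illustrative, not rLLM's exact set.
OPENAI_CHAT_KWARGS = {
    "model", "messages", "temperature", "top_p", "max_tokens",
    "stream", "tools", "tool_choice", "stop", "n",
}

def filter_openai_kwargs(kwargs):
    """Keep only kwargs the chat-completions endpoint understands."""
    return {k: v for k, v in kwargs.items() if k in OPENAI_CHAT_KWARGS}

raw = {"model": "Qwen3-4B", "temperature": 0.7, "num_tool_turns": 4}
print(filter_openai_kwargs(raw))  # num_tool_turns is stripped
```

An allow-list fails closed: any new framework-internal kwarg is dropped by default instead of leaking into the HTTP request.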
This PR also fixes a Qwen3 tool-calling issue in `rllm/integrations/strands.py` and filters out Strands-specific kwargs inside `rllm/engine/rollout/openai_engine.py` to prevent unintended argument propagation.

Design Decisions
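The core idea of the non-streaming wrapper discussed in this section can be sketched as follows: issue the request with `stream=False` so LiteLLM's success hook fires, then re-emit the complete `ChatCompletion` as the stream events the Strands event loop expects. The event shapes below are illustrative, not the exact Strands types:

```python
# Hedged sketch of the NonStreamingOpenAIModel idea: take a complete
# (non-streaming) ChatCompletion and replay it as a short sequence of
# stream events. Event names/shapes here are illustrative only.
def completion_to_stream_events(completion):
    choice = completion["choices"][0]
    msg = choice["message"]
    yield {"messageStart": {"role": msg["role"]}}
    yield {"contentBlockDelta": {"delta": {"text": msg["content"]}}}
    yield {"messageStop": {"stopReason": choice["finish_reason"]}}

fake = {"choices": [{"message": {"role": "assistant", "content": "Paris"},
                     "finish_reason": "stop"}]}
events = list(completion_to_stream_events(fake))
print([next(iter(e)) for e in events])  # event order, start to stop
```

Because the whole completion is already in hand, the "stream" collapses to one start, one content delta, and one stop event, which is enough for the downstream event loop while keeping the request itself non-streaming for tracing.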
Strands hardcodes `stream=True`, but LiteLLM's `async_post_call_success_hook` only fires for non-streaming requests. To preserve tracing compatibility, a `NonStreamingOpenAIModel` subclass forces `stream=False` and converts `ChatCompletion` responses into Strands `StreamEvent` format. Tool-turn budgeting is enforced using `num_tool_turns` because Strands consumes `messageStop` events internally before they reach the event loop. The RAG server uses request auto-batching for embeddings and FAISS search to improve throughput during concurrent rollouts.

Results
Training was conducted using RLOO on Qwen3-4B with 8x H100 GPUs on the HotpotQA dataset. The Strands implementation achieved a +15pp improvement in pass@1 compared to the LangGraph baseline.
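For reference, RLOO's leave-one-out baseline works as follows (a sketch of the estimator, not rLLM's trainer code): with n rollouts per prompt, each rollout's advantage is its reward minus the mean reward of the other n−1 rollouts.

```python
# RLOO leave-one-out baseline (estimator sketch, not rLLM's trainer code):
# advantage_i = r_i - mean(rewards of the other n-1 rollouts for the
# same prompt).
def rloo_advantages(rewards):
    n = len(rewards)
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]

adv = rloo_advantages([1.0, 0.0, 0.0, 0.0])
print(adv[0])         # 1.0: the sole successful rollout is fully credited
print(abs(sum(adv)))  # advantages sum to zero by construction
```

This per-prompt baseline needs no learned critic, which is why RLOO pairs naturally with multi-rollout agent training like the HotpotQA setup above.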
Testing
Local testing with the RAG server works as expected. Trajectories are saved correctly, and multi-GPU training completed successfully with positive results.
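The auto-batching behavior mentioned under Design Decisions can also be exercised locally with a toy version of the pattern (illustrative only, not the actual `rag/rag_server.py` code): concurrent retrieve requests queue up briefly and are served by a single batched embed/FAISS call.

```python
# Toy sketch of request auto-batching (illustrative, not the real server):
# a worker drains concurrently queued requests into one batched search.
import asyncio

async def batcher(queue, search_batch, max_wait=0.005):
    while True:
        first = await queue.get()
        await asyncio.sleep(max_wait)        # let concurrent requests pile up
        batch = [first]
        while not queue.empty():
            batch.append(queue.get_nowait())
        results = search_batch([q for q, _ in batch])  # one batched call
        for (_, fut), res in zip(batch, results):
            fut.set_result(res)

async def retrieve(queue, query):
    fut = asyncio.get_running_loop().create_future()
    await queue.put((query, fut))
    return await fut

async def main():
    queue = asyncio.Queue()
    batch_sizes = []
    def fake_search(queries):                # stands in for embed + FAISS
        batch_sizes.append(len(queries))
        return [f"hits for {q}" for q in queries]
    worker = asyncio.create_task(batcher(queue, fake_search))
    out = await asyncio.gather(*(retrieve(queue, f"q{i}") for i in range(4)))
    worker.cancel()
    return out, batch_sizes

out, batch_sizes = asyncio.run(main())
print(batch_sizes)  # all four requests served by a single batched call
```

The short wait trades a few milliseconds of latency for much larger embedding/FAISS batches, which is where the throughput gain during concurrent rollouts comes from.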