ProRLAgent Server is a scalable multi-turn rollout system for training and evaluating RL agents. Built on top of OpenHands, it offers high concurrency and a pluggable handler interface to support diverse agent tasks.
- Decoupled RL Training & Rollouts: rollouts run as a service; any RL trainer can consume the outputs.
- High concurrency: execute large-scale jobs with LLM load balancing.
- Pluggable AgentHandler: customize for different tasks and agents.
- Lifecycle management: built-in support for status tracking, queuing, timeouts, and cleanup.
- Token-in / Token-out: communicate in tokens to maintain turn alignment and ensure stable training.
- Singularity runtime: rootless execution with single-file containers (.sif), seamless Slurm integration, secure multi-user support.
- Efficient Bash tool: ptyprocess-based implementation delivering a roughly 6x speedup over the tmux-based approach.
- Efficient IPython tool: direct IPython kernel integration without network overhead.
- UDS communication: Unix domain sockets for better throughput and isolation.
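The pty-based Bash tool above can be illustrated with a minimal sketch. The actual tool uses ptyprocess; this example uses only Python's built-in `pty` module to run a bash command through a pseudo-terminal and collect its output:

```python
import os
import pty
import subprocess

# Sketch of pty-based command execution (the real tool uses ptyprocess;
# this stdlib version only illustrates the idea).
master, slave = pty.openpty()
proc = subprocess.Popen(
    ["bash", "-c", "echo hello"],
    stdin=slave, stdout=slave, stderr=slave, close_fds=True,
)
os.close(slave)  # parent keeps only the master end

output = b""
while True:
    try:
        chunk = os.read(master, 1024)
    except OSError:  # raised on Linux once the child exits
        break
    if not chunk:
        break
    output += chunk

proc.wait()
os.close(master)
print(output.decode().strip())
```

Because the command runs inside a real pseudo-terminal, interactive programs behave as they would in a normal shell session.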
- Install dependencies
- Install OpenHands Dependencies
```bash
poetry install --with dev,test,runtime,evaluation
pip install git+https://github.com/SWE-Gym/SWE-Bench-Package.git
pip install git+https://github.com/R2E-Gym/R2E-Gym.git
```
- Install Singularity/Apptainer Sandbox
```bash
sudo apt-get update
sudo apt-get install -y software-properties-common curl gnupg
sudo apt-get install -y singularity-container fuse
sudo add-apt-repository -y ppa:apptainer/ppa
sudo apt-get update
sudo apt-get install -y apptainer
```
- Start the vLLM server with your desired Hugging Face model:
```bash
vllm serve path/to/your/model --enable-auto-tool-choice --tool-call-parser hermes --host 127.0.0.1 --port 8000 --api-key key --served-model-name model_name &
```
Replace `path/to/your/model` with the actual path to your Hugging Face model, and set the host IP, port, and served model name as needed.
- Pull Singularity sandboxes for SWE tasks
```bash
python scripts/pull_swe_images.py --parquet-file /path/to/train.parquet --dest-dir /some/dir --temp-base /some/dir --log-name log
```
Download the parquet data from Hugging Face. Supported training data:
- swe-gym: https://huggingface.co/datasets/NovaSky-AI/SkyRL-v0-293-data
- r2egym: https://huggingface.co/R2E-Gym
- swe-bench-multimodal: https://huggingface.co/datasets/SWE-bench/SWE-bench_Multimodal
- swe-bench: https://huggingface.co/datasets/SWE-bench/SWE-bench
- swe-smith: https://huggingface.co/datasets/SWE-bench/SWE-smith
- Start the async evaluation server (FastAPI)
This command starts the FastAPI-based async evaluation server and listens on the given host/port. It exposes /start, /process, and /status endpoints, and uses --max-init-workers/--max-run-workers and --timeout to control concurrency and time limits.
```bash
python scripts/start_server.py --host 0.0.0.0 --port 8006 --max-init-workers 64 --max-run-workers 64 --timeout 300
```
- Test the server (HTTP I/O)
Before sending jobs to /process, make sure you follow this sequence (assumes you already started a VLLM server in step 2):
- Register at least one LLM server address (include `/v1`):
```bash
curl -X POST http://localhost:8006/add_llm_server \
  -H 'Content-Type: application/json' \
  -d '{"address":"http://127.0.0.1:8000/v1"}'
```
- Start the worker process:
```bash
curl -X POST http://localhost:8006/start
```
- (Optional) Check status:
```bash
curl http://localhost:8006/status
```
Notes:
- You can call `/add_llm_server` before `/start`; the address will be buffered and applied when the worker starts.
- Ensure the `sampling_params.model` and `api_key` in your request match the model name and key you used when launching vLLM in step 2.
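The registration sequence above can also be scripted. The sketch below is self-contained: it stands up a stub HTTP server in place of the evaluation server so it runs offline. The endpoint paths come from this README; the stub's responses are made up for illustration:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stub standing in for the evaluation server, so the client-side
# sequence (register LLM server -> /start -> /status) runs offline.
class Stub(BaseHTTPRequestHandler):
    def _reply(self, body):
        data = json.dumps(body).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_POST(self):
        self._reply({"ok": True, "path": self.path})

    def do_GET(self):
        self._reply({"status": "running"})

    def log_message(self, *args):  # silence request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Stub)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

def post(path, payload=None):
    req = urllib.request.Request(
        base + path,
        data=json.dumps(payload or {}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# 1. Register an LLM backend (address must include /v1).
r1 = post("/add_llm_server", {"address": "http://127.0.0.1:8000/v1"})
# 2. Start the worker process.
r2 = post("/start")
# 3. Poll status.
with urllib.request.urlopen(base + "/status") as resp:
    r3 = json.loads(resp.read())
print(r1, r2, r3)
server.shutdown()
```

Against the real server, replace `base` with `http://localhost:8006` and drop the stub.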
Option 1: Quick test using the built-in script
python scripts/tests/test_server.py
Option 2: Test using curl
Quick try: send a task to /process and read the JSON result.
Input (request body):
- `instance`: the task info (must include `data_source` and any fields your handler needs)
- `sampling_params`: optional LLM/agent settings (e.g., `temperature`, `top_p`, `max_tokens`)
- `job_id` (optional): your own identifier
Example:
```bash
curl -X POST http://localhost:8006/process \
  -H 'Content-Type: application/json' \
  -d '{
    "instance": {
      "data_source": "swebench",
      "instance_id": "python__mypy-16203",
      "trajectory_id": "t0",
      "patch": "",
      "metadata": {}
    },
    "sampling_params": {
      "model": "hosted_vllm/Qwen2.5-7B-Instruct",
      "api_key": "key",
      "modify_params": false,
      "log_completions": true,
      "native_tool_calling": false,
      "temperature": 0.6,
      "top_p": 0.9,
      "token_level_generation": true,
      "custom_tokenizer": "tokenizer_path",
      "max_iterations": 5
    }
  }'
```
Output (response body):
```json
{
  "resolved": true,
  "report": {"pass@1": 0.0, "details": {"...": "..."}},
  "timing": {"init": 2.1, "run": 41.3, "eval": 5.2, "others": 1.4, "timeout": 300.0}
}
```
To add a new task:
- Implement an `AgentHandler` with `name`, `init(job_details, ...)`, `run(job_details, ...)`, and `eval(job_details, ...)`.
- Register it in the registry so that `instance["data_source"] == name` routes requests to your handler.
- Provide a `final_result(job_details)` function for result shaping.
- Ensure your handler returns a consistent result schema and handles timeouts/errors.
Minimal sketch:
```python
from openhands.nvidia.registry import AgentHandler, register_agent_handler

class MyTaskHandler(AgentHandler):
    @property
    def name(self) -> str:
        return "my_task"

    async def init(self, job_details, sid=None, **kwargs):
        return runtime, metadata, config

    async def run(self, job_details, sid=None, **kwargs):
        return {"git_patch": "...", "messages": []}

    async def eval(self, job_details, sid=None, allow_skip=True, reward=None):
        return {"report": {"resolved": True}}

register_agent_handler(MyTaskHandler())
```
Then submit requests with `{"data_source": "my_task", ...}` in the `instance`.
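For intuition, the routing rule can be pictured as a dict keyed by handler name. The stand-in below is hypothetical (the real registry lives in `openhands.nvidia.registry`); it only illustrates how `instance["data_source"]` selects a handler:

```python
# Toy registry illustrating data_source-based routing. Hypothetical:
# the real implementation is openhands.nvidia.registry.
_HANDLERS = {}

def register_agent_handler(handler):
    _HANDLERS[handler.name] = handler

def dispatch(instance):
    # instance["data_source"] selects the handler by its name.
    return _HANDLERS[instance["data_source"]]

class MyTaskHandler:
    name = "my_task"

register_agent_handler(MyTaskHandler())
handler = dispatch({"data_source": "my_task", "instance_id": "x"})
print(handler.name)  # my_task
```

A request whose `data_source` has no registered handler would raise a `KeyError` here; the real server should return an error response instead.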
Example:
```bash
TEST_RUNTIME=singularity RUN_AS_OPENHANDS=False PYTHONPATH='.' pytest tests/runtime/test_browsing.py -v -s
```
`OH_RUNTIME_SINGULARITY_IMAGE_REPO` - specifies the directory where Singularity runtime images will be stored:
```bash
OH_RUNTIME_SINGULARITY_IMAGE_REPO=/path/to/singularity_images
```
More module READMEs (click to open):
- openhands/README.md
- openhands/nvidia/README.md
- openhands/llm/nvidia/README.md
- scripts/README.md
- tests/nvidia/README.md
To validate the functionality of the ProRLAgent server, we ran experiments on software engineering (SWE) tasks by integrating the server with our ProRLAgent training framework, which is based on verl. We performed initial RL training of the Qwen3-4B-Instruct-2507 model on 32 A100 GPUs, using a subset of SWE-Gym with 293 training examples. After roughly 66 training steps, Pass@1 on SWE-Bench-Verified improved from 14.2% to 20.8%; the following charts show the test results on SWE-Bench-Verified, which increase over the course of training.

