ProRLAgent Server: A Scalable Multi-turn Rollout Infrastructure for RL Agent Training


☁️ Introduction

ProRLAgent Server is a scalable multi-turn rollout system for training and evaluating RL agents. Built on top of OpenHands, it offers high concurrency and a pluggable handler interface to support diverse agent tasks.

  • Decoupled RL Training & Rollouts: rollouts run as a service; any RL trainer can consume the outputs.
  • High concurrency: execute large-scale jobs with LLM load balancing.
  • Pluggable AgentHandler: customize for different tasks and agents.
  • Lifecycle management: built-in support for status tracking, queuing, timeouts, and cleanup.
  • Token-in / Token-out: communicate in tokens to maintain turn alignment and ensure stable training.
  • Singularity runtime: rootless execution with single-file containers (.sif), seamless Slurm integration, and secure multi-user support.
  • Efficient Bash tool: a ptyprocess-based implementation that is 6x faster than the tmux-based approach.
  • Efficient IPython tool: direct IPython kernel integration without network overhead.
  • UDS communication: Unix domain sockets for better throughput and isolation.

💻 Quick Start

  1. Install dependencies
  • Install the OpenHands dependencies:
poetry install --with dev,test,runtime,evaluation
pip install git+https://github.com/SWE-Gym/SWE-Bench-Package.git
pip install git+https://github.com/R2E-Gym/R2E-Gym.git
  • Install the Singularity/Apptainer sandbox:
sudo apt-get update
sudo apt-get install -y software-properties-common curl gnupg
sudo apt-get install -y singularity-container fuse
sudo add-apt-repository -y ppa:apptainer/ppa
sudo apt-get update
sudo apt-get install -y apptainer
  2. Start the VLLM server with your desired Hugging Face model:
vllm serve path/to/your/model --enable-auto-tool-choice --tool-call-parser hermes --host 127.0.0.1 --port 8000 --api-key key --served-model-name model_name &

Replace path/to/your/model with the actual path to your Hugging Face model, and set the host, port, API key, and served model name to match your deployment.
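
Once the server is up, it can be sanity-checked with any OpenAI-compatible client before wiring it into the rollout server. A minimal sketch, assuming the host, port, key, and model name from the command above:

# Sanity-check the VLLM endpoint through its OpenAI-compatible API.
# Host/port/api_key/model are the placeholder values from the serve command.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="key")
resp = client.chat.completions.create(
    model="model_name",
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=32,
)
print(resp.choices[0].message.content)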

  3. Pull Singularity sandboxes for SWE tasks:
python scripts/pull_swe_images.py --parquet-file /path/to/train.parquet --dest-dir /some/dir --temp-base /some/dir --log-name log

Download the parquet data from Hugging Face; supported training data includes SWE-Gym and R2E-Gym (matching the packages installed in step 1).
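
As a sketch of the download step, the parquet file can be fetched with huggingface_hub (the repo id and filename below are placeholders, not a real dataset reference):

# Sketch: download a training parquet from the Hugging Face Hub.
from huggingface_hub import hf_hub_download

parquet_path = hf_hub_download(
    repo_id="your-org/your-swe-dataset",  # placeholder dataset id
    filename="train.parquet",             # placeholder filename
    repo_type="dataset",
)
print(parquet_path)  # pass this path to scripts/pull_swe_images.py --parquet-file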

  4. Start the async evaluation server (FastAPI)

The following command starts the FastAPI-based async evaluation server listening on the given host/port. It exposes /start, /process, and /status endpoints, and uses --max-init-workers/--max-run-workers and --timeout to control concurrency and time limits.

python scripts/start_server.py --host 0.0.0.0 --port 8006 --max-init-workers 64 --max-run-workers 64 --timeout 300
  5. Test the server (HTTP I/O)

Before sending jobs to /process, make sure you follow this sequence (assumes you already started a VLLM server in step 2):

  1. Register at least one LLM server address (include /v1):
curl -X POST http://localhost:8006/add_llm_server \
  -H 'Content-Type: application/json' \
  -d '{"address":"http://127.0.0.1:8000/v1"}'
  2. Start the worker process:
curl -X POST http://localhost:8006/start
  3. (Optional) Check status (see the polling sketch after the notes below):
curl http://localhost:8006/status

Notes:

  • You can call /add_llm_server before /start; the address will be buffered and applied when the worker starts.
  • Ensure the sampling_params.model and api_key in your request match the model name and key you used when launching VLLM in step 2.
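
If you are scripting this sequence, it helps to wait for /status to respond before submitting jobs. A minimal sketch, assuming only that /status returns HTTP 200 with a JSON body:

# Sketch: block until the evaluation server answers on /status.
import time
import requests

def wait_for_server(base_url="http://localhost:8006", timeout_s=120):
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            resp = requests.get(f"{base_url}/status", timeout=5)
            if resp.ok:
                return resp.json()
        except requests.ConnectionError:
            pass  # server not up yet; retry
        time.sleep(2)
    raise TimeoutError("evaluation server did not become ready")

print(wait_for_server())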

Option 1: Quick test using the built-in script

python scripts/tests/test_server.py

Option 2: Test using curl

Quick try: send a task to /process and read the JSON result.

Input (request body):

  • instance: the task info (must include data_source and any fields your handler needs)
  • sampling_params: optional LLM/agent settings (e.g., temperature, top_p, max_tokens)
  • job_id (optional): your own identifier

Example:

curl -X POST http://localhost:8006/process \
  -H 'Content-Type: application/json' \
  -d '{
    "instance": {
      "data_source": "swebench",
      "instance_id": "python__mypy-16203",
      "trajectory_id": "t0",
      "patch": "",
      "metadata": {}
    },
    "sampling_params": {
      "model": "hosted_vllm/Qwen2.5-7B-Instruct",
      "api_key": "key",
      "modify_params": false,
      "log_completions": true,
      "native_tool_calling": false,
      "temperature": 0.6,
      "top_p": 0.9,
      "token_level_generation": true,
      "custom_tokenizer": "tokenizer_path",
      "max_iterations": 5
    }
  }'

Output (response body):

{
  "resolved": true,
  "report": {"pass@1": 0.0, "details": {"...": "..."}},
  "timing": {"init": 2.1, "run": 41.3, "eval": 5.2, "others": 1.4, "timeout": 300.0}
}
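
The same request can also be issued programmatically. A minimal Python sketch mirroring the curl payload above (the model name, key, and endpoint are the placeholder values from the earlier steps):

# Sketch: submit one job to /process and print the result fields.
import requests

payload = {
    "instance": {
        "data_source": "swebench",
        "instance_id": "python__mypy-16203",
        "trajectory_id": "t0",
        "patch": "",
        "metadata": {},
    },
    "sampling_params": {
        "model": "hosted_vllm/Qwen2.5-7B-Instruct",  # must match your VLLM server
        "api_key": "key",
        "temperature": 0.6,
        "top_p": 0.9,
        "max_iterations": 5,
    },
}
resp = requests.post("http://localhost:8006/process", json=payload, timeout=600)
result = resp.json()
print(result.get("resolved"), result.get("timing"))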

💻 Add a New Task/Handler

To add a new task:

  • Implement an AgentHandler with name, init(job_details, ...), run(job_details, ...), and eval(job_details, ...).
  • Register it in the registry so that instance["data_source"] == name routes requests to your handler.
  • Provide a final_result(job_details) function for result shaping.
  • Ensure your handler returns a consistent result schema and handles timeouts/errors.

Minimal sketch:

from openhands.nvidia.registry import AgentHandler, register_agent_handler

class MyTaskHandler(AgentHandler):
    @property
    def name(self) -> str:
        # Matched against instance["data_source"] when routing requests.
        return "my_task"

    async def init(self, job_details, sid=None, **kwargs):
        # Set up the sandbox/runtime for the job; the returned objects are
        # placeholders for whatever your runtime setup produces.
        return runtime, metadata, config

    async def run(self, job_details, sid=None, **kwargs):
        # Execute the agent and collect its outputs.
        return {"git_patch": "...", "messages": []}

    async def eval(self, job_details, sid=None, allow_skip=True, reward=None):
        # Score the rollout and shape the report.
        return {"report": {"resolved": True}}

register_agent_handler(MyTaskHandler())

Then submit requests with {"data_source": "my_task", ...} in the instance.
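
For example, a request routed to the handler above (all field values are illustrative):

# Sketch: data_source="my_task" routes this request to MyTaskHandler.
import requests

payload = {
    "instance": {"data_source": "my_task", "instance_id": "example-0"},
    "sampling_params": {"model": "hosted_vllm/Qwen2.5-7B-Instruct", "api_key": "key"},
}
print(requests.post("http://localhost:8006/process", json=payload, timeout=600).json())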

💻 Run unit tests

Example:

TEST_RUNTIME=singularity RUN_AS_OPENHANDS=False PYTHONPATH='.' pytest tests/runtime/test_browsing.py -v -s

Important Environment Variables

Image Storage Location

OH_RUNTIME_SINGULARITY_IMAGE_REPO - Specifies the directory where Singularity runtime images will be stored.

OH_RUNTIME_SINGULARITY_IMAGE_REPO=/path/to/singularity_images

📄 Documentation

See the individual module READMEs for more detail.

💡 Current Results

To validate the functionality of the ProRLAgent server, we conducted experiments on software engineering (SWE) tasks by integrating the server with our ProRLAgent training framework, which is based on verl. We performed initial RL training of the Qwen3-4B-Instruct-2507 model on 32 A100 GPUs, using a 293-example subset of SWE-GYM as training data. After roughly 66 training steps, Pass@1 on SWE-Bench-Verified improved from 14.2% to 20.8%; the chart below shows the score rising steadily over the course of training.

(Figure: Pass@1 on SWE-Bench-Verified over training steps.)
