[skyrl] Add /sample endpoint to RemoteInferenceClient following Tinker API #1396

nithinvc wants to merge 8 commits into NovaSky-AI:main
Conversation
- Add `RemoteInferenceClient.sample()` mapping Tinker-style sample requests to the vLLM `/inference/v1/generate` endpoint
- Support `n` completions, logprobs, and configurable sampling params
- Add unit tests (n=1, n=2, session_id routing)
- Add GPU integration tests (sample, sample_multiple, sample_deterministic)
- Simplify `_force_close_connector` to use `transport.close()` directly
Code Review
This pull request introduces a new sample method to RemoteInferenceClient to support the Tinker API, along with corresponding unit tests and updates to the mock inference server. I have provided feedback regarding the optimization of the _PARAM_MAP constant, the need for a test case covering session_id routing, and a correction for the num_choices logic in the mock server.
skyrl/backends/skyrl_train/inference_servers/remote_inference_client.py
```python
def test_client_sample_deterministic(vllm_server: InferenceEngineState):
    """Test that sample with seed + temperature=0 is deterministic across calls."""
    client = vllm_server.client
    token_ids = _get_test_token_ids(MODEL_QWEN2_5)
    params = {"temperature": 0.0, "max_tokens": 32, "seed": 42}

    result1 = asyncio.run(client.sample(_build_sample_payload(token_ids, num_samples=1, sampling_params=params)))
    result2 = asyncio.run(client.sample(_build_sample_payload(token_ids, num_samples=1, sampling_params=params)))

    assert result1["sequences"][0]["tokens"] == result2["sequences"][0]["tokens"]
```
The pull request description mentions adding a unit test for session_id routing for the sample method, but it seems to be missing from the submitted tests. Please consider adding a test case that utilizes the session_id parameter in _build_sample_payload to verify that session-based routing works as expected for the new endpoint.
Leaving the arg in for _build_sample_payload since we may want to test it in the future. I'm not sure how to test session based routing in our current setup, so leaving for now.
tests/backends/skyrl_train/inference_servers/test_remote_inference_client.py
…client.py revert change Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
```python
# Transform response choices → sequences
sequences = []
logger.info("num choices: %d", len(response.get("choices", [])))
```
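The choices → sequences transform noted in the comment above might look something like this. A sketch only, assuming an OpenAI-style response from the vLLM `/generate` endpoint; the `token_ids`, `logprobs`, and `finish_reason` field names are illustrative, not necessarily the exact ones the client reads:

```python
def choices_to_sequences(response: dict) -> list[dict]:
    # Map each generated choice to a Tinker-style sequence entry.
    sequences = []
    for choice in response.get("choices", []):
        sequences.append({
            "tokens": choice.get("token_ids", []),
            "logprobs": choice.get("logprobs"),
            "stop_reason": choice.get("finish_reason"),
        })
    return sequences
```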
Always logging with info here is probably a little too verbose, right?
Yes, I put it in for debugging originally. It shouldn't be in, and I've removed it.
```python
return {
    "type": "sample",
    "sequences": sequences,
    "prompt_logprobs": None,
```
Going forward, we might want / need to support this :)
Yes! The next PR will include `prompt_logprobs`, but I first need to check how they handle prompt logprobs for vision inputs to make sure we handle that case.
```python
tinker_params = body.get("sampling_params", {})

# Flatten prompt chunks → token IDs
token_ids = [tok for chunk in prompt.get("chunks", []) for tok in chunk.get("tokens", [])]
```
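To make the flatten step concrete, here is the same comprehension run on a small hand-built prompt (the chunk shape is taken from the snippet above):

```python
# Two chunks of token IDs collapse into a single flat list.
prompt = {"chunks": [{"tokens": [1, 2]}, {"tokens": [3]}]}
token_ids = [tok for chunk in prompt.get("chunks", []) for tok in chunk.get("tokens", [])]
# token_ids == [1, 2, 3]
```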
This will need adaptation for multi-modal inputs going forward, right?
Yes, this will have to be the token concatenation we talked about, so it will get replaced.
Add `/sample` API to `RemoteInferenceClient`

This PR adds the Tinker-compatible `/sample` API to `RemoteInferenceClient` on the new inference server codepath, addressing #1286.

Changes

- `RemoteInferenceClient.sample()` method that maps Tinker-style sample requests to the vLLM `/inference/v1/generate` endpoint, supporting `n` completions, logprobs, and configurable sampling params (temperature, top_k, top_p, seed, stop tokens, etc.)

Tests

- Unit tests (`TestSample`) covering n=1, n=2, and multi-chunk prompts
- GPU integration tests (`test_client_sample`, `test_client_sample_multiple`, `test_client_sample_deterministic`) validating end-to-end generation against a live vLLM server
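The review's suggestion to hoist `_PARAM_MAP` to a module-level constant might look roughly like this. A sketch under assumptions: the key names below are the sampling params listed in the description, and the helper name `map_sampling_params` is hypothetical, not the client's actual function:

```python
# Module-level Tinker → vLLM sampling-param name mapping, built once
# instead of per call. Keys here happen to match one-to-one, but the
# mapping leaves room for names that diverge between the two APIs.
_PARAM_MAP = {
    "temperature": "temperature",
    "top_k": "top_k",
    "top_p": "top_p",
    "seed": "seed",
    "max_tokens": "max_tokens",
    "stop": "stop",
}

def map_sampling_params(tinker_params: dict) -> dict:
    # Drop unrecognized keys rather than forwarding them to vLLM.
    return {_PARAM_MAP[k]: v for k, v in tinker_params.items() if k in _PARAM_MAP}
```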