
[skyrl] Add /sample endpoint to RemoteInferenceClient following Tinker API#1396

Open
nithinvc wants to merge 8 commits into NovaSky-AI:main from nithinvc:nithinc/inf-sample

Conversation

Contributor

@nithinvc nithinvc commented Mar 26, 2026

Add /sample API to RemoteInferenceClient

This PR adds the Tinker-compatible /sample API to RemoteInferenceClient on the new inference-server codepath, addressing #1286.

Changes

  • Add RemoteInferenceClient.sample() method that maps Tinker-style sample requests to the vLLM /inference/v1/generate endpoint, supporting n completions, logprobs, and configurable sampling params (temperature, top_k, top_p, seed, stop tokens, etc.)
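A minimal sketch of what such a request mapping might look like. The `_PARAM_MAP` name comes from the review comments below; the exact field names of the generate payload are assumptions for illustration, not the PR's actual code:

```python
# Hypothetical sketch: translate Tinker-style sampling params into a
# vLLM /inference/v1/generate payload. Field names are illustrative.
_PARAM_MAP = {
    "temperature": "temperature",
    "top_k": "top_k",
    "top_p": "top_p",
    "seed": "seed",
    "stop": "stop",
    "max_tokens": "max_tokens",
}


def build_generate_request(token_ids, num_samples, sampling_params):
    """Map a Tinker-style sample request onto a generate-style payload."""
    vllm_params = {
        vllm_key: sampling_params[tinker_key]
        for tinker_key, vllm_key in _PARAM_MAP.items()
        if tinker_key in sampling_params
    }
    # Request all n completions in a single call rather than n round trips.
    vllm_params["n"] = num_samples
    return {"prompt_token_ids": token_ids, "sampling_params": vllm_params}
```

Unrecognized keys are dropped rather than forwarded, so a typo in a sampling param fails loudly in testing instead of silently changing server behavior.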

Tests

  • Add unit tests (TestSample) covering n=1, n=2, and multi-chunk prompts
  • Add GPU integration tests (test_client_sample, test_client_sample_multiple, test_client_sample_deterministic) validating end-to-end generation against a live vLLM server


- Add RemoteInferenceClient.sample() mapping Tinker-style sample requests
  to the vLLM /inference/v1/generate endpoint
- Support n completions, logprobs, and configurable sampling params
- Add unit tests (n=1, n=2, session_id routing)
- Add GPU integration tests (sample, sample_multiple, sample_deterministic)
- Simplify _force_close_connector to use transport.close() directly
@nithinvc nithinvc force-pushed the nithinc/inf-sample branch from 55bc8e7 to 929e25b Compare March 27, 2026 19:53
@nithinvc nithinvc marked this pull request as ready for review March 27, 2026 20:31
devin-ai-integration[bot]

This comment was marked as resolved.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new sample method to RemoteInferenceClient to support the Tinker API, along with corresponding unit tests and updates to the mock inference server. I have provided feedback regarding the optimization of the _PARAM_MAP constant, the need for a test case covering session_id routing, and a correction for the num_choices logic in the mock server.

Comment on lines +672 to +681
```python
def test_client_sample_deterministic(vllm_server: InferenceEngineState):
    """Test that sample with seed + temperature=0 is deterministic across calls."""
    client = vllm_server.client
    token_ids = _get_test_token_ids(MODEL_QWEN2_5)
    params = {"temperature": 0.0, "max_tokens": 32, "seed": 42}

    result1 = asyncio.run(client.sample(_build_sample_payload(token_ids, num_samples=1, sampling_params=params)))
    result2 = asyncio.run(client.sample(_build_sample_payload(token_ids, num_samples=1, sampling_params=params)))

    assert result1["sequences"][0]["tokens"] == result2["sequences"][0]["tokens"]
```

Severity: medium

The pull request description mentions adding a unit test for session_id routing for the sample method, but it seems to be missing from the submitted tests. Please consider adding a test case that utilizes the session_id parameter in _build_sample_payload to verify that session-based routing works as expected for the new endpoint.

Contributor Author


Leaving the arg in for _build_sample_payload since we may want to test it in the future. I'm not sure how to test session-based routing in our current setup, so I'm leaving it for now.

…client.py


revert change

Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
devin-ai-integration[bot]

This comment was marked as resolved.


```python
# Transform response choices → sequences
sequences = []
logger.info("num choices: %d", len(response.get("choices", [])))
```
Collaborator


Always logging with info here is probably a little too verbose, right?

Contributor Author


Yes, I put it in for debugging originally. It shouldn't be in, and I've removed it.

```python
return {
    "type": "sample",
    "sequences": sequences,
    "prompt_logprobs": None,
```
Collaborator


Going forward, we might want / need to support this :)

Contributor Author


Yes! The next PR will include prompt_logprobs, but I need to check how prompt logprobs are handled for vision inputs to make sure we cover that case.
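The return value quoted in this thread suggests a choices-to-sequences transform along these lines. This is an illustrative sketch only; the choice field names (`token_ids`, `finish_reason`) are assumptions, not the PR's actual code:

```python
# Illustrative sketch: convert vLLM-style response choices into a
# Tinker-style sample response. Choice field names are assumptions.
def choices_to_sample_response(response):
    sequences = []
    for choice in response.get("choices", []):
        sequences.append({
            "tokens": choice.get("token_ids", []),
            "logprobs": choice.get("logprobs"),
            "stop_reason": choice.get("finish_reason"),
        })
    return {
        "type": "sample",
        "sequences": sequences,
        # Deferred to a follow-up PR, per the discussion in this thread.
        "prompt_logprobs": None,
    }
```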

```python
tinker_params = body.get("sampling_params", {})

# Flatten prompt chunks → token IDs
token_ids = [tok for chunk in prompt.get("chunks", []) for tok in chunk.get("tokens", [])]
```
Collaborator


This will need adaptation for multi-modal inputs going forward, right?

Contributor Author


Yes, this will have to be the token concatenation we talked about, so it will get replaced.
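The flattening under discussion is small enough to isolate as a helper, which would give the future multi-modal replacement a single seam to swap out. The prompt structure is taken from the quoted diff; the function name is illustrative:

```python
# Sketch of the chunk-flattening step from the quoted diff, pulled into
# a helper. Text-only for now: it concatenates token IDs across chunks
# in order, and would be replaced for multi-modal prompt chunks.
def flatten_prompt_chunks(prompt):
    """Concatenate token IDs across all prompt chunks, preserving order."""
    return [
        tok
        for chunk in prompt.get("chunks", [])
        for tok in chunk.get("tokens", [])
    ]
```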
