Roshankumarb31

Summary

Adds a new /slots/status endpoint that provides secure slot monitoring without exposing sensitive user data.

Motivation

The existing /slots endpoint exposes prompts and generated text, making it unsuitable for production monitoring and load balancers. This was discussed in #11040 where the maintainer suggested creating a secure alternative.

Changes

  • Added new /slots/status endpoint in tools/server/server.cpp
  • Returns slot state and metrics without sensitive data (prompts, generated text)
  • Added API documentation in tools/server/README.md
  • Added security tests to verify no data leakage
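
As a rough illustration of the kind of check the security tests could perform, here is a minimal Python sketch. It assumes the endpoint returns JSON, and the key names in `SENSITIVE_KEYS` as well as the sample payload are hypothetical; the PR text does not specify the actual response fields.

```python
import json

# Hypothetical names of fields that would count as sensitive.
# The PR description does not list the real field names.
SENSITIVE_KEYS = {"prompt", "generated_text"}


def find_sensitive_keys(value, path=""):
    """Recursively collect the paths of any SENSITIVE_KEYS found in parsed JSON."""
    found = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            if key in SENSITIVE_KEYS:
                found.append(child_path)
            found.extend(find_sensitive_keys(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found.extend(find_sensitive_keys(child, f"{path}[{index}]"))
    return found


# Made-up payload standing in for a response body; the second slot leaks a prompt.
sample = json.loads(
    '[{"id": 0, "is_processing": false},'
    ' {"id": 1, "is_processing": true, "prompt": "secret"}]'
)
leaks = find_sensitive_keys(sample)
```

A test built on this idea would fetch the real response body, parse it, and assert that the returned list of leaked paths is empty.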

Use Cases

  • Production monitoring dashboards
  • Load balancers (e.g., Paddler) for capacity-aware routing
  • Resource allocation and capacity planning
  • Any scenario requiring slot availability without user data exposure
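
The capacity-aware routing use case could look roughly like the following Python sketch. It assumes each backend's status has already been fetched and parsed into a list of slot objects, and that each slot carries a boolean `is_processing` field; both the field name and the backend URLs are assumptions made for illustration, not something this PR confirms.

```python
def count_idle_slots(slots):
    """Count slots that are not busy; a slot without the flag is treated as busy."""
    return sum(1 for slot in slots if not slot.get("is_processing", True))


def pick_backend(statuses):
    """Return the backend with the most idle slots, or None if every slot is busy.

    `statuses` maps a backend identifier to its parsed list of slot objects.
    """
    best, best_idle = None, 0
    for backend, slots in statuses.items():
        idle = count_idle_slots(slots)
        if idle > best_idle:
            best, best_idle = backend, idle
    return best


# Made-up status data for two hypothetical backends.
statuses = {
    "http://backend-a:8080": [{"id": 0, "is_processing": True}],
    "http://backend-b:8080": [
        {"id": 0, "is_processing": False},
        {"id": 1, "is_processing": True},
    ],
}
chosen = pick_backend(statuses)
```

Treating a slot with a missing flag as busy is a deliberately conservative choice: a router built this way would rather skip a backend than send a request to one whose state it cannot read.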

Testing

  • Code compiles successfully with MSVC on Windows
  • Server starts and endpoint is accessible
  • Security tests verify no sensitive data in response

References

Adds a new endpoint that returns slot status without exposing
sensitive data like prompts or generated text.

Useful for load balancers and monitoring tools that need to
check slot availability without accessing user data.

Refs ggml-org#11040
@ggerganov
Member

> The existing /slots endpoint exposes prompts and generated text

It does not expose such data - this was fixed: #15630

@ngxson
Collaborator

ngxson commented Oct 12, 2025

Besides, the whole PR (including the docs and the PR description) seems to be AI-generated

Labels

examples python python script changes server
