Skip to content

fix(ratelimit): isolate rate limiters per ModelRoute#1171

Open
nXtCyberNet wants to merge 10 commits into
volcano-sh:mainfrom
nXtCyberNet:issue/ratelimit_isolation
Open

fix(ratelimit): isolate rate limiters per ModelRoute#1171
nXtCyberNet wants to merge 10 commits into
volcano-sh:mainfrom
nXtCyberNet:issue/ratelimit_isolation

Conversation

@nXtCyberNet

Copy link
Copy Markdown
Contributor

/kind enhancement
/kind bug

What this PR does / Why we need it

Fixes a multi-tenant rate limit isolation issue where rate limiters were keyed only by model name. When multiple ModelRoutes referenced the same model, they could unintentionally share the same limiter state and Redis token bucket, causing one route's configuration to affect another.

Changes

  • Scope rate limiters using namespace/routeName instead of modelName.
  • Update router callbacks to use namespace-scoped limiter keys.
  • Update limiter add, delete, and lookup paths to use the new key.
  • Redis keys are now automatically isolated per ModelRoute.

Tests

Added unit and Redis-backed integration tests to verify:

  • Independent limiter state per ModelRoute.
  • Isolated Redis token buckets.
  • No cross-route interference when consuming tokens.

Issue

Fixes #1043

User-Facing Changes

NONE

NONE

Copilot AI review requested due to automatic review settings May 29, 2026 18:00
@volcano-sh-bot volcano-sh-bot added kind/enhancement New feature or request kind/bug labels May 29, 2026
@volcano-sh-bot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yaozengzeng for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the TokenRateLimiter to use a limiterKey (combining namespace and route) instead of a model name, allowing for better rate-limiting isolation. Unit tests have been updated, and a new isolation test has been added. The review feedback highlights two key issues: a potential nil pointer dereference panic in AddOrUpdateLimiter when the ratelimit configuration is nil, and a multi-tenancy limitation where a single shared Redis client is reused across different configurations, potentially ignoring unique Redis settings.

Comment thread pkg/kthena-router/filters/ratelimit/ratelimit.go
Comment thread pkg/kthena-router/filters/ratelimit/ratelimit.go

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates the rate limiter tests and internals to key limiters by a composite identifier (e.g., namespace/modelRouteName) to improve isolation between routes, and adds a Redis-backed isolation test to validate distinct global limiter keys.

Changes:

  • Switched tests from a single model key to a composite limiterKey built from namespace + route name.
  • Updated AddOrUpdateLimiter / DeleteLimiter internals to use the provided limiter key when creating/storing limiters.
  • Added a global (Redis) isolation test to ensure different routes do not share the same Redis limiter key.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
pkg/kthena-router/filters/ratelimit/ratelimit_test.go Refactors unit tests to use a composite limiter key and adds a helper to build it.
pkg/kthena-router/filters/ratelimit/ratelimit.go Renames parameters and uses the composite limiter key for map entries and global limiter construction.
pkg/kthena-router/filters/ratelimit/global_test.go Adds a Redis-backed isolation test verifying separate keys per route.
Comments suppressed due to low confidence (1)

pkg/kthena-router/filters/ratelimit/ratelimit.go:1

  • The receiver API is now mixed between model (e.g., RecordOutputTokens(model string, ...)) and limiterKey (e.g., AddOrUpdateLimiter / DeleteLimiter). Since callers are now passing composite keys, rename the remaining model parameters in this type’s public methods to limiterKey for consistency and to reduce confusion about what the string represents.
/*

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/kthena-router/filters/ratelimit/ratelimit.go
@@ -137,7 +137,7 @@ func (r *TokenRateLimiter) RecordOutputTokens(model string, tokenCount int) {
}

// AddOrUpdateLimiter adds or updates rate limiter for a model
@@ -204,12 +204,12 @@ func (r *TokenRateLimiter) AddOrUpdateLimiter(model string, ratelimit *networkin
}

// DeleteLimiter deletes rate limiter for a model
Comment on lines +222 to +223
keyOne := "namespace-a/modelroute-1"
keyTwo := "namespace-a/modelroute-2"
Comment on lines +248 to +249
redisKeyOne := "kthena:ratelimit:namespace-a/modelroute-1:input"
redisKeyTwo := "kthena:ratelimit:namespace-a/modelroute-2:input"
@nXtCyberNet

Copy link
Copy Markdown
Contributor Author

/retest

@volcano-sh-bot

Copy link
Copy Markdown
Contributor

@nXtCyberNet: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


// AddOrUpdateLimiter adds or updates rate limiter for a model
func (r *TokenRateLimiter) AddOrUpdateLimiter(model string, ratelimit *networkingv1alpha1.RateLimit) error {
func (r *TokenRateLimiter) AddOrUpdateLimiter(limiterKey string, ratelimit *networkingv1alpha1.RateLimit) error {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I don’t really understand what ‘limiterKey’ refers to. Changing the name of this formal parameter isn’t a good idea.

@nXtCyberNet nXtCyberNet Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @LiZhenCheng9527 ,
So currently the ratelimit was defined by only the model which cause there is no namespace level isolation that cause the follow ratelimit bug :

ModelRoute-1 (namespace-a, model: llama-70b, tokenLimit: 100 tokens/min) 
  → Redis key: "llama-70b"

ModelRoute-2 (namespace-b, model: llama-70b, tokenLimit: 50 tokens/min)
  → Redis key: "llama-70b"  ← SAME KEY ❌

Result: ModelRoute-2's update OVERWRITES ModelRoute-1's token bucket
        Both namespaces now share the SAME token bucket with 50 tokens/min
        namespace-a requests get rate-limited incorrectly ❌

Which is inconsistent and breaks ratelimit if multiple teams were using the same model with different configs ,

So after changing the model name key to limiterkey - namespace/routename , since the routename is crd , so it's itself a unique name , so this will also allow per route rate limit isolation too
Like this

ModelRoute-1 (namespace-a, model: llama-70b, tokenLimit: 100 tokens/min)
  → Redis key: "namespace-a/modelroute-1"
  → Independent token bucket with 100 tokens/min

ModelRoute-2 (namespace-b, model: llama-70b, tokenLimit: 50 tokens/min)
  → Redis key: "namespace-b/modelroute-2"
  → Independent token bucket with 50 tokens/min ✅

Result: Each ModelRoute has its own isolated token bucket
namespace-a can use 100 tokens/min, namespace-b uses 50 tokens/min

,

Also it act same even if multiple routes were created in a same namespace

And the limitkey is a code level refactoring so also there is no userside level change , WDYT?

@hzxuzhonghu hzxuzhonghu left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR does not appear to actually fix #1043. The diff only renames the model parameter to limiterKey in ratelimit.go and updates tests; the real call sites in router.go are unchanged, so limiters are still keyed by model name at runtime.

Blocking issues in pkg/kthena-router/router/router.go (not in this diff):

  • L112 AddOrUpdateLimiter(data.ModelName, ...) and L118 DeleteLimiter(data.ModelName) still register/delete by bare model name, so two ModelRoutes sharing a model still collide. These should use a namespace/routeName key derived from data.ModelRoute.
  • L278 RateLimit(modelName, promptStr) enforces using the model name from the request body, and it runs (step 2) before route resolution via MatchModelServer (L352, step 3). At that point only the model name is available, so there is no unique route key to use. Since the bug is specifically about multiple routes mapping to one model, enforcement needs the route resolved first (or moved after matching).
  • L814 and L1116 RecordOutputTokens(modelName, ...) keep output accounting shared across routes and must use the same key.

Until the router.go callbacks and request flow are updated, the isolation fix is not effective in production.

Positive: the ratelimit.go refactor threads the key consistently, and Redis key derivation already isolates correctly once a unique key is passed — the limiter layer is ready, the callers are not.

Comment thread pkg/kthena-router/filters/ratelimit/ratelimit.go
Comment thread pkg/kthena-router/filters/ratelimit/ratelimit.go
Comment thread pkg/kthena-router/filters/ratelimit/global_test.go Outdated
Copilot AI review requested due to automatic review settings June 3, 2026 13:13

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

Comment on lines 103 to +107
store.RegisterCallback("ModelRoute", func(data datastore.EventData) {
routeKey := fmt.Sprintf("%s/%s",
data.ModelRoute.Namespace,
data.ModelRoute.Name,
)
Comment thread pkg/kthena-router/router/router.go Outdated
Comment on lines +287 to +291
_, _, modelRoute, _ := r.store.MatchModelServer(modelName, c.Request, gatewayKey)
if err != nil || modelRoute == nil {
c.AbortWithStatusJSON(http.StatusNotFound, "route not found")
return
}
Comment thread pkg/kthena-router/router/router.go Outdated
// Match ModelRoute to get the route key. Require ModelRoute match only.
_, _, modelRoute, _ := r.store.MatchModelServer(modelName, c.Request, gatewayKey)
if err != nil || modelRoute == nil {
c.AbortWithStatusJSON(http.StatusNotFound, "route not found")
Comment on lines +832 to +834
if rateLimitKeyVal, ok := c.Get("rateLimitKey"); ok {
r.loadRateLimiter.RecordOutputTokens(rateLimitKeyVal.(string), resp.Usage.CompletionTokens)
}
Comment on lines +1134 to +1136
if rateLimitKeyVal, ok := c.Get("rateLimitKey"); ok {
r.loadRateLimiter.RecordOutputTokens(rateLimitKeyVal.(string), outputTokens)
}
Comment on lines 139 to +140
// AddOrUpdateLimiter adds or updates rate limiter for a model
func (r *TokenRateLimiter) AddOrUpdateLimiter(model string, ratelimit *networkingv1alpha1.RateLimit) error {
func (r *TokenRateLimiter) AddOrUpdateLimiter(limiterKey string, ratelimit *networkingv1alpha1.RateLimit) error {
Comment on lines 206 to +207
// DeleteLimiter deletes rate limiter for a model
func (r *TokenRateLimiter) DeleteLimiter(model string) {
func (r *TokenRateLimiter) DeleteLimiter(limiterKey string) {
Comment thread pkg/kthena-router/router/router.go Outdated
}

// Store metrics recorder in context for use in other functions
// Store metrics recorder in context for use in other functions
@nXtCyberNet nXtCyberNet requested a review from Copilot June 3, 2026 16:59

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

Comment on lines +104 to +107
routeKey := fmt.Sprintf("%s/%s",
data.ModelRoute.Namespace,
data.ModelRoute.Name,
)
Comment thread pkg/kthena-router/router/router.go Outdated
Comment on lines +287 to +293
rateLimitKey, err := r.store.GetRateLimitKey(modelName, c.Request, gatewayKey)

if err != nil || modelRoute == nil {
c.AbortWithStatusJSON(http.StatusNotFound, "route not found")
return
}
rateLimitKey := fmt.Sprintf("%s/%s", modelRoute.Namespace, modelRoute.Name)
Comment on lines +833 to +835
if rateLimitKeyVal, ok := c.Get("rateLimitKey"); ok {
r.loadRateLimiter.RecordOutputTokens(rateLimitKeyVal.(string), resp.Usage.CompletionTokens)
}
Comment on lines +1135 to +1137
if rateLimitKeyVal, ok := c.Get("rateLimitKey"); ok {
r.loadRateLimiter.RecordOutputTokens(rateLimitKeyVal.(string), outputTokens)
}
Comment on lines 139 to +140
// AddOrUpdateLimiter adds or updates rate limiter for a model
func (r *TokenRateLimiter) AddOrUpdateLimiter(model string, ratelimit *networkingv1alpha1.RateLimit) error {
func (r *TokenRateLimiter) AddOrUpdateLimiter(limiterKey string, ratelimit *networkingv1alpha1.RateLimit) error {
Comment on lines 206 to +207
// DeleteLimiter deletes rate limiter for a model
func (r *TokenRateLimiter) DeleteLimiter(model string) {
func (r *TokenRateLimiter) DeleteLimiter(limiterKey string) {
Comment thread pkg/kthena-router/router/router.go Outdated
}

// Store metrics recorder in context for use in other functions
// Store metrics recorder in context for use in other functions
@nXtCyberNet nXtCyberNet requested a review from hzxuzhonghu June 5, 2026 15:34
@nXtCyberNet

nXtCyberNet commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

Hi @hzxuzhonghu
So , when I was changing the resolution flow based on your comment , I come to a close end , where the ratelimiter requires the modelroute identity that I can get only after the matchmodelserver that was happening inside the doLoadbalance().

But this breaks the fairness scheduling - the rate limited request would be enqueued , prioritised , and scheduled before rejected , increase latency and queue ,
So I have a proposal for moving the matchmodelserver early instead of its in the doLoadbalance() , we can move it to the handlefunc

So the request will became like this -

  1. Call matchmodelserver in starting and then cache the mached modelroute , modelservername , islors and any matching error in the gin context , so we can use it in the doLoadbalance, without calling matchmodelserver again that maybe cause drift in answer when called twice in same flow ,
  2. If modelroute matches it check the ratelimit immediately if rate- limited return 429 as the normal workflow ,
    And for the case of httproute/inferencepool request , since they both didnot requires the ratelimiter , they can be skipped or allowed immediately as no config found for them ,

Also this will also allow us to add ratelimit in the inferencepool/httproute which is currently didnot exist , however I want to know why we didnot needed then here ,

Let me know what you think about this if the design make sense please review the pr , if not no issues I will just close the pr , thanks

Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
.
Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
@nXtCyberNet nXtCyberNet force-pushed the issue/ratelimit_isolation branch from 2ce55ed to 3613a52 Compare June 5, 2026 16:42
Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
@nXtCyberNet

Copy link
Copy Markdown
Contributor Author

@hzxuzhonghu any updates?

.
Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cross-Tenant Rate Limit Isolation Bug

5 participants