fix(ratelimit): isolate rate limiters per ModelRoute by nXtCyberNet · Pull Request #1171 · volcano-sh/kthena

nXtCyberNet · 2026-05-29T18:00:18Z

/kind enhancement
/kind bug

What this PR does / Why we need it

Fixes a multi-tenant rate limit isolation issue where rate limiters were keyed only by model name. When multiple ModelRoutes referenced the same model, they could unintentionally share the same limiter state and Redis token bucket, causing one route's configuration to affect another.

Changes

Scope rate limiters using namespace/routeName instead of modelName.
Update router callbacks to use namespace-scoped limiter keys.
Update limiter add, delete, and lookup paths to use the new key.
Redis keys are now automatically isolated per ModelRoute.
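
As a minimal sketch of the key scheme listed above: the helper names `LimiterKey` and `RedisKey` are hypothetical and are not taken from the PR. The `kthena:ratelimit:<key>:input` shape is copied from the Redis keys that appear in the PR's isolation test.

```go
package main

import "fmt"

// LimiterKey builds the per-route key described in the PR: namespace/routeName.
// The function name is hypothetical; it only illustrates the key format.
func LimiterKey(namespace, routeName string) string {
	return fmt.Sprintf("%s/%s", namespace, routeName)
}

// RedisKey mirrors the key shape seen in the PR's isolation test,
// for example "kthena:ratelimit:namespace-a/modelroute-1:input".
// The helper itself is an assumption made for illustration.
func RedisKey(limiterKey, kind string) string {
	return fmt.Sprintf("kthena:ratelimit:%s:%s", limiterKey, kind)
}

func main() {
	a := LimiterKey("namespace-a", "modelroute-1")
	b := LimiterKey("namespace-b", "modelroute-2")
	// Both routes may reference the same model, yet their Redis keys differ.
	fmt.Println(RedisKey(a, "input"))
	fmt.Println(RedisKey(b, "input"))
}
```

Because the namespace and route name are both part of the key, two routes that reference the same model no longer map to the same Redis entry.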

Tests

Added unit and Redis-backed integration tests to verify:

Independent limiter state per ModelRoute.
Isolated Redis token buckets.
No cross-route interference when consuming tokens.
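
The properties listed above can be illustrated with a small in-memory stand-in. This is not the PR's test code: `Registry`, `Set`, `Consume`, and `Remaining` are hypothetical names, and the bucket does not refill over time, which keeps the example deterministic.

```go
package main

import (
	"fmt"
	"sync"
)

// bucket is a deliberately simple, non-refilling token bucket.
type bucket struct{ tokens int }

// Registry keeps one bucket per limiter key (namespace/routeName).
type Registry struct {
	mu      sync.Mutex
	buckets map[string]*bucket
}

func NewRegistry() *Registry {
	return &Registry{buckets: make(map[string]*bucket)}
}

// Set creates or replaces the bucket stored under key.
func (r *Registry) Set(key string, capacity int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.buckets[key] = &bucket{tokens: capacity}
}

// Consume takes n tokens from key's bucket and reports whether the request
// is allowed. A key without a bucket is treated as unlimited.
func (r *Registry) Consume(key string, n int) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	b, ok := r.buckets[key]
	if !ok {
		return true
	}
	if b.tokens < n {
		return false
	}
	b.tokens -= n
	return true
}

// Remaining returns the tokens left under key, or -1 when no bucket exists.
func (r *Registry) Remaining(key string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	if b, ok := r.buckets[key]; ok {
		return b.tokens
	}
	return -1
}

func main() {
	r := NewRegistry()
	r.Set("namespace-a/modelroute-1", 100)
	r.Set("namespace-a/modelroute-2", 50)
	r.Consume("namespace-a/modelroute-1", 100) // drain the first route completely
	fmt.Println(r.Remaining("namespace-a/modelroute-1"), r.Remaining("namespace-a/modelroute-2"))
}
```

Draining the first route leaves the second route's bucket untouched, which is the behaviour the isolation tests are meant to verify.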

Issue

Fixes #1043

User-Facing Changes

NONE

volcano-sh-bot · 2026-05-29T18:00:27Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yaozengzeng for approval. For more information see the Kubernetes Code Review Process.

Details

Needs approval from an approver in each of these files:

pkg/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gemini-code-assist

Code Review

This pull request refactors the TokenRateLimiter to use a limiterKey (combining namespace and route) instead of a model name, allowing for better rate-limiting isolation. Unit tests have been updated, and a new isolation test has been added. The review feedback highlights two key issues: a potential nil pointer dereference panic in AddOrUpdateLimiter when the ratelimit configuration is nil, and a multi-tenancy limitation where a single shared Redis client is reused across different configurations, potentially ignoring unique Redis settings.
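
The nil-pointer concern raised in this review could be handled with an early guard. The sketch below uses stand-in types: `RateLimit` and its `TokensPerUnit` field are assumptions and do not reproduce `networkingv1alpha1.RateLimit`, and the map-backed `TokenRateLimiter` is a simplification of the real type.

```go
package main

import (
	"errors"
	"fmt"
)

// RateLimit is a stand-in for the CRD type; the field name is an assumption.
type RateLimit struct {
	TokensPerUnit *uint32
}

// TokenRateLimiter is a simplified stand-in that stores one limit per key.
type TokenRateLimiter struct {
	limits map[string]uint32
}

// ErrNilRateLimit is returned when no configuration is supplied.
var ErrNilRateLimit = errors.New("ratelimit config is nil")

// AddOrUpdateLimiter rejects a nil config before dereferencing it.
func (r *TokenRateLimiter) AddOrUpdateLimiter(limiterKey string, rl *RateLimit) error {
	if rl == nil {
		return fmt.Errorf("limiter %q: %w", limiterKey, ErrNilRateLimit)
	}
	if r.limits == nil {
		r.limits = make(map[string]uint32)
	}
	if rl.TokensPerUnit != nil {
		r.limits[limiterKey] = *rl.TokensPerUnit
	}
	return nil
}

func main() {
	var l TokenRateLimiter
	// A nil config yields an error instead of a panic.
	fmt.Println(l.AddOrUpdateLimiter("namespace-a/modelroute-1", nil))
}
```

Returning a wrapped sentinel error lets callers distinguish a missing config from other failures with `errors.Is`.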

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates the rate limiter tests and internals to key limiters by a composite identifier (e.g., namespace/modelRouteName) to improve isolation between routes, and adds a Redis-backed isolation test to validate distinct global limiter keys.

Changes:

Switched tests from a single model key to a composite limiterKey built from namespace + route name.
Updated AddOrUpdateLimiter / DeleteLimiter internals to use the provided limiter key when creating/storing limiters.
Added a global (Redis) isolation test to ensure different routes do not share the same Redis limiter key.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File	Description
pkg/kthena-router/filters/ratelimit/ratelimit_test.go	Refactors unit tests to use a composite limiter key and adds a helper to build it.
pkg/kthena-router/filters/ratelimit/ratelimit.go	Renames parameters and uses the composite limiter key for map entries and global limiter construction.
pkg/kthena-router/filters/ratelimit/global_test.go	Adds a Redis-backed isolation test verifying separate keys per route.

Comments suppressed due to low confidence (1)

pkg/kthena-router/filters/ratelimit/ratelimit.go:1

The receiver API is now mixed between model (e.g., RecordOutputTokens(model string, ...)) and limiterKey (e.g., AddOrUpdateLimiter / DeleteLimiter). Since callers are now passing composite keys, rename the remaining model parameters in this type’s public methods to limiterKey for consistency and to reduce confusion about what the string represents.

@@ -137,7 +137,7 @@ func (r *TokenRateLimiter) RecordOutputTokens(model string, tokenCount int) {
 }

 // AddOrUpdateLimiter adds or updates rate limiter for a model


@@ -204,12 +204,12 @@ func (r *TokenRateLimiter) AddOrUpdateLimiter(model string, ratelimit *networkin
 }

 // DeleteLimiter deletes rate limiter for a model


+	keyOne := "namespace-a/modelroute-1"
+	keyTwo := "namespace-a/modelroute-2"


+	redisKeyOne := "kthena:ratelimit:namespace-a/modelroute-1:input"
+	redisKeyTwo := "kthena:ratelimit:namespace-a/modelroute-2:input"


nXtCyberNet · 2026-05-29T18:26:53Z

/retest

volcano-sh-bot · 2026-05-29T18:27:16Z

@nXtCyberNet: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/retest

LiZhenCheng9527 · 2026-06-01T02:26:51Z


 // AddOrUpdateLimiter adds or updates rate limiter for a model
-func (r *TokenRateLimiter) AddOrUpdateLimiter(model string, ratelimit *networkingv1alpha1.RateLimit) error {
+func (r *TokenRateLimiter) AddOrUpdateLimiter(limiterKey string, ratelimit *networkingv1alpha1.RateLimit) error {


To be honest, I don’t really understand what ‘limiterKey’ refers to. Changing the name of this formal parameter isn’t a good idea.

Hi @LiZhenCheng9527,
Currently the rate limiter is keyed only by model name, so there is no namespace-level isolation. That causes the following bug:

ModelRoute-1 (namespace-a, model: llama-70b, tokenLimit: 100 tokens/min)
  → Redis key: "llama-70b"
ModelRoute-2 (namespace-b, model: llama-70b, tokenLimit: 50 tokens/min)
  → Redis key: "llama-70b" ← SAME KEY ❌

Result:
ModelRoute-2's update OVERWRITES ModelRoute-1's token bucket
Both namespaces now share the SAME token bucket with 50 tokens/min
namespace-a requests get rate-limited incorrectly ❌

This is inconsistent and breaks rate limiting when multiple teams use the same model with different configs.

After changing the key from the model name to limiterKey (namespace/routeName), each limiter is unique: the route name is the name of a CRD object, so it is already unique within its namespace. This also gives per-route rate limit isolation, like this:

ModelRoute-1 (namespace-a, model: llama-70b, tokenLimit: 100 tokens/min)
  → Redis key: "namespace-a/modelroute-1"
  → Independent token bucket with 100 tokens/min
ModelRoute-2 (namespace-b, model: llama-70b, tokenLimit: 50 tokens/min)
  → Redis key: "namespace-b/modelroute-2"
  → Independent token bucket with 50 tokens/min ✅

Result: Each ModelRoute has its own isolated token bucket
namespace-a can use 100 tokens/min, namespace-b uses 50 tokens/min

It behaves the same way when multiple routes are created in the same namespace.

limiterKey is a code-level refactoring, so there is no user-facing change. WDYT?
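
The collision described in this comment can be reproduced with a plain map. This is a sketch for illustration only: the `route` struct and the `register`, `byModel`, and `byRoute` helpers are hypothetical and do not appear in the PR.

```go
package main

import "fmt"

// route is a simplified stand-in for a ModelRoute.
type route struct {
	Namespace    string
	Name         string
	Model        string
	TokensPerMin int
}

// register stores each route's limit under the key produced by keyFn.
// A later route overwrites an earlier one when both produce the same key.
func register(routes []route, keyFn func(route) string) map[string]int {
	limits := make(map[string]int)
	for _, rt := range routes {
		limits[keyFn(rt)] = rt.TokensPerMin
	}
	return limits
}

// byModel is the old behaviour: the key is the bare model name.
func byModel(rt route) string { return rt.Model }

// byRoute is the new behaviour: the key is namespace/routeName.
func byRoute(rt route) string { return rt.Namespace + "/" + rt.Name }

func main() {
	routes := []route{
		{Namespace: "namespace-a", Name: "modelroute-1", Model: "llama-70b", TokensPerMin: 100},
		{Namespace: "namespace-b", Name: "modelroute-2", Model: "llama-70b", TokensPerMin: 50},
	}
	fmt.Println(register(routes, byModel)) // a single shared entry holding the last limit written
	fmt.Println(register(routes, byRoute)) // one entry per route
}
```

With `byModel`, the second route silently replaces the first route's limit; with `byRoute`, each route keeps its own entry.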

hzxuzhonghu

This PR does not appear to actually fix #1043. The diff only renames the model parameter to limiterKey in ratelimit.go and updates tests; the real call sites in router.go are unchanged, so limiters are still keyed by model name at runtime.

Blocking issues in pkg/kthena-router/router/router.go (not in this diff):

L112 AddOrUpdateLimiter(data.ModelName, ...) and L118 DeleteLimiter(data.ModelName) still register/delete by bare model name, so two ModelRoutes sharing a model still collide. These should use a namespace/routeName key derived from data.ModelRoute.
L278 RateLimit(modelName, promptStr) enforces using the model name from the request body, and it runs (step 2) before route resolution via MatchModelServer (L352, step 3). At that point only the model name is available, so there is no unique route key to use. Since the bug is specifically about multiple routes mapping to one model, enforcement needs the route resolved first (or moved after matching).
L814 and L1116 RecordOutputTokens(modelName, ...) keep output accounting shared across routes and must use the same key.

Until the router.go callbacks and request flow are updated, the isolation fix is not effective in production.

Positive: the ratelimit.go refactor threads the key consistently, and Redis key derivation already isolates correctly once a unique key is passed — the limiter layer is ready, the callers are not.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

 	store.RegisterCallback("ModelRoute", func(data datastore.EventData) {
+		routeKey := fmt.Sprintf("%s/%s",
+			data.ModelRoute.Namespace,
+			data.ModelRoute.Name,
+		)


+		_, _, modelRoute, _ := r.store.MatchModelServer(modelName, c.Request, gatewayKey)
+		if err != nil || modelRoute == nil {
+			c.AbortWithStatusJSON(http.StatusNotFound, "route not found")
+			return
+		}


+		// Match ModelRoute to get the route key. Require ModelRoute match only.
+		_, _, modelRoute, _ := r.store.MatchModelServer(modelName, c.Request, gatewayKey)
+		if err != nil || modelRoute == nil {
+			c.AbortWithStatusJSON(http.StatusNotFound, "route not found")


+				if rateLimitKeyVal, ok := c.Get("rateLimitKey"); ok {
+					r.loadRateLimiter.RecordOutputTokens(rateLimitKeyVal.(string), resp.Usage.CompletionTokens)
+				}


+			if rateLimitKeyVal, ok := c.Get("rateLimitKey"); ok {
+				r.loadRateLimiter.RecordOutputTokens(rateLimitKeyVal.(string), outputTokens)
+			}


 // AddOrUpdateLimiter adds or updates rate limiter for a model
-func (r *TokenRateLimiter) AddOrUpdateLimiter(model string, ratelimit *networkingv1alpha1.RateLimit) error {
+func (r *TokenRateLimiter) AddOrUpdateLimiter(limiterKey string, ratelimit *networkingv1alpha1.RateLimit) error {


 // DeleteLimiter deletes rate limiter for a model
-func (r *TokenRateLimiter) DeleteLimiter(model string) {
+func (r *TokenRateLimiter) DeleteLimiter(limiterKey string) {


 		}

-		// Store metrics recorder in context for use in other functions
+		// Store metrics recorder  in context for use in other functions


Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

+		routeKey := fmt.Sprintf("%s/%s",
+			data.ModelRoute.Namespace,
+			data.ModelRoute.Name,
+		)


+		rateLimitKey, err := r.store.GetRateLimitKey(modelName, c.Request, gatewayKey)
+
+		if err != nil || modelRoute == nil {
+			c.AbortWithStatusJSON(http.StatusNotFound, "route not found")
+			return
+		}
+		rateLimitKey := fmt.Sprintf("%s/%s", modelRoute.Namespace, modelRoute.Name)


+				if rateLimitKeyVal, ok := c.Get("rateLimitKey"); ok {
+					r.loadRateLimiter.RecordOutputTokens(rateLimitKeyVal.(string), resp.Usage.CompletionTokens)
+				}


+			if rateLimitKeyVal, ok := c.Get("rateLimitKey"); ok {
+				r.loadRateLimiter.RecordOutputTokens(rateLimitKeyVal.(string), outputTokens)
+			}


 // AddOrUpdateLimiter adds or updates rate limiter for a model
-func (r *TokenRateLimiter) AddOrUpdateLimiter(model string, ratelimit *networkingv1alpha1.RateLimit) error {
+func (r *TokenRateLimiter) AddOrUpdateLimiter(limiterKey string, ratelimit *networkingv1alpha1.RateLimit) error {


 // DeleteLimiter deletes rate limiter for a model
-func (r *TokenRateLimiter) DeleteLimiter(model string) {
+func (r *TokenRateLimiter) DeleteLimiter(limiterKey string) {


 		}

-		// Store metrics recorder in context for use in other functions
+		// Store metrics recorder  in context for use in other functions


nXtCyberNet · 2026-06-05T15:44:31Z

Hi @hzxuzhonghu,
While changing the resolution flow based on your comment, I hit a dead end: the rate limiter needs the ModelRoute identity, which I can only get after MatchModelServer, and that call happens inside doLoadbalance().

Enforcing the limit there breaks fairness scheduling: a rate-limited request would be enqueued, prioritised, and scheduled before being rejected, which increases latency and queue length.
So I propose calling MatchModelServer earlier, in the handler function, instead of inside doLoadbalance().

The request flow would become:

Call MatchModelServer at the start and cache the matched ModelRoute, model server name, isLora flag, and any matching error in the gin context, so doLoadbalance() can use them without calling MatchModelServer again (a second call in the same flow might return a different answer).
If a ModelRoute matches, check the rate limit immediately; if the request is rate-limited, return 429 as in the normal workflow.
HTTPRoute/InferencePool requests do not use the rate limiter, so they can be skipped or allowed immediately because no config is found for them.

This would also allow us to add rate limiting for InferencePool/HTTPRoute, which does not exist today; however, I would like to know why it was not needed here.

Let me know what you think. If the design makes sense, please review the PR; if not, no problem, I will just close it. Thanks.
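
The proposed flow can be sketched with the standard library alone. This is not the router's code: it uses `net/http` and `context` instead of gin, reads the model from a query parameter instead of the request body, and every name (`earlyRateLimit`, `matchResult`, `newDemoHandler`, `statusFor`) is hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// matchResult caches the outcome of the single route-matching call.
type matchResult struct {
	RouteKey string // namespace/routeName; empty when no ModelRoute matched
}

type ctxKey struct{}

// earlyRateLimit matches once, rejects with 429 before any scheduling work,
// and caches the match in the request context for later stages.
func earlyRateLimit(match func(model string) (string, bool), allow func(key string) bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		res := matchResult{}
		if key, ok := match(req.URL.Query().Get("model")); ok {
			res.RouteKey = key
			if !allow(key) {
				http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
				return
			}
		}
		// No ModelRoute matched: pass through without rate limiting.
		ctx := context.WithValue(req.Context(), ctxKey{}, res)
		next.ServeHTTP(w, req.WithContext(ctx))
	})
}

// newDemoHandler wires a one-token limit for a single demo route.
// The map is not safe for concurrent use; the demo is sequential.
func newDemoHandler() http.Handler {
	remaining := map[string]int{"namespace-a/modelroute-1": 1}
	match := func(model string) (string, bool) {
		if model == "llama-70b" {
			return "namespace-a/modelroute-1", true
		}
		return "", false
	}
	allow := func(key string) bool {
		if remaining[key] <= 0 {
			return false
		}
		remaining[key]--
		return true
	}
	next := http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		// Later stages reuse the cached match instead of matching again.
		res, _ := req.Context().Value(ctxKey{}).(matchResult)
		fmt.Fprintf(w, "scheduled for %q", res.RouteKey)
	})
	return earlyRateLimit(match, allow, next)
}

// statusFor sends one GET request to h and returns the response status code.
func statusFor(h http.Handler, target string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code
}

func main() {
	h := newDemoHandler()
	fmt.Println(statusFor(h, "/?model=llama-70b")) // first request consumes the only token
	fmt.Println(statusFor(h, "/?model=llama-70b")) // second request is rejected early
	fmt.Println(statusFor(h, "/?model=other"))     // unmatched traffic passes through
}
```

Rejecting before `next` runs is the point of the proposal: a rate-limited request never reaches the queueing and scheduling stage, and the cached match avoids a second lookup that might disagree with the first.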

Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>

nXtCyberNet · 2026-06-12T06:00:27Z

@hzxuzhonghu any updates?

Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>

Copilot AI review requested due to automatic review settings May 29, 2026 18:00

volcano-sh-bot added kind/enhancement New feature or request kind/bug labels May 29, 2026

volcano-sh-bot requested review from LiZhenCheng9527 and hzxuzhonghu May 29, 2026 18:00

volcano-sh-bot added the size/L label May 29, 2026

gemini-code-assist Bot reviewed May 29, 2026

Comment thread pkg/kthena-router/filters/ratelimit/ratelimit.go

Comment thread pkg/kthena-router/filters/ratelimit/ratelimit.go

Copilot AI reviewed May 29, 2026

LiZhenCheng9527 reviewed Jun 1, 2026

nXtCyberNet requested a review from LiZhenCheng9527 June 1, 2026 05:10

hzxuzhonghu requested changes Jun 1, 2026

Comment thread pkg/kthena-router/filters/ratelimit/ratelimit.go

Comment thread pkg/kthena-router/filters/ratelimit/ratelimit.go

Comment thread pkg/kthena-router/filters/ratelimit/global_test.go Outdated

Copilot AI review requested due to automatic review settings June 3, 2026 13:13

Copilot AI reviewed Jun 3, 2026

nXtCyberNet requested a review from Copilot June 3, 2026 16:59

Copilot AI reviewed Jun 3, 2026

nXtCyberNet requested a review from hzxuzhonghu June 5, 2026 15:34

nXtCyberNet added 7 commits June 5, 2026 22:12

updated the route namestruct from model to namespace/modelroutename

399d0bd

Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>

changed the key in main flow

199be10

Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>

added err log

e1aad99

Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>

solved the issue

03c06ed

Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>

.

98e8ca5

Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>

changed ratelimit to doloadbalancer

263ab88

Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>

ssolved ci

3613a52

Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>

nXtCyberNet force-pushed the issue/ratelimit_isolation branch from 2ce55ed to 3613a52 Compare June 5, 2026 16:42

nXtCyberNet added 2 commits June 6, 2026 11:05

resolve conflict

f2ba014

Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>

remove the test

f86bb4d

Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>

This was referenced Jun 15, 2026

Bug: Token rate limiter uses inaccurate heuristic tokenizer for input token accounting #1161

Open

proposal for the tokenizer #1225

Open

.

3d7df8d

Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>

