fix(ratelimit): isolate rate limiters per ModelRoute#1171
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Code Review
This pull request refactors the TokenRateLimiter to use a limiterKey (combining namespace and route) instead of a model name, allowing for better rate-limiting isolation. Unit tests have been updated, and a new isolation test has been added. The review feedback highlights two key issues: a potential nil pointer dereference panic in AddOrUpdateLimiter when the ratelimit configuration is nil, and a multi-tenancy limitation where a single shared Redis client is reused across different configurations, potentially ignoring unique Redis settings.
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Updates the rate limiter tests and internals to key limiters by a composite identifier (e.g., namespace/modelRouteName) to improve isolation between routes, and adds a Redis-backed isolation test to validate distinct global limiter keys.
Changes:
- Switched tests from a single
modelkey to a compositelimiterKeybuilt from namespace + route name. - Updated
AddOrUpdateLimiter/DeleteLimiterinternals to use the provided limiter key when creating/storing limiters. - Added a global (Redis) isolation test to ensure different routes do not share the same Redis limiter key.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| pkg/kthena-router/filters/ratelimit/ratelimit_test.go | Refactors unit tests to use a composite limiter key and adds a helper to build it. |
| pkg/kthena-router/filters/ratelimit/ratelimit.go | Renames parameters and uses the composite limiter key for map entries and global limiter construction. |
| pkg/kthena-router/filters/ratelimit/global_test.go | Adds a Redis-backed isolation test verifying separate keys per route. |
Comments suppressed due to low confidence (1)
pkg/kthena-router/filters/ratelimit/ratelimit.go:1
- The receiver API is now mixed between
model(e.g.,RecordOutputTokens(model string, ...)) andlimiterKey(e.g.,AddOrUpdateLimiter/DeleteLimiter). Since callers are now passing composite keys, rename the remainingmodelparameters in this type’s public methods tolimiterKeyfor consistency and to reduce confusion about what the string represents.
/*
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -137,7 +137,7 @@ func (r *TokenRateLimiter) RecordOutputTokens(model string, tokenCount int) { | |||
| } | |||
|
|
|||
| // AddOrUpdateLimiter adds or updates rate limiter for a model | |||
| @@ -204,12 +204,12 @@ func (r *TokenRateLimiter) AddOrUpdateLimiter(model string, ratelimit *networkin | |||
| } | |||
|
|
|||
| // DeleteLimiter deletes rate limiter for a model | |||
| keyOne := "namespace-a/modelroute-1" | ||
| keyTwo := "namespace-a/modelroute-2" |
| redisKeyOne := "kthena:ratelimit:namespace-a/modelroute-1:input" | ||
| redisKeyTwo := "kthena:ratelimit:namespace-a/modelroute-2:input" |
|
/retest |
|
@nXtCyberNet: Cannot trigger testing until a trusted user reviews the PR and leaves an DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
|
||
| // AddOrUpdateLimiter adds or updates rate limiter for a model | ||
| func (r *TokenRateLimiter) AddOrUpdateLimiter(model string, ratelimit *networkingv1alpha1.RateLimit) error { | ||
| func (r *TokenRateLimiter) AddOrUpdateLimiter(limiterKey string, ratelimit *networkingv1alpha1.RateLimit) error { |
There was a problem hiding this comment.
To be honest, I don’t really understand what ‘limiterKey’ refers to. Changing the name of this formal parameter isn’t a good idea.
There was a problem hiding this comment.
Hi @LiZhenCheng9527 ,
So currently the ratelimit was defined by only the model which cause there is no namespace level isolation that cause the follow ratelimit bug :
ModelRoute-1 (namespace-a, model: llama-70b, tokenLimit: 100 tokens/min)
→ Redis key: "llama-70b"
ModelRoute-2 (namespace-b, model: llama-70b, tokenLimit: 50 tokens/min)
→ Redis key: "llama-70b" ← SAME KEY ❌
Result: ModelRoute-2's update OVERWRITES ModelRoute-1's token bucket
Both namespaces now share the SAME token bucket with 50 tokens/min
namespace-a requests get rate-limited incorrectly ❌
Which is inconsistent and breaks ratelimit if multiple teams were using the same model with different configs ,
So after changing the model name key to limiterkey - namespace/routename , since the routename is crd , so it's itself a unique name , so this will also allow per route rate limit isolation too
Like this
ModelRoute-1 (namespace-a, model: llama-70b, tokenLimit: 100 tokens/min)
→ Redis key: "namespace-a/modelroute-1"
→ Independent token bucket with 100 tokens/min
ModelRoute-2 (namespace-b, model: llama-70b, tokenLimit: 50 tokens/min)
→ Redis key: "namespace-b/modelroute-2"
→ Independent token bucket with 50 tokens/min ✅
Result: Each ModelRoute has its own isolated token bucket
namespace-a can use 100 tokens/min, namespace-b uses 50 tokens/min
,
Also it act same even if multiple routes were created in a same namespace
And the limitkey is a code level refactoring so also there is no userside level change , WDYT?
hzxuzhonghu
left a comment
There was a problem hiding this comment.
This PR does not appear to actually fix #1043. The diff only renames the model parameter to limiterKey in ratelimit.go and updates tests; the real call sites in router.go are unchanged, so limiters are still keyed by model name at runtime.
Blocking issues in pkg/kthena-router/router/router.go (not in this diff):
- L112
AddOrUpdateLimiter(data.ModelName, ...)and L118DeleteLimiter(data.ModelName)still register/delete by bare model name, so two ModelRoutes sharing a model still collide. These should use anamespace/routeNamekey derived fromdata.ModelRoute. - L278
RateLimit(modelName, promptStr)enforces using the model name from the request body, and it runs (step 2) before route resolution viaMatchModelServer(L352, step 3). At that point only the model name is available, so there is no unique route key to use. Since the bug is specifically about multiple routes mapping to one model, enforcement needs the route resolved first (or moved after matching). - L814 and L1116
RecordOutputTokens(modelName, ...)keep output accounting shared across routes and must use the same key.
Until the router.go callbacks and request flow are updated, the isolation fix is not effective in production.
Positive: the ratelimit.go refactor threads the key consistently, and Redis key derivation already isolates correctly once a unique key is passed — the limiter layer is ready, the callers are not.
| store.RegisterCallback("ModelRoute", func(data datastore.EventData) { | ||
| routeKey := fmt.Sprintf("%s/%s", | ||
| data.ModelRoute.Namespace, | ||
| data.ModelRoute.Name, | ||
| ) |
| _, _, modelRoute, _ := r.store.MatchModelServer(modelName, c.Request, gatewayKey) | ||
| if err != nil || modelRoute == nil { | ||
| c.AbortWithStatusJSON(http.StatusNotFound, "route not found") | ||
| return | ||
| } |
| // Match ModelRoute to get the route key. Require ModelRoute match only. | ||
| _, _, modelRoute, _ := r.store.MatchModelServer(modelName, c.Request, gatewayKey) | ||
| if err != nil || modelRoute == nil { | ||
| c.AbortWithStatusJSON(http.StatusNotFound, "route not found") |
| if rateLimitKeyVal, ok := c.Get("rateLimitKey"); ok { | ||
| r.loadRateLimiter.RecordOutputTokens(rateLimitKeyVal.(string), resp.Usage.CompletionTokens) | ||
| } |
| if rateLimitKeyVal, ok := c.Get("rateLimitKey"); ok { | ||
| r.loadRateLimiter.RecordOutputTokens(rateLimitKeyVal.(string), outputTokens) | ||
| } |
| // AddOrUpdateLimiter adds or updates rate limiter for a model | ||
| func (r *TokenRateLimiter) AddOrUpdateLimiter(model string, ratelimit *networkingv1alpha1.RateLimit) error { | ||
| func (r *TokenRateLimiter) AddOrUpdateLimiter(limiterKey string, ratelimit *networkingv1alpha1.RateLimit) error { |
| // DeleteLimiter deletes rate limiter for a model | ||
| func (r *TokenRateLimiter) DeleteLimiter(model string) { | ||
| func (r *TokenRateLimiter) DeleteLimiter(limiterKey string) { |
| } | ||
|
|
||
| // Store metrics recorder in context for use in other functions | ||
| // Store metrics recorder in context for use in other functions |
| routeKey := fmt.Sprintf("%s/%s", | ||
| data.ModelRoute.Namespace, | ||
| data.ModelRoute.Name, | ||
| ) |
| rateLimitKey, err := r.store.GetRateLimitKey(modelName, c.Request, gatewayKey) | ||
|
|
||
| if err != nil || modelRoute == nil { | ||
| c.AbortWithStatusJSON(http.StatusNotFound, "route not found") | ||
| return | ||
| } | ||
| rateLimitKey := fmt.Sprintf("%s/%s", modelRoute.Namespace, modelRoute.Name) |
| if rateLimitKeyVal, ok := c.Get("rateLimitKey"); ok { | ||
| r.loadRateLimiter.RecordOutputTokens(rateLimitKeyVal.(string), resp.Usage.CompletionTokens) | ||
| } |
| if rateLimitKeyVal, ok := c.Get("rateLimitKey"); ok { | ||
| r.loadRateLimiter.RecordOutputTokens(rateLimitKeyVal.(string), outputTokens) | ||
| } |
| // AddOrUpdateLimiter adds or updates rate limiter for a model | ||
| func (r *TokenRateLimiter) AddOrUpdateLimiter(model string, ratelimit *networkingv1alpha1.RateLimit) error { | ||
| func (r *TokenRateLimiter) AddOrUpdateLimiter(limiterKey string, ratelimit *networkingv1alpha1.RateLimit) error { |
| // DeleteLimiter deletes rate limiter for a model | ||
| func (r *TokenRateLimiter) DeleteLimiter(model string) { | ||
| func (r *TokenRateLimiter) DeleteLimiter(limiterKey string) { |
| } | ||
|
|
||
| // Store metrics recorder in context for use in other functions | ||
| // Store metrics recorder in context for use in other functions |
|
Hi @hzxuzhonghu But this breaks the fairness scheduling - the rate limited request would be enqueued , prioritised , and scheduled before rejected , increase latency and queue , So the request will became like this -
Also this will also allow us to add ratelimit in the inferencepool/httproute which is currently didnot exist , however I want to know why we didnot needed then here , Let me know what you think about this if the design make sense please review the pr , if not no issues I will just close the pr , thanks |
Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
2ce55ed to
3613a52
Compare
Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
|
@hzxuzhonghu any updates? |
/kind enhancement
/kind bug
What this PR does / Why we need it
Fixes a multi-tenant rate limit isolation issue where rate limiters were keyed only by model name. When multiple
ModelRoutes referenced the same model, they could unintentionally share the same limiter state and Redis token bucket, causing one route's configuration to affect another.Changes
namespace/routeNameinstead ofmodelName.Tests
Added unit and Redis-backed integration tests to verify:
Issue
Fixes #1043
User-Facing Changes
NONE