remove first request in TestModelRouteWithRateLimit#1072
Conversation
Signed-off-by: zhoujinyu <2319109590@qq.com>
There was a problem hiding this comment.
Code Review
This pull request simplifies the rate-limiting E2E tests by removing initial reconciliation requests and consolidating quota calculation logic. However, the review highlights that removing these reconciliation steps introduces potential race conditions and flakiness, as the router may not be ready when the test requests begin. Feedback suggests re-introducing a readiness check, improving error diagnostics by reading response bodies, and using stricter assertions for better test reliability.
| quotaRequests := inputTokenLimit / tokensPerRequest | ||
| for i := 0; i < quotaRequests; i++ { | ||
| resp := utils.SendChatRequest(t, createdModelRoute.Spec.ModelName, standardMessage) |
There was a problem hiding this comment.
Removing the initial CheckChatCompletions call introduces a race condition. CheckChatCompletions includes retry logic that waits for the router to reconcile and the new ModelRoute to become active. By replacing it with a direct loop of SendChatRequest (which does not retry), the test will likely fail with a 404 or 503 error if the router hasn't finished updating its configuration when the first request is sent. It is recommended to use utils.WaitForChatModelReady before the loop to ensure the route is active and the model is discoverable.
| quotaRequests := inputTokenLimit / tokensPerRequest | ||
| for i := 0; i < quotaRequests; i++ { | ||
| resp := utils.SendChatRequest(t, createdModelRoute.Spec.ModelName, standardMessage) | ||
| resp.Body.Close() | ||
| assert.Equal(t, http.StatusOK, resp.StatusCode, "Request %d should succeed", i+1) |
There was a problem hiding this comment.
This loop is also susceptible to flakiness due to the removal of the reconciliation request. Additionally, unlike the loop in the first test case (lines 694-699), this loop does not read the response body. If the assertion fails, the logs won't contain the error message from the router (e.g., 'route not found'), making it difficult to diagnose failures. Consider reading the body for better error reporting and using require.Equal instead of assert.Equal to stop the test immediately upon the first failure in the loop.
| quotaRequests := inputTokenLimit / tokensPerRequest | ||
| for i := 0; i < quotaRequests; i++ { | ||
| resp := utils.SendChatRequest(t, createdModelRoute.Spec.ModelName, standardMessage) | ||
| resp.Body.Close() | ||
| assert.Equal(t, http.StatusOK, resp.StatusCode, |
There was a problem hiding this comment.
Pull request overview
This PR updates the router E2E rate-limit tests to stop issuing an initial “warm-up” chat request (previously used to handle reconciliation) and instead compute the quota consumption purely as inputTokenLimit / tokensPerRequest.
Changes:
- Removed the initial
CheckChatCompletionswarm-up request from the input-token rate-limit tests (Test 1–3). - Simplified quota calculation to
quotaRequests := inputTokenLimit / tokensPerRequestand adjusted logging accordingly.
Comments suppressed due to low confidence (2)
test/e2e/router/shared.go:748
- Same concern as Test 1: the quota loop begins without any readiness gating, so initial requests can fail with 404/5xx before the router has reconciled the new ModelRoute, making the window-accuracy assertions flaky. Add a precondition step that waits for a successful 200 response before exhausting the quota.
quotaRequests := inputTokenLimit / tokensPerRequest
for i := 0; i < quotaRequests; i++ {
resp := utils.SendChatRequest(t, createdModelRoute.Spec.ModelName, standardMessage)
resp.Body.Close()
assert.Equal(t, http.StatusOK, resp.StatusCode, "Request %d should succeed", i+1)
test/e2e/router/shared.go:812
- This quota-consumption loop also lacks a warm-up/readiness check; if reconciliation isn't complete yet, the early requests may fail and the subsequent rate-limit reset assertions become unreliable. Recommend waiting for the route to return 200 before starting quota accounting.
quotaRequests := inputTokenLimit / tokensPerRequest
for i := 0; i < quotaRequests; i++ {
resp := utils.SendChatRequest(t, createdModelRoute.Spec.ModelName, standardMessage)
resp.Body.Close()
assert.Equal(t, http.StatusOK, resp.StatusCode,
"Request %d should succeed", i+1)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| quotaRequests := inputTokenLimit / tokensPerRequest | ||
| for i := 0; i < quotaRequests; i++ { | ||
| resp := utils.SendChatRequest(t, createdModelRoute.Spec.ModelName, standardMessage) | ||
| responseBody, readErr := io.ReadAll(resp.Body) | ||
| resp.Body.Close() |
| // First request: use CheckChatCompletions to handle router reconciliation | ||
| resp := utils.CheckChatCompletions(t, createdModelRoute.Spec.ModelName, standardMessage) | ||
| // CheckChatCompletions retries on transient failures (e.g., 404 before route is ready), | ||
| // but only the final successful request counts toward rate limit quota. | ||
| tokensConsumed := tokensPerRequest | ||
| t.Logf("Router reconciliation complete (consumed %d tokens, %d attempts including retries)", tokensConsumed, resp.Attempts) | ||
|
|
||
| // Calculate remaining quota | ||
| remainingQuota := inputTokenLimit - tokensConsumed | ||
| expectedSuccessfulRequests := remainingQuota / tokensPerRequest | ||
|
|
||
| // Send remaining requests until quota exhausted | ||
| for i := 0; i < expectedSuccessfulRequests; i++ { |
There was a problem hiding this comment.
/cc @katara-Jayprakash i dont remember why need this part.I think First request will consume unexpected token.For example if First request return 404 instead of 200 it will try again but wont be recorded to token consumption.
There was a problem hiding this comment.
@FAUST-BENCHOU this was introduce by @xrwang8 in his recent pr! #989
There was a problem hiding this comment.
i think we can directly remove the first request in fact.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: hzxuzhonghu The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #1070
Special notes for your reviewer:
Does this PR introduce a user-facing change?: