Skip to content

remove first request in TestModelRouteWithRateLimit#1072

Merged
volcano-sh-bot merged 1 commit into
volcano-sh:mainfrom
FAUST-BENCHOU:test/flaky-tokens
May 18, 2026
Merged

remove first request in TestModelRouteWithRateLimit#1072
volcano-sh-bot merged 1 commit into
volcano-sh:mainfrom
FAUST-BENCHOU:test/flaky-tokens

Conversation

@FAUST-BENCHOU
Copy link
Copy Markdown
Contributor

@FAUST-BENCHOU FAUST-BENCHOU commented May 15, 2026

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #1070

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


Signed-off-by: zhoujinyu <2319109590@qq.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request simplifies the rate-limiting E2E tests by removing initial reconciliation requests and consolidating quota calculation logic. However, the review highlights that removing these reconciliation steps introduces potential race conditions and flakiness, as the router may not be ready when the test requests begin. Feedback suggests re-introducing a readiness check, improving error diagnostics by reading response bodies, and using stricter assertions for better test reliability.

Comment thread test/e2e/router/shared.go
Comment on lines +691 to 693
quotaRequests := inputTokenLimit / tokensPerRequest
for i := 0; i < quotaRequests; i++ {
resp := utils.SendChatRequest(t, createdModelRoute.Spec.ModelName, standardMessage)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Removing the initial CheckChatCompletions call introduces a race condition. CheckChatCompletions includes retry logic that waits for the router to reconcile and the new ModelRoute to become active. By replacing it with a direct loop of SendChatRequest (which does not retry), the test will likely fail with a 404 or 503 error if the router hasn't finished updating its configuration when the first request is sent. It is recommended to use utils.WaitForChatModelReady before the loop to ensure the route is active and the model is discoverable.

Comment thread test/e2e/router/shared.go
Comment on lines +744 to 748
quotaRequests := inputTokenLimit / tokensPerRequest
for i := 0; i < quotaRequests; i++ {
resp := utils.SendChatRequest(t, createdModelRoute.Spec.ModelName, standardMessage)
resp.Body.Close()
assert.Equal(t, http.StatusOK, resp.StatusCode, "Request %d should succeed", i+1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This loop is also susceptible to flakiness due to the removal of the reconciliation request. Additionally, unlike the loop in the first test case (lines 694-699), this loop does not read the response body. If the assertion fails, the logs won't contain the error message from the router (e.g., 'route not found'), making it difficult to diagnose failures. Consider reading the body for better error reporting and using require.Equal instead of assert.Equal to stop the test immediately upon the first failure in the loop.

Comment thread test/e2e/router/shared.go
Comment on lines +807 to 811
quotaRequests := inputTokenLimit / tokensPerRequest
for i := 0; i < quotaRequests; i++ {
resp := utils.SendChatRequest(t, createdModelRoute.Spec.ModelName, standardMessage)
resp.Body.Close()
assert.Equal(t, http.StatusOK, resp.StatusCode,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This loop suffers from the same flakiness and lack of error detail as the previous ones. It is recommended to ensure the model is ready before starting the quota exhaustion loop and to include the response body in the assertion message for easier debugging.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the router E2E rate-limit tests to stop issuing an initial “warm-up” chat request (previously used to handle reconciliation) and instead compute the quota consumption purely as inputTokenLimit / tokensPerRequest.

Changes:

  • Removed the initial CheckChatCompletions warm-up request from the input-token rate-limit tests (Test 1–3).
  • Simplified quota calculation to quotaRequests := inputTokenLimit / tokensPerRequest and adjusted logging accordingly.
Comments suppressed due to low confidence (2)

test/e2e/router/shared.go:748

  • Same concern as Test 1: the quota loop begins without any readiness gating, so initial requests can fail with 404/5xx before the router has reconciled the new ModelRoute, making the window-accuracy assertions flaky. Add a precondition step that waits for a successful 200 response before exhausting the quota.
		quotaRequests := inputTokenLimit / tokensPerRequest
		for i := 0; i < quotaRequests; i++ {
			resp := utils.SendChatRequest(t, createdModelRoute.Spec.ModelName, standardMessage)
			resp.Body.Close()
			assert.Equal(t, http.StatusOK, resp.StatusCode, "Request %d should succeed", i+1)

test/e2e/router/shared.go:812

  • This quota-consumption loop also lacks a warm-up/readiness check; if reconciliation isn't complete yet, the early requests may fail and the subsequent rate-limit reset assertions become unreliable. Recommend waiting for the route to return 200 before starting quota accounting.
		quotaRequests := inputTokenLimit / tokensPerRequest
		for i := 0; i < quotaRequests; i++ {
			resp := utils.SendChatRequest(t, createdModelRoute.Spec.ModelName, standardMessage)
			resp.Body.Close()
			assert.Equal(t, http.StatusOK, resp.StatusCode,
				"Request %d should succeed", i+1)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/e2e/router/shared.go
Comment on lines +691 to 695
quotaRequests := inputTokenLimit / tokensPerRequest
for i := 0; i < quotaRequests; i++ {
resp := utils.SendChatRequest(t, createdModelRoute.Spec.ModelName, standardMessage)
responseBody, readErr := io.ReadAll(resp.Body)
resp.Body.Close()
@FAUST-BENCHOU FAUST-BENCHOU marked this pull request as ready for review May 15, 2026 14:58
Comment thread test/e2e/router/shared.go
Comment on lines -691 to -703
// First request: use CheckChatCompletions to handle router reconciliation
resp := utils.CheckChatCompletions(t, createdModelRoute.Spec.ModelName, standardMessage)
// CheckChatCompletions retries on transient failures (e.g., 404 before route is ready),
// but only the final successful request counts toward rate limit quota.
tokensConsumed := tokensPerRequest
t.Logf("Router reconciliation complete (consumed %d tokens, %d attempts including retries)", tokensConsumed, resp.Attempts)

// Calculate remaining quota
remainingQuota := inputTokenLimit - tokensConsumed
expectedSuccessfulRequests := remainingQuota / tokensPerRequest

// Send remaining requests until quota exhausted
for i := 0; i < expectedSuccessfulRequests; i++ {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/cc @katara-Jayprakash i dont remember why need this part.I think First request will consume unexpected token.For example if First request return 404 instead of 200 it will try again but wont be recorded to token consumption.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FAUST-BENCHOU this was introduce by @xrwang8 in his recent pr! #989

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we can directly remove the first request in fact.

Copy link
Copy Markdown
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@volcano-sh-bot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hzxuzhonghu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot merged commit 45f4347 into volcano-sh:main May 18, 2026
20 of 22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

flaky e2e test TestModelRouteWithRateLimit/VerifyInputTokenRateLimitEnforcement

5 participants