remove first request in TestModelRouteWithRateLimit by FAUST-BENCHOU · Pull Request #1072 · volcano-sh/kthena

FAUST-BENCHOU · 2026-05-15T14:20:24Z

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #1070

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Signed-off-by: zhoujinyu <2319109590@qq.com>

gemini-code-assist

Code Review

This pull request simplifies the rate-limiting E2E tests by removing initial reconciliation requests and consolidating quota calculation logic. However, the review highlights that removing these reconciliation steps introduces potential race conditions and flakiness, as the router may not be ready when the test requests begin. Feedback suggests re-introducing a readiness check, improving error diagnostics by reading response bodies, and using stricter assertions for better test reliability.

gemini-code-assist · 2026-05-15T14:21:51Z

+		quotaRequests := inputTokenLimit / tokensPerRequest
+		for i := 0; i < quotaRequests; i++ {
 			resp := utils.SendChatRequest(t, createdModelRoute.Spec.ModelName, standardMessage)


Removing the initial CheckChatCompletions call introduces a race condition. CheckChatCompletions includes retry logic that waits for the router to reconcile and the new ModelRoute to become active. By replacing it with a direct loop of SendChatRequest (which does not retry), the test will likely fail with a 404 or 503 error if the router hasn't finished updating its configuration when the first request is sent. It is recommended to use utils.WaitForChatModelReady before the loop to ensure the route is active and the model is discoverable.
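To make the suggested readiness gate concrete, here is a minimal sketch of polling an endpoint until it answers 200 before any quota accounting starts. It is not the kthena implementation: `waitForOK` and `demoReadiness` are hypothetical names, the signature of `utils.WaitForChatModelReady` is not shown in this thread, and the stand-in server simply returns 404 twice to imitate a route that has not been reconciled yet.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// waitForOK polls url until it answers 200 OK or the timeout elapses and
// returns the number of attempts made. Hypothetical helper for illustration;
// it is not part of the kthena test utilities.
func waitForOK(client *http.Client, url string, interval, timeout time.Duration) (int, error) {
	deadline := time.Now().Add(timeout)
	for attempt := 1; ; attempt++ {
		resp, err := client.Get(url)
		if err == nil {
			status := resp.StatusCode
			resp.Body.Close()
			if status == http.StatusOK {
				return attempt, nil
			}
			err = fmt.Errorf("unexpected status %d", status)
		}
		if time.Now().After(deadline) {
			return attempt, fmt.Errorf("not ready after %d attempts: %w", attempt, err)
		}
		time.Sleep(interval)
	}
}

// demoReadiness starts a stand-in server that answers 404 twice (route not
// reconciled yet) and 200 afterwards, then waits for it to become ready.
func demoReadiness() (int, error) {
	var calls int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&calls, 1) <= 2 {
			http.Error(w, "route not found", http.StatusNotFound)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()
	return waitForOK(srv.Client(), srv.URL, 10*time.Millisecond, 2*time.Second)
}

func main() {
	attempts, err := demoReadiness()
	fmt.Println(attempts, err)
}
```

One caveat for a rate-limit test: if the readiness probe is itself a real chat request, the successful probe presumably counts toward the token quota, so the quota loop would need to account for it, as the removed code did with tokensConsumed.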

gemini-code-assist · 2026-05-15T14:21:52Z

+		quotaRequests := inputTokenLimit / tokensPerRequest
+		for i := 0; i < quotaRequests; i++ {
 			resp := utils.SendChatRequest(t, createdModelRoute.Spec.ModelName, standardMessage)
 			resp.Body.Close()
 			assert.Equal(t, http.StatusOK, resp.StatusCode, "Request %d should succeed", i+1)


This loop is also susceptible to flakiness due to the removal of the reconciliation request. Additionally, unlike the loop in the first test case (lines 694-699), this loop does not read the response body. If the assertion fails, the logs won't contain the error message from the router (e.g., 'route not found'), making it difficult to diagnose failures. Consider reading the body for better error reporting and using require.Equal instead of assert.Equal to stop the test immediately upon the first failure in the loop.
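The diagnostic point can be sketched with the standard library alone: read the body before closing it and carry it in the failure message. `expectStatus` and `demoDiagnostics` are hypothetical names used only for this illustration, and the 404 body "route not found" is the example text from the comment above rather than a confirmed router response.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// expectStatus reads and closes the response body, then returns an error that
// embeds the body when the status code differs from want.
// Hypothetical helper for illustration only.
func expectStatus(resp *http.Response, want int) error {
	defer resp.Body.Close()
	body, readErr := io.ReadAll(resp.Body)
	if readErr != nil {
		return fmt.Errorf("reading body: %w", readErr)
	}
	if resp.StatusCode != want {
		return fmt.Errorf("status %d, want %d, body: %s",
			resp.StatusCode, want, strings.TrimSpace(string(body)))
	}
	return nil
}

// demoDiagnostics calls a stand-in server that always answers 404 and shows
// that the resulting error carries the server's message.
func demoDiagnostics() error {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "route not found", http.StatusNotFound)
	}))
	defer srv.Close()

	resp, err := srv.Client().Get(srv.URL)
	if err != nil {
		return err
	}
	return expectStatus(resp, http.StatusOK)
}

func main() {
	fmt.Println(demoDiagnostics())
}
```

In a testify-based test, the returned error could be passed to require.NoError inside the loop, which stops at the first failing request and prints the body along with the status.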

gemini-code-assist · 2026-05-15T14:21:52Z

+		quotaRequests := inputTokenLimit / tokensPerRequest
+		for i := 0; i < quotaRequests; i++ {
 			resp := utils.SendChatRequest(t, createdModelRoute.Spec.ModelName, standardMessage)
 			resp.Body.Close()
 			assert.Equal(t, http.StatusOK, resp.StatusCode,


This loop suffers from the same flakiness and lack of error detail as the previous ones. It is recommended to ensure the model is ready before starting the quota exhaustion loop and to include the response body in the assertion message for easier debugging.

Copilot

Pull request overview

This PR updates the router E2E rate-limit tests to stop issuing an initial “warm-up” chat request (previously used to handle reconciliation) and instead compute the quota consumption purely as inputTokenLimit / tokensPerRequest.

Changes:

Removed the initial CheckChatCompletions warm-up request from the input-token rate-limit tests (Test 1–3).
Simplified quota calculation to quotaRequests := inputTokenLimit / tokensPerRequest and adjusted logging accordingly.
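As a worked check of the arithmetic, the sketch below compares the new calculation with the removed one (one warm-up request plus the remaining quota). The limits 30, 35, and 10 are made-up values, not the constants used in the test, and both function names are hypothetical. Under Go's integer division the two agree whenever the limit is at least one request's worth of tokens, so the total number of expected successful requests does not change; what changes is that no retried warm-up request is sent.

```go
package main

import "fmt"

// quotaRequests is the calculation after this PR: how many whole requests fit
// into the input token limit (integer division truncates any remainder).
func quotaRequests(inputTokenLimit, tokensPerRequest int) int {
	return inputTokenLimit / tokensPerRequest
}

// oldTotalRequests mirrors the removed calculation: one warm-up request, plus
// as many requests as fit into the quota that remains after it.
func oldTotalRequests(inputTokenLimit, tokensPerRequest int) int {
	remainingQuota := inputTokenLimit - tokensPerRequest
	return 1 + remainingQuota/tokensPerRequest
}

func main() {
	// Illustrative values only; the real test constants are not shown here.
	fmt.Println(quotaRequests(30, 10), oldTotalRequests(30, 10))
	fmt.Println(quotaRequests(35, 10), oldTotalRequests(35, 10))
}
```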

Comments suppressed due to low confidence (2)

test/e2e/router/shared.go:748

Same concern as Test 1: the quota loop begins without any readiness gating, so initial requests can fail with 404/5xx before the router has reconciled the new ModelRoute, making the window-accuracy assertions flaky. Add a precondition step that waits for a successful 200 response before exhausting the quota.

		quotaRequests := inputTokenLimit / tokensPerRequest
		for i := 0; i < quotaRequests; i++ {
			resp := utils.SendChatRequest(t, createdModelRoute.Spec.ModelName, standardMessage)
			resp.Body.Close()
			assert.Equal(t, http.StatusOK, resp.StatusCode, "Request %d should succeed", i+1)

test/e2e/router/shared.go:812

This quota-consumption loop also lacks a warm-up/readiness check; if reconciliation isn't complete yet, the early requests may fail and the subsequent rate-limit reset assertions become unreliable. Recommend waiting for the route to return 200 before starting quota accounting.

		quotaRequests := inputTokenLimit / tokensPerRequest
		for i := 0; i < quotaRequests; i++ {
			resp := utils.SendChatRequest(t, createdModelRoute.Spec.ModelName, standardMessage)
			resp.Body.Close()
			assert.Equal(t, http.StatusOK, resp.StatusCode,
				"Request %d should succeed", i+1)

+		quotaRequests := inputTokenLimit / tokensPerRequest
+		for i := 0; i < quotaRequests; i++ {
 			resp := utils.SendChatRequest(t, createdModelRoute.Spec.ModelName, standardMessage)
 			responseBody, readErr := io.ReadAll(resp.Body)
 			resp.Body.Close()


FAUST-BENCHOU · 2026-05-15T15:04:45Z

-		// First request: use CheckChatCompletions to handle router reconciliation
-		resp := utils.CheckChatCompletions(t, createdModelRoute.Spec.ModelName, standardMessage)
-		// CheckChatCompletions retries on transient failures (e.g., 404 before route is ready),
-		// but only the final successful request counts toward rate limit quota.
-		tokensConsumed := tokensPerRequest
-		t.Logf("Router reconciliation complete (consumed %d tokens, %d attempts including retries)", tokensConsumed, resp.Attempts)
-
-		// Calculate remaining quota
-		remainingQuota := inputTokenLimit - tokensConsumed
-		expectedSuccessfulRequests := remainingQuota / tokensPerRequest
-
-		// Send remaining requests until quota exhausted
-		for i := 0; i < expectedSuccessfulRequests; i++ {


/cc @katara-Jayprakash I don't remember why we need this part. I think the first request can consume tokens unexpectedly. For example, if the first request returns 404 instead of 200, it is retried, but the retries are not recorded in the token consumption.

katara-Jayprakash · 2026-05-15

@FAUST-BENCHOU this was introduced by @xrwang8 in his recent PR, #989!

FAUST-BENCHOU · 2026-05-16

I think we can in fact just remove the first request.

hzxuzhonghu

/lgtm

volcano-sh-bot · 2026-05-18T03:15:44Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hzxuzhonghu

Needs approval from an approver in each of these files:

~~test/OWNERS~~ [hzxuzhonghu]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

remove first request in TestModelRouteWithRateLimit

934b2eb

Signed-off-by: zhoujinyu <2319109590@qq.com>

Copilot AI review requested due to automatic review settings May 15, 2026 14:20

volcano-sh-bot added the do-not-merge/work-in-progress label May 15, 2026

volcano-sh-bot requested review from YaoZengzeng and hzxuzhonghu May 15, 2026 14:20

volcano-sh-bot added the size/M label May 15, 2026

Copilot started reviewing on behalf of FAUST-BENCHOU May 15, 2026 14:21

gemini-code-assist Bot reviewed May 15, 2026

Copilot AI reviewed May 15, 2026

FAUST-BENCHOU marked this pull request as ready for review May 15, 2026 14:58

volcano-sh-bot removed the do-not-merge/work-in-progress label May 15, 2026

FAUST-BENCHOU commented May 15, 2026

hzxuzhonghu approved these changes May 18, 2026

volcano-sh-bot assigned hzxuzhonghu May 18, 2026

volcano-sh-bot added the lgtm label May 18, 2026

volcano-sh-bot added the approved label May 18, 2026

volcano-sh-bot merged commit 45f4347 into volcano-sh:main May 18, 2026
20 of 22 checks passed

volcano-sh-bot merged 1 commit into volcano-sh:main from FAUST-BENCHOU:test/flaky-tokens
