Open
Description
Bug
The /v1/responses endpoint swallows upstream HTTP errors (including 429 rate limits) and returns HTTP 200 with status: "failed", empty content, and 0 usage tokens. It should propagate the upstream error as the corresponding HTTP status code.
Reproduction
# Chat completions correctly returns 429:
curl -s 'https://cloud-api.near.ai/v1/chat/completions' \
-H 'Authorization: Bearer <key>' \
-d '{"model":"google/gemini-3-pro","messages":[{"role":"user","content":"hi"}],"max_tokens":32}'
# → HTTP 429: {"error":{"message":"Rate limit exceeded..."}}
# Responses API swallows the 429:
curl -s 'https://cloud-api.near.ai/v1/responses' \
-H 'Authorization: Bearer <key>' \
-d '{"model":"google/gemini-3-pro","input":"hi","max_output_tokens":32}'
# → HTTP 200: {"status":"failed","output":[{"content":[{"text":""}]}],"usage":{"total_tokens":0}}
Impact
- Clients cannot detect rate limits and retry appropriately
- infra-tests test_responses[gemini-3-pro] persistently fails because request_with_retry() sees HTTP 200 and doesn't retry, while the equivalent chat completion test retries on 429 and eventually succeeds
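To illustrate why a status-based retry helper never retries here, the following is a minimal sketch of such a policy. It is not the actual request_with_retry() from infra-tests; the function names and the choice to retry on 429 and 5xx are assumptions made for illustration. The second function shows a possible client-side workaround that also inspects the body's status field.

```rust
/// Hypothetical retry policy (an assumption, not the infra-tests code):
/// retry only on HTTP 429 and on 5xx server errors.
fn should_retry(http_status: u16) -> bool {
    http_status == 429 || (500..=599).contains(&http_status)
}

/// Hypothetical workaround: additionally treat HTTP 200 with a body whose
/// `status` field is "failed" as retryable, since the real upstream status
/// code is hidden by the endpoint.
fn should_retry_with_body(http_status: u16, body_status: Option<&str>) -> bool {
    should_retry(http_status) || (http_status == 200 && body_status == Some("failed"))
}
```

With the first policy, the chat completions response (HTTP 429) is retried while the responses API result (HTTP 200) is not. The workaround is lossy: it cannot tell a rate limit apart from any other upstream failure, which is why propagating the status code is preferable.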
Root Cause
Traced through the code:
- Gemini backend (inference_providers/src/external/gemini/mod.rs:216): Returns CompletionError::HttpError { status_code: 429 }, which is correct.
- Completion stream (services/src/responses/service.rs:696-703): Catches the error, sets stream_error = true, and breaks, so no usage is captured.
- Service (services/src/responses/service.rs:1184-1232): Emits a response.failed event but never response.completed, so there is no final response object.
- Route handler fallback (api/src/routes/responses.rs:500-560): With no final_response from a completed event, execution falls through to the fallback, which hardcodes Usage::new(0, 0) and returns HTTP 200.
Suggested Fix
When the completion stream errors with an HTTP error, propagate it as the corresponding HTTP status code from the /v1/responses endpoint, rather than wrapping it in status: "failed" with HTTP 200. At minimum, 429 rate limit errors should be propagated as HTTP 429 so clients can implement retry logic.
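One way the mapping could look is sketched below. The CompletionError enum here is a simplified stand-in, not the real type from inference_providers: only the HttpError variant and its status_code field come from this issue, while the message field, the Other variant, the function name to_http_error, and the 502 fallback are assumptions.

```rust
/// Simplified stand-in for the real error type (assumption: the actual
/// enum has more variants and possibly different fields).
#[derive(Debug)]
enum CompletionError {
    HttpError { status_code: u16, message: String },
    Other(String),
}

/// Hypothetical helper: pick the HTTP status and message the route handler
/// would return instead of HTTP 200 with status: "failed".
fn to_http_error(err: &CompletionError) -> (u16, String) {
    match err {
        // Propagate upstream 4xx/5xx codes (including 429) unchanged.
        CompletionError::HttpError { status_code, message }
            if (400..=599).contains(status_code) =>
        {
            (*status_code, message.clone())
        }
        // An upstream "error" carrying a non-error code is treated as a
        // bad gateway (assumed fallback).
        CompletionError::HttpError { message, .. } => (502, message.clone()),
        // Non-HTTP failures also map to 502 in this sketch.
        CompletionError::Other(message) => (502, message.clone()),
    }
}
```

For this to work, the stream handling would need to keep the original error (or at least its status code) rather than only setting stream_error = true, so that the route handler has something to map.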