
fix(gateway): return 503 for orchestrator rejections #1340

Open
Bhanudahiyaa wants to merge 1 commit into mofa-org:main from Bhanudahiyaa:fix/openai-rejection-http-semantics


Bhanudahiyaa (Contributor) commented Mar 17, 2026

Summary

Return 503 Service Unavailable with a structured GatewayErrorBody when orchestrator routing returns RoutedBackend::Rejected, instead of returning a 200 completion payload.

Motivation

The OpenAI-compatible contract should distinguish successful inference from backend admission failure. Returning 200 on rejection leads clients to misclassify the failure as a successful completion and makes recovery (for example, retrying later) unreliable.

fixes: #1339

Changes

  • Updated rejection handling in chat_completions for:
    • non-streaming path
    • streaming path
  • Added build_rejected_response(...) helper to centralize error response construction.
  • Preserved MoFA diagnostic headers (x-mofa-backend, x-mofa-latency-ms) on rejection responses.
  • Added two regression tests:
    • test_non_streaming_rejected_request_returns_503
    • test_streaming_rejected_request_returns_503
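The shape of this change can be sketched as follows. This is a std-only approximation, not the gateway's actual code: the `Local` variant, the `Response` struct, the `reason`/`latency_ms` parameters, the `x-mofa-backend: rejected` value, and the OpenAI-style `{"error": {...}}` envelope fields are all assumptions. Only `RoutedBackend::Rejected`, `build_rejected_response`, `chat_completions`, and the two header names come from this PR. The point it illustrates is that the rejection check runs before the streaming/non-streaming split, since a stream that has already sent a 200 status line cannot switch to 503 afterwards.

```rust
use std::collections::BTreeMap;

/// Hypothetical shape of the orchestrator's routing decision.
/// Only the `Rejected` variant name comes from the PR; `Local` and all fields are assumed.
#[derive(Debug)]
pub enum RoutedBackend {
    Local { name: String },
    Rejected { reason: String },
}

/// Minimal stand-in for an HTTP response (the real gateway uses its web framework's type).
#[derive(Debug)]
pub struct Response {
    pub status: u16,
    pub headers: BTreeMap<String, String>,
    pub body: String,
}

/// Escape the characters JSON requires inside a string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical version of build_rejected_response(...): one place that builds the
/// 503 status, the error envelope (field names assumed), and the MoFA diagnostic headers.
pub fn build_rejected_response(reason: &str, latency_ms: u128) -> Response {
    let mut headers = BTreeMap::new();
    headers.insert("content-type".to_string(), "application/json".to_string());
    // Assumed value: the real gateway may report something else for rejected requests.
    headers.insert("x-mofa-backend".to_string(), "rejected".to_string());
    headers.insert("x-mofa-latency-ms".to_string(), latency_ms.to_string());
    let body = format!(
        r#"{{"error":{{"message":"{}","type":"service_unavailable","code":"orchestrator_rejected"}}}}"#,
        json_escape(reason)
    );
    Response { status: 503, headers, body }
}

/// Sketch of the dispatch in chat_completions: rejection is handled before the
/// streaming decision, so both paths return the same 503 response.
pub fn chat_completions(route: RoutedBackend, stream: bool, latency_ms: u128) -> Response {
    match route {
        RoutedBackend::Rejected { reason } => build_rejected_response(&reason, latency_ms),
        RoutedBackend::Local { name } => {
            let mut headers = BTreeMap::new();
            headers.insert("x-mofa-backend".to_string(), name);
            headers.insert("x-mofa-latency-ms".to_string(), latency_ms.to_string());
            let content_type = if stream { "text/event-stream" } else { "application/json" };
            headers.insert("content-type".to_string(), content_type.to_string());
            // Actual inference and streaming are out of scope for this sketch.
            Response { status: 200, headers, body: String::new() }
        }
    }
}
```

Because the check happens before the split, a single helper serves both paths, which is the behavior the two regression tests pin down.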

Design / Tradeoffs

  • Chose 503 because rejection here is capacity/service unavailability, not request-shape invalidity.
  • Kept error body compatible with existing gateway error envelope for client consistency.
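The status choice above amounts to a small mapping. The sketch below is illustrative only: `GatewayFailure`, `ErrorEnvelope` (a stand-in for `GatewayErrorBody`, whose real fields are not shown in this PR), and the `type` strings are assumptions. It shows both failure kinds sharing one envelope while differing only in status code and error type.

```rust
/// Hypothetical failure categories the gateway distinguishes; names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GatewayFailure {
    /// The request itself is malformed; the client must change it.
    InvalidRequest,
    /// The orchestrator declined to admit an otherwise valid request.
    OrchestratorRejected,
}

/// Hypothetical stand-in for the shared error envelope; field names are assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ErrorEnvelope {
    pub message: String,
    pub error_type: &'static str,
}

/// Request-shape problems map to 400; capacity/admission problems map to 503.
/// Both use the same envelope so clients parse errors the same way.
pub fn map_failure(failure: GatewayFailure, message: &str) -> (u16, ErrorEnvelope) {
    let (status, error_type) = match failure {
        GatewayFailure::InvalidRequest => (400, "invalid_request_error"),
        GatewayFailure::OrchestratorRejected => (503, "service_unavailable"),
    };
    (status, ErrorEnvelope { message: message.to_string(), error_type })
}
```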

Testing

Checklist:

  • Focused change (single issue, single PR scope)
  • Behavior fixed for both streaming and non-streaming paths
  • Regression tests added
  • cargo fmt --check (workspace-wide run fails on unrelated files)
  • cargo clippy --workspace --all-features -- -D warnings (fails on pre-existing, unrelated warnings/errors)
  • Architecture layer boundaries respected

———

Bhanudahiyaa force-pushed the fix/openai-rejection-http-semantics branch from 0050d5b to ac7dd98 on March 17, 2026 20:58
Bhanudahiyaa (Contributor, PR author) commented:

@lijingrs @yangrudan @BH3GEI

This PR corrects the gateway’s handling of orchestrator rejections by returning a 503 Service Unavailable instead of a 200 completion response.

Key updates

  • Applied consistent rejection handling for both streaming and non-streaming paths
  • Introduced a centralized helper for structured error responses
  • Preserved diagnostic headers (x-mofa-backend, x-mofa-latency-ms)
  • Added regression tests to enforce correct HTTP semantics

Rationale

Rejections represent capacity/admission failures, not successful inference. Returning 200 makes clients treat a rejection as a completed request, which breaks retry logic and skews observability (for example, success-rate metrics).
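To illustrate the client-side effect, here is a hypothetical retry classifier (not part of this PR or of any MoFA client; the backoff constants are arbitrary). A rejection delivered as 503 is backed off and retried, whereas the same rejection delivered as 200 falls into the `Success` arm and is never retried.

```rust
use std::time::Duration;

/// What a hypothetical client decides to do with a gateway response.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Success,
    RetryAfter(Duration),
    GiveUp,
}

/// Classify a response status for attempt number `attempt` (0-based).
/// Note: a rejection reported as 200 would be classified as Success here.
pub fn classify(status: u16, attempt: u32, max_attempts: u32) -> Outcome {
    match status {
        200..=299 => Outcome::Success,
        503 if attempt.saturating_add(1) < max_attempts => {
            // Exponential backoff: 100 ms, 200 ms, 400 ms, ..., capped at 5 s.
            let ms = 100u64.saturating_mul(1u64 << attempt.min(16));
            Outcome::RetryAfter(Duration::from_millis(ms.min(5_000)))
        }
        _ => Outcome::GiveUp,
    }
}
```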

Would appreciate confirmation that 503 is the appropriate status for this case, or if a different mapping is preferred.


Development

Successfully merging this pull request may close these issues.

OpenAI compatible gateway returns HTTP 200 for orchestrator rejections
