fix(gateway): return 503 for orchestrator rejections by Bhanudahiyaa · Pull Request #1340 · mofa-org/mofa

Bhanudahiyaa · 2026-03-17T20:42:53Z

Summary

Return 503 Service Unavailable with a structured GatewayErrorBody when orchestrator routing returns RoutedBackend::Rejected, instead of returning a 200 completion payload.

Motivation

The OpenAI-compatible contract should distinguish successful inference from backend admission failure. Returning 200 on rejection leads clients to misclassify the rejection as a success, which makes their recovery behavior unreliable.

fixes: #1339

Changes

Updated rejection handling in chat_completions for:
- non-streaming path
- streaming path
Added build_rejected_response(...) helper to centralize error response construction.
Preserved MoFA diagnostic headers (x-mofa-backend, x-mofa-latency-ms) on rejection responses.
Added two regression tests:
- test_non_streaming_rejected_request_returns_503
- test_streaming_rejected_request_returns_503
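
For readers without the diff at hand, the shape of the centralized helper can be sketched roughly as follows. This is a framework-free illustration and not the code from this PR: the function name `build_rejected_response_sketch`, the `SketchResponse` type, the fields of the stand-in `GatewayErrorBody`, and the string values for the error type and code are all assumptions made for the example. Only the 503 status and the two header names come from the description above.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the gateway's structured error envelope.
/// The real GatewayErrorBody may have different fields.
#[derive(Debug, Clone, PartialEq)]
pub struct GatewayErrorBody {
    pub message: String,
    pub error_type: String,
    pub code: String,
}

/// Hypothetical, framework-free response type used only in this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchResponse {
    pub status: u16,
    pub headers: BTreeMap<String, String>,
    pub body: GatewayErrorBody,
}

/// Builds one rejection response that both the streaming and the
/// non-streaming path could return, so the two paths cannot drift apart.
pub fn build_rejected_response_sketch(
    reason: &str,
    backend: &str,
    latency_ms: u64,
) -> SketchResponse {
    // Keep the diagnostic headers on the error path as well.
    let mut headers = BTreeMap::new();
    headers.insert("x-mofa-backend".to_string(), backend.to_string());
    headers.insert("x-mofa-latency-ms".to_string(), latency_ms.to_string());

    SketchResponse {
        status: 503, // Service Unavailable
        headers,
        body: GatewayErrorBody {
            message: reason.to_string(),
            // Assumed values, chosen only for illustration.
            error_type: "service_unavailable".to_string(),
            code: "orchestrator_rejected".to_string(),
        },
    }
}
```

Routing both paths through a single constructor is what would let the two regression tests assert the same status and headers regardless of the `stream` flag.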

Design / Tradeoffs

Chose 503 because rejection here is capacity/service unavailability, not request-shape invalidity.
Kept error body compatible with existing gateway error envelope for client consistency.
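
To make the status-code tradeoff concrete, here is a minimal sketch of how a client might branch on the response status. The `ClientAction` enum and the `classify_status` function are hypothetical, and the set of statuses treated as retryable is an assumption about typical client policy, not something taken from this repository.

```rust
/// Hypothetical client-side decision derived from the HTTP status alone.
#[derive(Debug, PartialEq, Eq)]
pub enum ClientAction {
    /// Parse the body as a completion and hand it to the caller.
    UseCompletion,
    /// Transient condition: back off and try again.
    RetryWithBackoff,
    /// Problem with the request itself: retrying unchanged will not help.
    FailFast,
}

/// Assumed policy: 2xx is success; 429, 502, 503 and 504 are retryable;
/// everything else fails fast.
pub fn classify_status(status: u16) -> ClientAction {
    match status {
        200..=299 => ClientAction::UseCompletion,
        429 | 502 | 503 | 504 => ClientAction::RetryWithBackoff,
        _ => ClientAction::FailFast,
    }
}
```

Under this policy, a rejection delivered as 200 would be handed to the caller as if it were a completion, while the same rejection delivered as 503 triggers a retry. A 400-class status would instead signal that the request shape is wrong, which is not what an admission rejection means.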

Testing

Executed:

Focused change (single issue, single PR scope)
Behavior fixed for both stream + non-stream
Regression tests added
cargo fmt --check (workspace-wide; blocked by unrelated files)
cargo clippy --workspace --all-features -- -D warnings (pre-existing unrelated warnings/errors)
Architecture layer boundaries respected

———

Bhanudahiyaa · 2026-03-17T22:07:22Z

@lijingrs @yangrudan @BH3GEI

This PR corrects the gateway’s handling of orchestrator rejections by returning a 503 Service Unavailable instead of a 200 completion response.

Key updates

Applied consistent rejection handling for both streaming and non-streaming paths
Introduced a centralized helper for structured error responses
Preserved diagnostic headers (x-mofa-backend, x-mofa-latency-ms)
Added regression tests to enforce correct HTTP semantics

Rationale

Rejections represent capacity/admission failures, not successful inference. Returning 200 breaks client retry logic and observability.

Would appreciate confirmation that 503 is the appropriate status for this case, or if a different mapping is preferred.

fix(gateway): return 503 for orchestrator rejections

ac7dd98

Bhanudahiyaa force-pushed the fix/openai-rejection-http-semantics branch from 0050d5b to ac7dd98 on March 17, 2026, 20:58

Bhanudahiyaa wants to merge 1 commit into mofa-org:main from Bhanudahiyaa:fix/openai-rejection-http-semantics
