
OpenAI gateway silently drops max_tokens and temperature before inference routing/execution #1341

@Bhanudahiyaa

Description

Problem

The OpenAI-compatible request schema includes max_tokens and temperature, and docs/comments indicate these are passed through to backend inference behavior.
However, the current gateway flow constructs the internal InferenceRequest without these fields, so both values are silently dropped.
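To make the drop concrete, here is a minimal sketch of the flow described above. All type and field names (OpenAiChatRequest, to_inference_request, priority) are illustrative assumptions, not the project's actual API:

```rust
// Hypothetical sketch of the current gateway flow; names are illustrative.

#[derive(Debug)]
struct OpenAiChatRequest {
    model: String,
    prompt: String,
    max_tokens: Option<u32>,  // present in the OpenAI-compatible schema
    temperature: Option<f32>, // present in the OpenAI-compatible schema
}

#[derive(Debug)]
struct InferenceRequest {
    model: String,
    prompt: String,
    priority: u8,
    // No max_tokens / temperature fields: the contract cannot carry them.
}

fn to_inference_request(req: &OpenAiChatRequest) -> InferenceRequest {
    InferenceRequest {
        model: req.model.clone(),
        prompt: req.prompt.clone(),
        priority: 0,
        // req.max_tokens and req.temperature are silently discarded here;
        // the caller gets a 200 response and no warning.
    }
}
```

Because the internal contract has no slot for the generation params, the conversion compiles cleanly and the loss is invisible at the type level.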

Why this matters

  • Client intent is lost without warning.
  • Runtime behavior can diverge from OpenAI-compatible expectations.
  • Troubleshooting is harder because requests appear accepted but generation controls are ignored.
  • This weakens trust in API compatibility and can cause production regressions for callers that rely on token caps or specific temperature settings.

Expected behavior

When max_tokens or temperature are provided in OpenAI-compatible requests, they should be propagated into the internal inference request types so that downstream orchestrator/provider layers can consume them.

Actual behavior

Gateway paths currently build InferenceRequest from model/prompt/priority only; generation params are not preserved.

Scope

  • Extend internal request contract with optional generation params.
  • Propagate from API request -> internal inference request in all gateway request paths.
  • Add regression tests proving no silent parameter drop.
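The scope items above could be sketched roughly as follows. This is a hypothetical shape for the extended contract, not the project's actual types; GenerationParams and the field names are assumptions:

```rust
// Hypothetical sketch of the proposed fix; names are illustrative.

/// Optional generation controls, extending the internal request contract.
#[derive(Debug, Clone, Default, PartialEq)]
struct GenerationParams {
    max_tokens: Option<u32>,
    temperature: Option<f32>,
}

#[derive(Debug)]
struct InferenceRequest {
    model: String,
    prompt: String,
    priority: u8,
    params: GenerationParams, // new: carried end-to-end instead of dropped
}

/// Builds the internal request, preserving the client's generation params.
fn to_inference_request(
    model: &str,
    prompt: &str,
    max_tokens: Option<u32>,
    temperature: Option<f32>,
) -> InferenceRequest {
    InferenceRequest {
        model: model.to_string(),
        prompt: prompt.to_string(),
        priority: 0,
        params: GenerationParams { max_tokens, temperature },
    }
}
```

A regression test would then assert that whatever the client sent survives the conversion, e.g. that params built from Some(64) / Some(0.2) compare equal on the internal request, so any future gateway path that drops them fails the suite.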
