Problem
The OpenAI-compatible request schema includes max_tokens and temperature, and docs/comments indicate these are passed through to backend inference behavior.
However, the current gateway flow constructs the internal InferenceRequest without these fields, so both values are silently dropped.
Why this matters
- Client intent is lost without warning.
- Runtime behavior can diverge from OpenAI-compatible expectations.
- Troubleshooting is harder because requests appear accepted but generation controls are ignored.
- This weakens trust in API compatibility and can cause production regressions for callers that rely on token caps or temperature-controlled sampling.
Expected behavior
When max_tokens or temperature are provided in an OpenAI-compatible request, they should be propagated into the internal inference request types so that downstream orchestrator/provider layers can consume them.
Actual behavior
Gateway paths currently build InferenceRequest from model/prompt/priority only; generation params are not preserved.
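The gap and the intended fix can be illustrated with a minimal sketch. All type, field, and function names below are hypothetical stand-ins, not taken from the actual codebase:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InferenceRequest:
    # Fields the gateway already preserves today.
    model: str
    prompt: str
    priority: int = 0
    # Proposed optional generation params; None means "backend default".
    max_tokens: Optional[int] = None
    temperature: Optional[float] = None

def to_inference_request(api_req: dict) -> InferenceRequest:
    """Map an OpenAI-compatible request body to the internal type,
    carrying generation params through instead of dropping them."""
    return InferenceRequest(
        model=api_req["model"],
        prompt=api_req.get("prompt", ""),
        max_tokens=api_req.get("max_tokens"),    # previously dropped
        temperature=api_req.get("temperature"),  # previously dropped
    )
```

Note the use of .get() rather than truthiness checks, so that a legitimate temperature of 0.0 survives the mapping instead of being treated as "unset".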
Scope
- Extend internal request contract with optional generation params.
- Propagate from API request -> internal inference request in all gateway request paths.
- Add regression tests proving no silent parameter drop.
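One way the regression coverage from the last Scope item might look, sketched with the same hypothetical names (restated here so the example is self-contained; the real tests would target the actual gateway mapping code):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InferenceRequest:
    model: str
    prompt: str
    priority: int = 0
    max_tokens: Optional[int] = None
    temperature: Optional[float] = None

def to_inference_request(api_req: dict) -> InferenceRequest:
    # Hypothetical stand-in for the gateway's request-mapping path.
    return InferenceRequest(
        model=api_req["model"],
        prompt=api_req.get("prompt", ""),
        max_tokens=api_req.get("max_tokens"),
        temperature=api_req.get("temperature"),
    )

def test_generation_params_not_silently_dropped():
    body = {"model": "demo", "prompt": "hello", "max_tokens": 128, "temperature": 0.0}
    internal = to_inference_request(body)
    # The point of the fix: client intent must survive the mapping,
    # including falsy-but-valid values like temperature=0.0.
    assert internal.max_tokens == 128
    assert internal.temperature == 0.0

def test_absent_params_stay_unset():
    internal = to_inference_request({"model": "demo", "prompt": "hello"})
    assert internal.max_tokens is None
    assert internal.temperature is None
```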