Feature Request: top_logprobs pass-through on LLM Gateway (or explicit error instead of silent drop)

## TL;DR

The Bankr LLM Gateway silently strips `top_logprobs` from requests routed to
models whose upstreams natively support it (verified on `grok-4.20`). Clients
get a successful-looking response with no logprobs field and no warning,
blocking a ~2.5× Brier-score calibration improvement for probability-estimation
workloads.

**Asking for one of two things, in preference order:**

1. **Pass the parameter through** to xAI, OpenAI, DeepSeek, Qwen (the
   OpenAI-compatible upstreams that already expose logprobs natively) and
   return the token-level probabilities on the Anthropic-shaped response.
2. **Minimum viable fix:** return an explicit `400 invalid_request_error`
   when the parameter is unsupported, instead of silently dropping it.
   The Anthropic-family models already do this — please make it uniform.

Full spec, reproducers, upstream support matrix, backwards-compatibility
notes, and arxiv references here:

👉 **https://gist.github.com/dialethia/20261815225aa45dbb4bb0c25b397049**

## Reproducers

**Silent drop (bug) — `grok-4.20`:**
```bash
curl -sS https://llm.bankr.bot/v1/messages \
  -H "x-api-key: $BANKR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{"model":"grok-4.20","max_tokens":3,"messages":[{"role":"user","content":"YES or NO:"}],"top_logprobs":3}'
```
Returns HTTP 200 with a normal message response. No `logprobs` field. No warning header.

**Explicit rejection (correct behaviour) — `claude-sonnet-4.6`:**
```bash
curl -sS https://llm.bankr.bot/v1/messages \
  -H "x-api-key: $BANKR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4.6","max_tokens":5,"messages":[{"role":"user","content":"YES or NO:"}],"top_logprobs":5}'
```
Returns `400 invalid_request_error: "top_logprobs: Extra inputs are not permitted"`. Client can detect and fall back.

## Why it matters (briefly)

For probability-estimation workloads (prediction markets, classification,
hallucination detection, RLHF), extracting `log P("YES") / log P("NO")` from
token logprobs achieves **Brier 0.186** ([arxiv:2501.04880](https://arxiv.org/abs/2501.04880))
vs text-parsing at ~0.49. That's a ~2.5× calibration improvement at zero cost
— the upstreams return logprobs for free; the gateway is the bottleneck.

The full spec covers use cases, the upstream support matrix (xAI ✅,
OpenAI ✅, DeepSeek ✅, Anthropic ❌, Gemini ⚠️), and rollout/backwards-compat
options.

Happy to iterate on the design if there's appetite.

---

cc: @0xdeployer @igoryuzo


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature Request: top_logprobs pass-through on LLM Gateway (or explicit error instead of silent drop) #11

TL;DR

Reproducers

Why it matters (briefly)

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Feature Request: top_logprobs pass-through on LLM Gateway (or explicit error instead of silent drop) #11

Description

TL;DR

Reproducers

Why it matters (briefly)

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions