TL;DR
The Bankr LLM Gateway silently strips top_logprobs from requests routed to
models whose upstreams natively support it (verified on grok-4.20). Clients
get a successful-looking response with no logprobs field and no warning,
blocking a ~2.5× Brier-score calibration improvement for probability-estimation
workloads.
Asking for one of two things, in preference order:
- Pass the parameter through to xAI, OpenAI, DeepSeek, Qwen (the
OpenAI-compatible upstreams that already expose logprobs natively) and
return the token-level probabilities on the Anthropic-shaped response.
- Minimum viable fix: return an explicit 400 invalid_request_error when the
parameter is unsupported, instead of silently dropping it.
The Anthropic-family models already do this — please make it uniform.
Full spec, reproducers, upstream support matrix, backwards-compatibility
notes, and arxiv references here:
👉 https://gist.github.com/dialethia/20261815225aa45dbb4bb0c25b397049
Reproducers
Silent drop (bug) — grok-4.20:
curl -sS https://llm.bankr.bot/v1/messages \
-H "x-api-key: $BANKR_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "Content-Type: application/json" \
-d '{"model":"grok-4.20","max_tokens":3,"messages":[{"role":"user","content":"YES or NO:"}],"top_logprobs":3}'
Returns HTTP 200 with a normal message response. No logprobs field. No warning header.
Explicit rejection (correct behaviour) — claude-sonnet-4.6:
curl -sS https://llm.bankr.bot/v1/messages \
-H "x-api-key: $BANKR_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "Content-Type: application/json" \
-d '{"model":"claude-sonnet-4.6","max_tokens":5,"messages":[{"role":"user","content":"YES or NO:"}],"top_logprobs":5}'
Returns 400 invalid_request_error: "top_logprobs: Extra inputs are not permitted". Client can detect and fall back.
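Until the behaviour is uniform, a client has to tell the two cases above apart itself. The following is a minimal Python sketch of that check, not gateway code. The function name and return labels are hypothetical. It assumes the 400 body follows the Anthropic-style error envelope (`{"type": "error", "error": {"type": ..., "message": ...}}`) and that logprobs, if ever returned, would appear under a top-level `logprobs` key; neither is confirmed by the gateway.

```python
import json


def classify_logprobs_handling(status: int, body: str) -> str:
    """Classify how a response treated a request that included top_logprobs.

    Returns one of: "supported", "silently_dropped", "rejected", "unknown".
    The response shapes checked here are assumptions, not a documented contract.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "unknown"
    if not isinstance(payload, dict):
        return "unknown"

    if status == 400:
        # Assumed Anthropic-style error envelope.
        error = payload.get("error")
        if isinstance(error, dict) and error.get("type") == "invalid_request_error":
            if "top_logprobs" in str(error.get("message", "")):
                return "rejected"
        return "unknown"

    if status == 200:
        # Assumed location of the field; a 200 without it is the silent drop.
        if payload.get("logprobs") is not None:
            return "supported"
        return "silently_dropped"

    return "unknown"
```

With this, a caller can fall back to text parsing on "rejected" and, more importantly, log or alert on "silently_dropped" rather than treating the 200 as a full success.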
Why it matters (briefly)
For probability-estimation workloads (prediction markets, classification,
hallucination detection, RLHF), reading log P("YES") and log P("NO") directly
from token logprobs achieves a Brier score of 0.186 (arxiv:2501.04880),
versus ~0.49 for text parsing (lower is better). That's a ~2.5× calibration
improvement at zero cost: the upstreams return logprobs for free, and the
gateway is the bottleneck.
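To make the client-side computation concrete, here is a small Python sketch of how a YES/NO probability could be derived from first-token logprobs and then scored. It is an illustration under stated assumptions, not necessarily the procedure used in the cited paper: the input is assumed to be a plain mapping from candidate token to log-probability (the shape of an actual gateway response is unknown), and `yes_probability` and `brier_score` are hypothetical helper names.

```python
import math


def yes_probability(top_logprobs: dict[str, float]) -> float:
    """Renormalize P(YES) over the YES and NO tokens only.

    `top_logprobs` maps candidate first tokens to log-probabilities.
    Case and surrounding whitespace variants (" yes", "Yes") are pooled.
    """
    def mass(label: str) -> float:
        return sum(
            math.exp(lp)
            for token, lp in top_logprobs.items()
            if token.strip().upper() == label
        )

    p_yes, p_no = mass("YES"), mass("NO")
    total = p_yes + p_no
    if total == 0.0:
        raise ValueError("neither YES nor NO appears among the returned tokens")
    return p_yes / total


def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)
```

For example, if the top tokens carry probabilities 0.6 (YES), 0.2 (NO), and 0.2 (other), the renormalized estimate is 0.6 / 0.8 = 0.75. Text parsing of the same response would only yield a hard "YES", which is why the logprobs path can be better calibrated.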
The full spec covers use cases, the upstream support matrix (xAI ✅,
OpenAI ✅, DeepSeek ✅, Anthropic ❌, Gemini ⚠️), and rollout/backwards-compat
options.
Happy to iterate on the design if there's appetite.
cc: @0xdeployer @igoryuzo