feat: preserve logprobs from chat completions API in ModelResponse #2134
Open: JRMeyer wants to merge 1 commit into openai:main from JRMeyer:feat/preserve-logprobs-chat-completions
+12 −0
Conversation
The SDK already accepts `top_logprobs` in ModelSettings and passes it to the API, but the logprobs returned in the response were discarded during conversion. This change:

1. Adds an optional `logprobs` field to the ModelResponse dataclass.
2. Extracts logprobs from `choice.logprobs.content` in the chat completions model and includes them in the ModelResponse.

This enables use cases such as RLHF training, confidence scoring, and uncertainty estimation, which require access to token-level log probabilities.
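For illustration, a rough sketch of how the field added here could be consumed (the model name is a placeholder, and passing `logprobs` via `extra_args` mirrors the verification script later in this thread; per this PR, each entry is a `model_dump()` dict of a chat completion token logprob):

import asyncio

from openai import AsyncOpenAI

from agents import ModelSettings, ModelTracing, OpenAIChatCompletionsModel


async def main() -> None:
    model = OpenAIChatCompletionsModel(model="gpt-4o", openai_client=AsyncOpenAI())
    resp = await model.get_response(
        system_instructions=None,
        input="Say hello.",
        model_settings=ModelSettings(top_logprobs=2, extra_args={"logprobs": True}),
        tools=[],
        output_schema=None,
        handoffs=[],
        tracing=ModelTracing.DISABLED,
        previous_response_id=None,
        conversation_id=None,
        prompt=None,
    )
    # With this change, resp.logprobs holds one dict per generated token
    # (token string, logprob, optional top alternatives); it stays None when
    # the provider returns no logprobs.
    for entry in resp.logprobs or []:
        print(entry["token"], entry["logprob"])


if __name__ == "__main__":
    asyncio.run(main())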
seratch (Member) requested changes on Dec 3, 2025:
Thanks for sending this patch, and indeed it works. We'd like to make the behavior consistent with the Responses API, so could you make further changes along these lines?
diff --git a/src/agents/items.py b/src/agents/items.py
index 76be512..991a7f8 100644
--- a/src/agents/items.py
+++ b/src/agents/items.py
@@ -356,13 +356,6 @@ class ModelResponse:
     be passed to `Runner.run`.
     """
-    logprobs: list[Any] | None = None
-    """Token log probabilities from the model response.
-    Only populated when using the chat completions API with `top_logprobs` set in ModelSettings.
-    Each element corresponds to a token and contains the token string, log probability, and
-    optionally the top alternative tokens with their log probabilities.
-    """
-
     def to_input_items(self) -> list[TResponseInputItem]:
         """Convert the output into a list of input items suitable for passing to the model."""
         # We happen to know that the shape of the Pydantic output items are the same as the
diff --git a/src/agents/models/chatcmpl_helpers.py b/src/agents/models/chatcmpl_helpers.py
index 335e3f5..58e3b15 100644
--- a/src/agents/models/chatcmpl_helpers.py
+++ b/src/agents/models/chatcmpl_helpers.py
@@ -3,6 +3,12 @@ from __future__ import annotations
 from contextvars import ContextVar
 from openai import AsyncOpenAI
+from openai.types.chat.chat_completion_token_logprob import ChatCompletionTokenLogprob
+from openai.types.responses.response_output_text import Logprob, LogprobTopLogprob
+from openai.types.responses.response_text_delta_event import (
+    Logprob as DeltaLogprob,
+    LogprobTopLogprob as DeltaTopLogprob,
+)
 from ..model_settings import ModelSettings
 from ..version import __version__
@@ -41,3 +47,60 @@ class ChatCmplHelpers:
         )
         stream_options = {"include_usage": include_usage} if include_usage is not None else None
         return stream_options
+
+    @classmethod
+    def convert_logprobs_for_output_text(
+        cls, logprobs: list[ChatCompletionTokenLogprob] | None
+    ) -> list[Logprob] | None:
+        """
+        Convert chat completion token logprobs into the Responses `Logprob` shape.
+        """
+        if not logprobs:
+            return None
+
+        converted: list[Logprob] = []
+        for token_logprob in logprobs:
+            converted.append(
+                Logprob(
+                    token=token_logprob.token,
+                    logprob=token_logprob.logprob,
+                    bytes=token_logprob.bytes or [],
+                    top_logprobs=[
+                        LogprobTopLogprob(
+                            token=top_logprob.token,
+                            logprob=top_logprob.logprob,
+                            bytes=top_logprob.bytes or [],
+                        )
+                        for top_logprob in token_logprob.top_logprobs
+                    ],
+                )
+            )
+        return converted
+
+    @classmethod
+    def convert_logprobs_for_text_delta(
+        cls, logprobs: list[ChatCompletionTokenLogprob] | None
+    ) -> list[DeltaLogprob] | None:
+        """
+        Convert chat completion token logprobs into the `ResponseTextDeltaEvent` logprob shape.
+        """
+        if not logprobs:
+            return None
+
+        converted: list[DeltaLogprob] = []
+        for token_logprob in logprobs:
+            converted.append(
+                DeltaLogprob(
+                    token=token_logprob.token,
+                    logprob=token_logprob.logprob,
+                    top_logprobs=[
+                        DeltaTopLogprob(
+                            token=top_logprob.token,
+                            logprob=top_logprob.logprob,
+                        )
+                        for top_logprob in token_logprob.top_logprobs
+                    ]
+                    or None,
+                )
+            )
+        return converted
diff --git a/src/agents/models/chatcmpl_stream_handler.py b/src/agents/models/chatcmpl_stream_handler.py
index f1c5049..e2cd060 100644
--- a/src/agents/models/chatcmpl_stream_handler.py
+++ b/src/agents/models/chatcmpl_stream_handler.py
@@ -42,6 +42,7 @@ from openai.types.responses.response_reasoning_text_done_event import (
 from openai.types.responses.response_usage import InputTokensDetails, OutputTokensDetails
 from ..items import TResponseStreamEvent
+from .chatcmpl_helpers import ChatCmplHelpers
 from .fake_id import FAKE_RESPONSES_ID
@@ -105,6 +106,7 @@ class ChatCmplStreamHandler:
                 continue
             delta = chunk.choices[0].delta
+            choice_logprobs = chunk.choices[0].logprobs
             # Handle thinking blocks from Anthropic (for preserving signatures)
             if hasattr(delta, "thinking_blocks") and delta.thinking_blocks:
@@ -266,6 +268,12 @@
                         type="response.content_part.added",
                         sequence_number=sequence_number.get_and_increment(),
                     )
+                delta_logprobs = ChatCmplHelpers.convert_logprobs_for_text_delta(
+                    choice_logprobs.content if choice_logprobs else None
+                ) or []
+                output_logprobs = ChatCmplHelpers.convert_logprobs_for_output_text(
+                    choice_logprobs.content if choice_logprobs else None
+                )
                 # Emit the delta for this segment of content
                 yield ResponseTextDeltaEvent(
                     content_index=state.text_content_index_and_output[0],
@@ -275,10 +283,15 @@
                     is not None,  # fixed 0 -> 0 or 1
                     type="response.output_text.delta",
                     sequence_number=sequence_number.get_and_increment(),
-                    logprobs=[],
+                    logprobs=delta_logprobs,
                 )
                 # Accumulate the text into the response part
                 state.text_content_index_and_output[1].text += delta.content
+                if output_logprobs:
+                    existing_logprobs = state.text_content_index_and_output[1].logprobs or []
+                    state.text_content_index_and_output[1].logprobs = (
+                        existing_logprobs + output_logprobs
+                    )
             # Handle refusals (model declines to answer)
             # This is always set by the OpenAI API, but not by others e.g. LiteLLM
diff --git a/src/agents/models/openai_chatcompletions.py b/src/agents/models/openai_chatcompletions.py
index ac696db..1190e76 100644
--- a/src/agents/models/openai_chatcompletions.py
+++ b/src/agents/models/openai_chatcompletions.py
@@ -9,7 +9,13 @@ from openai import AsyncOpenAI, AsyncStream, Omit, omit
 from openai.types import ChatModel
 from openai.types.chat import ChatCompletion, ChatCompletionChunk, ChatCompletionMessage
 from openai.types.chat.chat_completion import Choice
-from openai.types.responses import Response
+from openai.types.responses import (
+    Response,
+    ResponseOutputItem,
+    ResponseOutputMessage,
+    ResponseOutputText,
+)
+from openai.types.responses.response_output_text import Logprob
 from openai.types.responses.response_prompt_param import ResponsePromptParam
 from openai.types.responses.response_usage import InputTokensDetails, OutputTokensDetails
@@ -129,17 +135,34 @@ class OpenAIChatCompletionsModel(Model):
         items = Converter.message_to_output_items(message) if message is not None else []
-        logprobs_data = None
+        logprob_models = None
         if first_choice and first_choice.logprobs and first_choice.logprobs.content:
-            logprobs_data = [lp.model_dump() for lp in first_choice.logprobs.content]
+            logprob_models = ChatCmplHelpers.convert_logprobs_for_output_text(
+                first_choice.logprobs.content
+            )
+
+        if logprob_models:
+            self._attach_logprobs_to_output(items, logprob_models)
         return ModelResponse(
             output=items,
             usage=usage,
             response_id=None,
-            logprobs=logprobs_data,
         )
+    def _attach_logprobs_to_output(
+        self, output_items: list[ResponseOutputItem], logprobs: list[Logprob]
+    ) -> None:
+        """Attach logprobs to the first assistant text content part."""
+        for output_item in output_items:
+            if not isinstance(output_item, ResponseOutputMessage):
+                continue
+
+            for content in output_item.content:
+                if isinstance(content, ResponseOutputText):
+                    content.logprobs = logprobs
+                    return
+
     async def stream_response(
         self,
         system_instructions: str | None,
diff --git a/tests/test_openai_chatcompletions.py b/tests/test_openai_chatcompletions.py
index 3a0f753..78b2cd3 100644
--- a/tests/test_openai_chatcompletions.py
+++ b/tests/test_openai_chatcompletions.py
@@ -6,13 +6,17 @@ from typing import Any
 import httpx
 import pytest
 from openai import AsyncOpenAI, omit
-from openai.types.chat.chat_completion import ChatCompletion, Choice
+from openai.types.chat.chat_completion import ChatCompletion, Choice, ChoiceLogprobs
 from openai.types.chat.chat_completion_chunk import ChatCompletionChunk
 from openai.types.chat.chat_completion_message import ChatCompletionMessage
 from openai.types.chat.chat_completion_message_tool_call import (  # type: ignore[attr-defined]
     ChatCompletionMessageFunctionToolCall,
     Function,
 )
+from openai.types.chat.chat_completion_token_logprob import (
+    ChatCompletionTokenLogprob,
+    TopLogprob,
+)
 from openai.types.completion_usage import (
     CompletionUsage,
     PromptTokensDetails,
@@ -98,6 +102,68 @@ async def test_get_response_with_text_message(monkeypatch) -> None:
     assert resp.response_id is None
+@pytest.mark.allow_call_model_methods
+@pytest.mark.asyncio
+async def test_get_response_attaches_logprobs(monkeypatch) -> None:
+    """
+    Chat completions logprobs should be copied onto the output text part.
+    """
+    msg = ChatCompletionMessage(role="assistant", content="Hi!")
+    choice = Choice(
+        index=0,
+        finish_reason="stop",
+        message=msg,
+        logprobs=ChoiceLogprobs(
+            content=[
+                ChatCompletionTokenLogprob(
+                    token="Hi",
+                    logprob=-0.5,
+                    bytes=[1],
+                    top_logprobs=[TopLogprob(token="Hi", logprob=-0.5, bytes=[1])],
+                ),
+                ChatCompletionTokenLogprob(
+                    token="!",
+                    logprob=-0.1,
+                    bytes=[2],
+                    top_logprobs=[TopLogprob(token="!", logprob=-0.1, bytes=[2])],
+                ),
+            ]
+        ),
+    )
+    chat = ChatCompletion(
+        id="resp-id",
+        created=0,
+        model="fake",
+        object="chat.completion",
+        choices=[choice],
+        usage=None,
+    )
+
+    async def patched_fetch_response(self, *args, **kwargs):
+        return chat
+
+    monkeypatch.setattr(OpenAIChatCompletionsModel, "_fetch_response", patched_fetch_response)
+    model = OpenAIProvider(use_responses=False).get_model("gpt-4")
+    resp: ModelResponse = await model.get_response(
+        system_instructions=None,
+        input="",
+        model_settings=ModelSettings(),
+        tools=[],
+        output_schema=None,
+        handoffs=[],
+        tracing=ModelTracing.DISABLED,
+        previous_response_id=None,
+        conversation_id=None,
+        prompt=None,
+    )
+    assert len(resp.output) == 1
+    assert isinstance(resp.output[0], ResponseOutputMessage)
+    text_part = resp.output[0].content[0]
+    assert isinstance(text_part, ResponseOutputText)
+    assert text_part.logprobs is not None
+    assert [lp.token for lp in text_part.logprobs] == ["Hi", "!"]
+
+
 @pytest.mark.allow_call_model_methods
 @pytest.mark.asyncio
 async def test_get_response_with_refusal(monkeypatch) -> None:
diff --git a/tests/test_openai_chatcompletions_stream.py b/tests/test_openai_chatcompletions_stream.py
index 947816f..083448c 100644
--- a/tests/test_openai_chatcompletions_stream.py
+++ b/tests/test_openai_chatcompletions_stream.py
@@ -7,6 +7,11 @@ from openai.types.chat.chat_completion_chunk import (
     ChoiceDelta,
     ChoiceDeltaToolCall,
     ChoiceDeltaToolCallFunction,
+    ChoiceLogprobs,
+)
+from openai.types.chat.chat_completion_token_logprob import (
+    ChatCompletionTokenLogprob,
+    TopLogprob,
 )
 from openai.types.completion_usage import (
     CompletionTokensDetails,
@@ -15,6 +20,7 @@ from openai.types.completion_usage import (
 )
 from openai.types.responses import (
     Response,
+    ResponseCompletedEvent,
     ResponseFunctionToolCall,
     ResponseOutputMessage,
     ResponseOutputRefusal,
@@ -128,6 +134,119 @@ async def test_stream_response_yields_events_for_text_content(monkeypatch) -> No
     assert completed_resp.usage.output_tokens_details.reasoning_tokens == 3
+@pytest.mark.allow_call_model_methods
+@pytest.mark.asyncio
+async def test_stream_response_includes_logprobs(monkeypatch) -> None:
+    """
+    Streaming chat completions logprobs should be forwarded into text delta events and the
+    accumulated output text part.
+    """
+    chunk1 = ChatCompletionChunk(
+        id="chunk-id",
+        created=1,
+        model="fake",
+        object="chat.completion.chunk",
+        choices=[
+            Choice(
+                index=0,
+                delta=ChoiceDelta(content="Hi"),
+                logprobs=ChoiceLogprobs(
+                    content=[
+                        ChatCompletionTokenLogprob(
+                            token="Hi",
+                            logprob=-0.5,
+                            bytes=[1],
+                            top_logprobs=[TopLogprob(token="Hi", logprob=-0.5, bytes=[1])],
+                        )
+                    ]
+                ),
+            )
+        ],
+    )
+    chunk2 = ChatCompletionChunk(
+        id="chunk-id",
+        created=1,
+        model="fake",
+        object="chat.completion.chunk",
+        choices=[
+            Choice(
+                index=0,
+                delta=ChoiceDelta(content=" there"),
+                logprobs=ChoiceLogprobs(
+                    content=[
+                        ChatCompletionTokenLogprob(
+                            token=" there",
+                            logprob=-0.25,
+                            bytes=[2],
+                            top_logprobs=[TopLogprob(token=" there", logprob=-0.25, bytes=[2])],
+                        )
+                    ]
+                ),
+            )
+        ],
+        usage=CompletionUsage(
+            completion_tokens=5,
+            prompt_tokens=7,
+            total_tokens=12,
+            prompt_tokens_details=PromptTokensDetails(cached_tokens=2),
+            completion_tokens_details=CompletionTokensDetails(reasoning_tokens=3),
+        ),
+    )
+
+    async def fake_stream() -> AsyncIterator[ChatCompletionChunk]:
+        for c in (chunk1, chunk2):
+            yield c
+
+    async def patched_fetch_response(self, *args, **kwargs):
+        resp = Response(
+            id="resp-id",
+            created_at=0,
+            model="fake-model",
+            object="response",
+            output=[],
+            tool_choice="none",
+            tools=[],
+            parallel_tool_calls=False,
+        )
+        return resp, fake_stream()
+
+    monkeypatch.setattr(OpenAIChatCompletionsModel, "_fetch_response", patched_fetch_response)
+    model = OpenAIProvider(use_responses=False).get_model("gpt-4")
+    output_events = []
+    async for event in model.stream_response(
+        system_instructions=None,
+        input="",
+        model_settings=ModelSettings(),
+        tools=[],
+        output_schema=None,
+        handoffs=[],
+        tracing=ModelTracing.DISABLED,
+        previous_response_id=None,
+        conversation_id=None,
+        prompt=None,
+    ):
+        output_events.append(event)
+
+    text_delta_events = [
+        event for event in output_events if event.type == "response.output_text.delta"
+    ]
+    assert len(text_delta_events) == 2
+    assert [lp.token for lp in text_delta_events[0].logprobs] == ["Hi"]
+    assert [lp.token for lp in text_delta_events[1].logprobs] == [" there"]
+
+    completed_event = next(
+        event for event in output_events if event.type == "response.completed"
+    )
+    assert isinstance(completed_event, ResponseCompletedEvent)
+    completed_resp = completed_event.response
+    assert isinstance(completed_resp.output[0], ResponseOutputMessage)
+    text_part = completed_resp.output[0].content[0]
+    assert isinstance(text_part, ResponseOutputText)
+    assert text_part.text == "Hi there"
+    assert text_part.logprobs is not None
+    assert [lp.token for lp in text_part.logprobs] == ["Hi", " there"]
+
+
 @pytest.mark.allow_call_model_methods
 @pytest.mark.asyncio
 async def test_stream_response_yields_events_for_refusal_content(monkeypatch) -> None:

You can verify the behavior using this script:
import asyncio
import json

from openai import AsyncOpenAI
from openai.types.responses import ResponseOutputMessage, ResponseOutputText
from openai.types.responses.response_output_text import Logprob

from agents import ItemHelpers, ModelSettings, ModelTracing, OpenAIChatCompletionsModel


async def main():
    client = AsyncOpenAI()
    model = OpenAIChatCompletionsModel(
        model="gpt-5.1",
        openai_client=client,
    )
    response = await model.get_response(
        system_instructions="You are a concise assistant.",
        input="List two prime numbers under ten.",
        model_settings=ModelSettings(
            top_logprobs=3,
            extra_args={"logprobs": True},
        ),
        tools=[],
        handoffs=[],
        output_schema=None,
        tracing=ModelTracing.DISABLED,
        previous_response_id=None,
        conversation_id=None,
        prompt=None,
    )

    message_text = ItemHelpers.extract_last_text(response.output[0]) if response.output else ""
    print(f"Model text: {message_text}")

    text_part_logprobs: list[Logprob] | None = None
    for output_item in response.output:
        if isinstance(output_item, ResponseOutputMessage):
            for content in output_item.content:
                if isinstance(content, ResponseOutputText):
                    text_part_logprobs = content.logprobs
                    break
        if text_part_logprobs:
            break

    if text_part_logprobs:
        print("Token logprobs (token: logprob [top alternatives]):")
        for entry in text_part_logprobs:
            token = entry.token or ""
            logprob = entry.logprob
            top_alts = entry.top_logprobs or []
            alt_preview_parts: list[str] = []
            for alt in top_alts[:3]:
                alt_token = alt.token or ""
                alt_logprob = alt.logprob
                if isinstance(alt_logprob, (int, float)):
                    alt_preview_parts.append(f"{alt_token} ({alt_logprob:.2f})")
            alt_preview_str = ", ".join(alt_preview_parts)
            logprob_display = f"{logprob:.2f}" if isinstance(logprob, (int, float)) else "unknown"
            suffix = f" [{alt_preview_str}]" if alt_preview_str else ""
            print(f" {token!r}: {logprob_display}{suffix}")
    else:
        print(
            "No logprobs were returned. Check that the model supports logprobs and that"
            " top_logprobs was set."
        )

    if text_part_logprobs:
        print("\nRaw logprobs payload:")
        serializable = [entry.model_dump() for entry in text_part_logprobs]
        print(json.dumps(serializable[:5], indent=2))


if __name__ == "__main__":
    asyncio.run(main())
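For the streaming path, a similar check can be sketched against the delta events. This is only a rough sketch assuming the handler changes above; it reuses the same placeholder model name and the `extra_args` toggle from the script:

import asyncio

from openai import AsyncOpenAI

from agents import ModelSettings, ModelTracing, OpenAIChatCompletionsModel


async def main() -> None:
    model = OpenAIChatCompletionsModel(model="gpt-5.1", openai_client=AsyncOpenAI())
    async for event in model.stream_response(
        system_instructions=None,
        input="Name one prime number under ten.",
        model_settings=ModelSettings(top_logprobs=3, extra_args={"logprobs": True}),
        tools=[],
        output_schema=None,
        handoffs=[],
        tracing=ModelTracing.DISABLED,
        previous_response_id=None,
        conversation_id=None,
        prompt=None,
    ):
        # With the suggested handler changes, each text delta event carries the
        # logprobs for the tokens in that delta, converted to the Responses-style shape.
        if event.type == "response.output_text.delta":
            for lp in event.logprobs or []:
                print(f"{lp.token!r}: {lp.logprob:.3f}")


if __name__ == "__main__":
    asyncio.run(main())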