-
Notifications
You must be signed in to change notification settings - Fork 4.1k
Description
Confirm this is an issue with the Python library and not an underlying OpenAI API
- This is an issue with the Python library
Describe the bug
Summary
When using the responses.parse API with the web_search_preview tool, the
response frequently contains control characters and gets truncated, causing
JSON parsing failures. This occurs specifically when generating search queries
that may include non-ASCII content.
Environment
- OpenAI Python SDK Version: 1.95.0
- Python Version: 3.12+
- API Model: gpt-4.1
- Endpoint: client.responses.parse
Impact
- JSON parsing fails with ValidationError
- The web_search_preview tool becomes unusable for queries involving
non-English content - Responses are truncated around 3.5-4.5KB, suggesting a buffer overflow issue
Workaround
We currently retry without the web_search_preview tool when these errors occur,
which succeeds but loses the web search functionality.
Suggested Fix
- Ensure web search results are properly sanitized to remove control
characters - Fix the response buffer size to prevent truncation
- Properly encode non-ASCII characters as UTF-8 instead of malformed escape
sequences
To Reproduce
Steps to Reproduce
from openai import AsyncOpenAI
from pydantic import BaseModel, Field
from typing import List, Optional
class Subqueries(BaseModel):
subqueries: List[str] = Field(description="List of search queries")
hl: Optional[str] = Field(default=None, description="Language code")
gl: Optional[str] = Field(default=None, description="Country code")
async_client = AsyncOpenAI()
response = await async_client.responses.parse(
model="gpt-4.1",
input=[
{
"role": "system",
"content": "Generate search queries for finding content creators"
},
{
"role": "user",
"content": "Find Russian language content creators"
}
],
text_format=Subqueries,
tools=[
{
"type": "web_search_preview",
"user_location": {"type": "approximate"},
"search_context_size": "medium"
}
],
temperature=1
)
Expected Behavior
The API should return valid JSON with properly encoded text, including
non-ASCII characters as valid UTF-8.
Actual Behavior
- Control Character Corruption: Responses contain invalid control characters:
{"subqueries": ["\u0004\u0043\u0043... - Where \u0004 is ASCII control character 4, not valid text.
- Truncation Mid-Escape: Responses get truncated in the middle of escape
sequences:
Invalid JSON: EOF while parsing a string at line 1 column 4587
input_value='{"subqueries": ["\u0017...0043a\u00043e\u00043 ' - Note the incomplete escape sequence at the end.
- Malformed Unicode: Instead of proper UTF-8 encoding for Cyrillic:
- Expected: "Фильмы" (proper UTF-8)
- Actual: "\x04\x024\x038..." (control char + ASCII digits)
Code snippets
OS
macOS
Python version
3.12
Library version
1.95.0