OpenAI API responses.parse with web_search_preview tool returns corrupted JSON with control characters #2458

@leplik

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

Summary

When using the responses.parse API with the web_search_preview tool, the
response frequently contains control characters and gets truncated, causing
JSON parsing failures. This occurs specifically when generating search queries
that may include non-ASCII content.

Environment

  • OpenAI Python SDK Version: 1.95.0
  • Python Version: 3.12+
  • API Model: gpt-4.1
  • Endpoint: client.responses.parse

Impact

  • JSON parsing fails with ValidationError
  • The web_search_preview tool becomes unusable for queries involving
    non-English content
  • Responses are truncated at around 3.5-4.5 KB, which suggests a buffering or
    length-limit issue

Workaround

We currently retry without the web_search_preview tool when these errors occur,
which succeeds but loses the web search functionality.
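
The workaround above can be sketched roughly as follows. This is a minimal illustration, not our production code: `parse_with_fallback`, `call`, and `retry_on` are hypothetical names, and `call` is assumed to be a small async wrapper around `async_client.responses.parse` that accepts the tools list.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence


async def parse_with_fallback(
    call: Callable[[Sequence[dict]], Awaitable[Any]],
    tools: Sequence[dict],
    retry_on: tuple[type[BaseException], ...] = (ValueError,),
) -> tuple[Any, bool]:
    """Try `call(tools)`; on a parsing error, retry once with no tools.

    Returns (result, used_tools) so the caller knows whether web search
    was actually available for this response.
    """
    try:
        return await call(tools), True
    except retry_on:
        # Second attempt without web_search_preview: this succeeds in our
        # experience, but the web search functionality is lost.
        return await call([]), False
```

In real use, `retry_on` would be set to whatever exception the failing call raises in your code, for example `(pydantic.ValidationError,)`. An error raised by the second attempt is deliberately not caught.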

Suggested Fix

  1. Ensure web search results are properly sanitized to remove control
    characters
  2. Fix the response buffer size to prevent truncation
  3. Properly encode non-ASCII characters as UTF-8 instead of malformed escape
    sequences
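
To make suggestion 1 concrete, here is a minimal sketch of what stripping control characters could look like. Where in the pipeline such a step would belong is unknown to me, and `strip_control_chars` is a hypothetical helper, not an existing SDK function.

```python
import re

# C0 control characters (plus DEL), except tab, newline and carriage return.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def strip_control_chars(text: str) -> str:
    """Remove control characters that should never appear in a search query."""
    return _CONTROL_CHARS.sub("", text)
```

Note that this only removes the invalid characters. It does not recover the original non-ASCII text, so it would not address suggestion 3 on its own.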

To Reproduce

Steps to Reproduce

import asyncio
from typing import List, Optional

from openai import AsyncOpenAI
from pydantic import BaseModel, Field


class Subqueries(BaseModel):
    subqueries: List[str] = Field(description="List of search queries")
    hl: Optional[str] = Field(default=None, description="Language code")
    gl: Optional[str] = Field(default=None, description="Country code")


async_client = AsyncOpenAI()


async def main() -> None:
    # The failure surfaces inside this call when the output is corrupted.
    response = await async_client.responses.parse(
        model="gpt-4.1",
        input=[
            {
                "role": "system",
                "content": "Generate search queries for finding content creators",
            },
            {
                "role": "user",
                "content": "Find Russian language content creators",
            },
        ],
        text_format=Subqueries,
        tools=[
            {
                "type": "web_search_preview",
                "user_location": {"type": "approximate"},
                "search_context_size": "medium",
            }
        ],
        temperature=1,
    )


# Top-level `await` only works in a notebook or async REPL, so run via asyncio.
asyncio.run(main())

Expected Behavior

The API should return valid JSON with properly encoded text, including
non-ASCII characters as valid UTF-8.

Actual Behavior

  1. Control Character Corruption: Responses contain invalid control characters:
    {"subqueries": ["\u0004\u0043\u0043...
    Here \u0004 is ASCII control character 4, not valid text.
  2. Truncation Mid-Escape: Responses get truncated in the middle of escape
    sequences:
    Invalid JSON: EOF while parsing a string at line 1 column 4587
    input_value='{"subqueries": ["\u0017...0043a\u00043e\u00043 '
    Note the incomplete escape sequence at the end.
  3. Malformed Unicode: Instead of proper UTF-8 encoding for Cyrillic:
    - Expected: "Фильмы" (proper UTF-8)
    - Actual: "\x04\x024\x038..." (control char + ASCII digits)
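
For anyone trying to confirm the same symptoms, the following sketch separates the two failure modes listed above (truncated JSON versus JSON that parses but contains control characters). `diagnose` is a hypothetical helper that uses only the standard library, and the sample strings in the usage lines are shortened stand-ins modelled on the outputs above, not captured responses.

```python
import json


def _strings(value):
    """Yield every string value nested inside parsed JSON data."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from _strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from _strings(item)


def diagnose(raw: str) -> dict:
    """Report whether `raw` is valid JSON and which control chars it decodes to."""
    report = {"valid_json": True, "error": None, "control_chars": []}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Truncated output (e.g. cut off mid-escape) ends up here.
        report["valid_json"] = False
        report["error"] = f"{exc.msg} at column {exc.colno}"
        return report
    # Escapes such as \u0004 are legal JSON, so they only show up after decoding.
    # Anything below U+0020 (including tab/newline) is unexpected in a query.
    found = {f"U+{ord(ch):04X}" for s in _strings(data) for ch in s if ord(ch) < 0x20}
    report["control_chars"] = sorted(found)
    return report


corrupted = diagnose('{"subqueries": ["\\u0004\\u0043\\u0043"]}')
truncated = diagnose('{"subqueries": ["\\u0004\\u0043a\\u00043')
```

With these inputs, `corrupted["control_chars"]` contains `U+0004`, while `truncated["valid_json"]` is false because the string is never terminated.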

OS

macOS

Python version

3.12

Library version

1.95.0


Labels

bug: Something isn't working
