
feat: use structured JSON output API with fallback#250

Open
Anionex wants to merge 3 commits into main from feat/structured-json-output

Conversation

@Anionex
Owner

@Anionex Anionex commented Feb 19, 2026

Summary

  • Replace manual text+parse+retry JSON generation with Google's native structured JSON output API (Pydantic model + response_schema)
  • Models that don't support structured output automatically fall back to the existing retry-based approach
  • All 7 generate_json callers and 2 generate_json_with_image callers updated
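The structured-first-with-fallback flow described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `Outline` schema and the two `_structured_call` / `_fallback_call` helpers are hypothetical stand-ins for the real definitions in `ai_schemas.py` and `ai_service.py`:

```python
import json
from typing import Callable, Optional, Type

from pydantic import BaseModel


class Outline(BaseModel):
    """Illustrative response schema, not the real one in ai_schemas.py."""
    title: str
    points: list[str]


def _structured_call(prompt: str, schema: Type[BaseModel]) -> BaseModel:
    # Stand-in for the provider's native structured-output call; the real
    # code would pass `schema` as response_schema to the GenAI SDK.
    return schema(title="Demo", points=["a", "b"])


def _fallback_call(prompt: str) -> dict:
    # Stand-in for the legacy text + parse + retry path.
    return json.loads('{"title": "Demo", "points": ["a", "b"]}')


def generate_json(prompt: str,
                  response_schema: Optional[Type[BaseModel]] = None,
                  extract_fn: Optional[Callable] = None,
                  supports_structured: bool = True):
    """Structured-first with automatic fallback."""
    if response_schema is not None and supports_structured:
        try:
            obj = _structured_call(prompt, response_schema)
            return extract_fn(obj) if extract_fn else obj.model_dump()
        except Exception:
            pass  # structured output failed; fall back to the retry-based path
    return _fallback_call(prompt)
```

A caller that needs a custom shape passes `extract_fn` to convert the validated Pydantic object into the dict/list format the rest of the pipeline expects.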

File Changes

  • backend/services/ai_schemas.py (new) — Pydantic models for all JSON response schemas
  • backend/services/ai_providers/text/base.py — Added generate_json / generate_json_with_image default methods to TextProvider
  • backend/services/ai_providers/text/genai_provider.py — Structured output implementation for GenAI provider
  • backend/services/ai_service.py — Refactored generate_json / generate_json_with_image with structured-first + fallback pattern
  • backend/services/image_editability/text_attribute_extractors.py — Pass schemas to generate_json_with_image calls

E2E Test Coverage

  • frontend/e2e/structured-json-output.spec.ts — Mock test verifying outline generation flow works with structured JSON output

Replace manual text+parse+retry JSON generation with Google's native
structured JSON output API (Pydantic model + response_schema). Models
that don't support structured output automatically fall back to the
existing retry-based approach.
@gemini-code-assist
Contributor

Summary of Changes

Hello @Anionex, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the reliability and efficiency of AI model interactions by transitioning to a structured JSON output approach. By prioritizing native API support for schema-driven JSON generation and providing a robust fallback for unsupported models, the system ensures consistent and valid data processing. This change streamlines the integration of AI responses, reducing the need for error-prone text parsing and improving overall system stability.

Highlights

  • Structured JSON Output: Replaced manual text parsing and retry logic for JSON generation with Google's native structured JSON output API, leveraging Pydantic models for schema enforcement.
  • Fallback Mechanism: Implemented an automatic fallback to the existing text-parsing-and-retry approach for AI models that do not natively support structured JSON output.
  • API Call Updates: Updated all 7 generate_json and 2 generate_json_with_image callers across the codebase to utilize the new structured-first with fallback pattern.
  • New Pydantic Schemas: Introduced a new file (backend/services/ai_schemas.py) to house Pydantic models for various AI JSON response schemas, ensuring type safety and validation.
  • E2E Test Coverage: Added a new end-to-end test (frontend/e2e/structured-json-output.spec.ts) to verify the outline generation flow works correctly with the new structured JSON output implementation.
Changelog
  • backend/services/ai_providers/text/base.py
    • Imported Optional, Type, and BaseModel.
    • Added default generate_json and generate_json_with_image methods to the TextProvider abstract base class.
  • backend/services/ai_providers/text/genai_provider.py
    • Imported Type, Optional, and BaseModel.
    • Implemented _build_json_config to create structured JSON output configurations.
    • Added generate_json and generate_json_with_image methods to support structured JSON generation using the GenAI provider's native capabilities.
  • backend/services/ai_schemas.py
    • Added new file to define Pydantic models for AI JSON response schemas.
  • backend/services/ai_service.py
    • Removed a TODO comment related to structured output.
    • Imported Callable, Type, BaseModel, and specific Pydantic schemas.
    • Refactored generate_json and generate_json_with_image methods to implement a structured-first approach with a fallback to text parsing and retry.
    • Introduced _generate_json_fallback and _generate_json_with_image_fallback for handling non-structured output scenarios.
    • Added static methods _extract_outline and _extract_descriptions to convert Pydantic models to the expected dictionary/list formats.
    • Updated calls to generate_json in generate_outline, parse_outline_text, parse_description_to_outline, parse_description_to_page_descriptions, refine_outline, refine_descriptions, and extract_page_content to pass response_schema and extract_fn arguments.
  • backend/services/image_editability/text_attribute_extractors.py
    • Updated calls to ai_service.generate_json_with_image in _call_vision_model and extract_batch_with_full_image to include response_schema and extract_fn for structured output.
  • frontend/e2e/structured-json-output.spec.ts
    • Added a new E2E test file to verify the outline generation flow with structured JSON output by mocking API responses and asserting the expected behavior.
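For reference, a structured-output request config for the google-genai SDK is typically built along these lines; treat this as a hedged sketch of what a helper like `_build_json_config` might do, not the actual contents of `genai_provider.py`:

```python
from google.genai import types

def _build_json_config(schema, thinking_budget: int):
    # response_mime_type + response_schema ask the model for JSON that
    # validates against the given Pydantic model; both are SDK fields.
    return types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=schema,
        thinking_config=types.ThinkingConfig(thinking_budget=thinking_budget),
    )
```

With such a config, the SDK exposes the parsed result as `response.parsed` (an instance of the schema) alongside the raw `response.text`.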
Activity
  • The pull request introduces a significant architectural change to how AI model responses are handled, moving from manual parsing to structured output.
  • The author provided a clear summary of the changes and their impact in the pull request description.
  • New Pydantic models were created to define the expected structure of AI-generated JSON, improving type safety and data validation.
  • Existing AI service methods were refactored to gracefully handle both structured output and fallback scenarios.
  • A new E2E test was added to ensure the core functionality (outline generation) remains robust with the new structured JSON output implementation.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively refactors the JSON generation logic to use Google's structured JSON output API, with a robust fallback to the previous text-based generation and parsing method. The introduction of Pydantic models in ai_schemas.py for response validation is a great improvement for type safety and data integrity. The changes in ai_service.py are well-structured, separating the new structured output logic from the fallback mechanism.

I've found a bug in the fallback logic for JSON cleaning. Please see my detailed comments.

@Anionex
Owner Author

Anionex commented Feb 19, 2026

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This is a great pull request that significantly improves the reliability of JSON generation by adopting the structured output API from Google's GenAI SDK. The introduction of Pydantic models for response schemas is a clean way to enforce and validate the data structures. The fallback mechanism to the existing text-parsing-with-retry approach for providers that don't support structured output is a robust design choice. The refactoring is well-executed across the AIService and providers, and all callers have been updated accordingly. The addition of a new E2E test to verify the flow is also a valuable contribution. I have a few minor suggestions to improve code style and consistency.

@Anionex
Owner Author

Anionex commented Feb 19, 2026

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request replaces manual JSON parsing with a structured output API plus a fallback mechanism: Pydantic schemas are defined in ai_schemas.py, implemented in the GenAI provider, and wired in through a refactoring of ai_service.py. Enforcing schemas on LLM outputs is a positive security enhancement. However, two instances of indirect prompt injection were identified, where untrusted content from documents or PPT elements is embedded directly into LLM prompts without sanitization or escaping, which could allow an attacker to manipulate the LLM's behavior. There is also an unused parameter that makes an API misleading.

Comment on lines +196 to +198
```python
def generate_json(self, prompt: str, thinking_budget: int = 1000,
                  response_schema: Type[BaseModel] = None,
                  extract_fn: Callable = None) -> Union[Dict, List]:
```

Severity: high

The thinking_budget parameter is declared but its value is ignored within the function. The actual_budget is unconditionally determined by self._get_text_thinking_budget(), which relies on the global application configuration. This is misleading for callers who might expect their provided thinking_budget to be used. This issue also applies to the generate_json_with_image method.

To make the code clearer and less error-prone, I recommend removing the thinking_budget parameter from the signatures of both generate_json and generate_json_with_image, and updating all their call sites accordingly. This will make it explicit that the thinking budget is controlled centrally via configuration.

Suggested change

```diff
-def generate_json(self, prompt: str, thinking_budget: int = 1000,
-                  response_schema: Type[BaseModel] = None,
-                  extract_fn: Callable = None) -> Union[Dict, List]:
+def generate_json(self, prompt: str,
+                  response_schema: Type[BaseModel] = None,
+                  extract_fn: Callable = None) -> Union[Dict, List]:
```

Comment on lines 724 to +725
```diff
 prompt = get_ppt_page_content_extraction_prompt(markdown_text, language=language)
-result = self.generate_json(prompt, thinking_budget=1000)
+result = self.generate_json(
```

Severity: medium (security)

The extract_page_content method takes markdown_text (extracted from a document) and embeds it directly into an LLM prompt. If the document contains malicious instructions (Indirect Prompt Injection), an attacker can manipulate the LLM's output, potentially controlling the extracted slide content (title, points, description). This could lead to the generation of malicious or misleading PPT content without the user's direct intent. Consider sanitizing the input or using more robust delimiters that are escaped in the input.
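One possible shape of the mitigation the comment suggests is to wrap untrusted document text in explicit delimiters and neutralize any embedded closing tag, so the surrounding prompt can instruct the model to treat everything inside the wrapper as data. This is an illustrative sketch only; the function name and tag are hypothetical, not project code:

```python
def wrap_untrusted(text: str, tag: str = "document") -> str:
    """Wrap untrusted input in explicit delimiters for the prompt.

    Illustrative mitigation: escape any embedded closing tag so the
    document text cannot break out of its delimited region.
    """
    closer = f"</{tag}>"
    safe = text.replace(closer, f"<\\/{tag}>")  # neutralize fake closers
    return f"<{tag}>\n{safe}\n{closer}"
```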

Comment on lines 490 to 494
```diff
 prompt = get_batch_text_attribute_extraction_prompt(text_elements_json)

-# Call ai_service.generate_json_with_image (with retry mechanism)
+# Call ai_service.generate_json_with_image (structured output first, fall back to retry)
 try:
     result = self.ai_service.generate_json_with_image(
```

Severity: medium (security)

The extract_batch_with_full_image method embeds text content from PPT elements into an LLM prompt using a JSON block. An attacker can provide a PPT with malicious text content designed to break out of the JSON structure (e.g., using ```) and inject arbitrary instructions. This is an Indirect Prompt Injection that allows manipulating the extracted style attributes of the PPT elements. Consider escaping the triple backticks in the input content before embedding it in the prompt.
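A minimal sketch of the suggested escaping, assuming the prompt embeds the serialized elements in a ```json fence (the function names are illustrative, not the project's actual helpers):

```python
import json


def escape_fences(text: str) -> str:
    # Break up triple backticks so untrusted element text cannot close the
    # prompt's code fence and inject new instructions. Illustrative only.
    return text.replace("```", "\\`\\`\\`")


def build_prompt(text_elements: list) -> str:
    payload = escape_fences(json.dumps(text_elements, ensure_ascii=False))
    return (
        "Extract style attributes for these elements:\n"
        "```json\n" + payload + "\n```"
    )
```

After escaping, only the fence delimiters added by the prompt template itself remain as triple backticks, regardless of the element text.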
