feat: use structured JSON output API with fallback by Anionex · Pull Request #250 · Anionex/banana-slides

Anionex · 2026-02-19T14:34:43Z

Summary

Replace manual text+parse+retry JSON generation with Google's native structured JSON output API (Pydantic model + response_schema)
Models that don't support structured output automatically fall back to the existing retry-based approach
All 7 generate_json callers and 2 generate_json_with_image callers updated

File Changes

backend/services/ai_schemas.py (new) — Pydantic models for all JSON response schemas
backend/services/ai_providers/text/base.py — Added generate_json / generate_json_with_image default methods to TextProvider
backend/services/ai_providers/text/genai_provider.py — Structured output implementation for GenAI provider
backend/services/ai_service.py — Refactored generate_json / generate_json_with_image with structured-first + fallback pattern
backend/services/image_editability/text_attribute_extractors.py — Pass schemas to generate_json_with_image calls

E2E Test Coverage

frontend/e2e/structured-json-output.spec.ts — Mock test verifying outline generation flow works with structured JSON output

Replace manual text+parse+retry JSON generation with Google's native structured JSON output API (Pydantic model + response_schema). Models that don't support structured output automatically fall back to the existing retry-based approach.

gemini-code-assist · 2026-02-19T14:35:04Z

Summary of Changes

Hello @Anionex, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the reliability and efficiency of AI model interactions by transitioning to a structured JSON output approach. By prioritizing native API support for schema-driven JSON generation and providing a robust fallback for unsupported models, the system ensures consistent and valid data processing. This change streamlines the integration of AI responses, reducing the need for error-prone text parsing and improving overall system stability.

Highlights

Structured JSON Output: Replaced manual text parsing and retry logic for JSON generation with Google's native structured JSON output API, leveraging Pydantic models for schema enforcement.
Fallback Mechanism: Implemented an automatic fallback to the existing text-parsing-and-retry approach for AI models that do not natively support structured JSON output.
API Call Updates: Updated all 7 generate_json and 2 generate_json_with_image callers across the codebase to utilize the new structured-first with fallback pattern.
New Pydantic Schemas: Introduced a new file (backend/services/ai_schemas.py) to house Pydantic models for various AI JSON response schemas, ensuring type safety and validation.
E2E Test Coverage: Added a new end-to-end test (frontend/e2e/structured-json-output.spec.ts) to verify the outline generation flow works correctly with the new structured JSON output implementation.

Changelog

backend/services/ai_providers/text/base.py
- Imported Optional, Type, and BaseModel.
- Added default generate_json and generate_json_with_image methods to the TextProvider abstract base class.
backend/services/ai_providers/text/genai_provider.py
- Imported Type, Optional, and BaseModel.
- Implemented _build_json_config to create structured JSON output configurations.
- Added generate_json and generate_json_with_image methods to support structured JSON generation using the GenAI provider's native capabilities.
backend/services/ai_schemas.py
- Added new file to define Pydantic models for AI JSON response schemas.
backend/services/ai_service.py
- Removed a TODO comment related to structured output.
- Imported Callable, Type, BaseModel, and specific Pydantic schemas.
- Refactored generate_json and generate_json_with_image methods to implement a structured-first approach with a fallback to text parsing and retry.
- Introduced _generate_json_fallback and _generate_json_with_image_fallback for handling non-structured output scenarios.
- Added static methods _extract_outline and _extract_descriptions to convert Pydantic models to the expected dictionary/list formats.
- Updated calls to generate_json in generate_outline, parse_outline_text, parse_description_to_outline, parse_description_to_page_descriptions, refine_outline, refine_descriptions, and extract_page_content to pass response_schema and extract_fn arguments.
backend/services/image_editability/text_attribute_extractors.py
- Updated calls to ai_service.generate_json_with_image in _call_vision_model and extract_batch_with_full_image to include response_schema and extract_fn for structured output.
frontend/e2e/structured-json-output.spec.ts
- Added a new E2E test file to verify the outline generation flow with structured JSON output by mocking API responses and asserting the expected behavior.

Activity

The pull request introduces a significant architectural change to how AI model responses are handled, moving from manual parsing to structured output.
The author provided a clear summary of the changes and their impact in the pull request description.
New Pydantic models were created to define the expected structure of AI-generated JSON, improving type safety and data validation.
Existing AI service methods were refactored to gracefully handle both structured output and fallback scenarios.
A new E2E test was added to ensure the core functionality (outline generation) remains robust with the new structured JSON output implementation.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request effectively refactors the JSON generation logic to use Google's structured JSON output API, with a robust fallback to the previous text-based generation and parsing method. The introduction of Pydantic models in ai_schemas.py for response validation is a great improvement for type safety and data integrity. The changes in ai_service.py are well-structured, separating the new structured output logic from the fallback mechanism.

I've found a bug in the fallback logic for JSON cleaning. Please see my detailed comments.

backend/services/ai_service.py

Anionex · 2026-02-19T14:40:53Z

/gemini review

gemini-code-assist

Code Review

This is a great pull request that significantly improves the reliability of JSON generation by adopting the structured output API from Google's GenAI SDK. The introduction of Pydantic models for response schemas is a clean way to enforce and validate the data structures. The fallback mechanism to the existing text-parsing-with-retry approach for providers that don't support structured output is a robust design choice. The refactoring is well-executed across the AIService and providers, and all callers have been updated accordingly. The addition of a new E2E test to verify the flow is also a valuable contribution. I have a few minor suggestions to improve code style and consistency.

backend/services/ai_providers/text/genai_provider.py

backend/services/ai_schemas.py

backend/services/ai_service.py

backend/services/image_editability/text_attribute_extractors.py

Anionex · 2026-02-19T14:47:29Z

/gemini review

gemini-code-assist

Code Review

This pull request significantly improves the system by replacing manual JSON parsing with a structured output API and a fallback mechanism, utilizing Pydantic schemas in ai_schemas.py and implementing them in the GenAI provider, with refactoring in ai_service.py. This structured approach is a positive security enhancement as it enforces schemas on LLM outputs. However, two instances of Indirect Prompt Injection were identified where untrusted content from documents or PPT elements is directly embedded into LLM prompts without sufficient sanitization or escaping, which could allow an attacker to manipulate the LLM's behavior. Additionally, there is an issue with an unused parameter that makes an API misleading.

gemini-code-assist · 2026-02-19T14:55:17Z

backend/services/ai_service.py

+    def generate_json(self, prompt: str, thinking_budget: int = 1000,
+                      response_schema: Type[BaseModel] = None,
+                      extract_fn: Callable = None) -> Union[Dict, List]:


The thinking_budget parameter is declared but its value is ignored within the function. The actual_budget is unconditionally determined by self._get_text_thinking_budget(), which relies on the global application configuration. This is misleading for callers who might expect their provided thinking_budget to be used. This issue also applies to the generate_json_with_image method.

To make the code clearer and less error-prone, I recommend removing the thinking_budget parameter from the signatures of both generate_json and generate_json_with_image, and updating all their call sites accordingly. This will make it explicit that the thinking budget is controlled centrally via configuration.

Suggested change

def generate_json(self, prompt: str, thinking_budget: int = 1000,

response_schema: Type[BaseModel] = None,

extract_fn: Callable = None) -> Union[Dict, List]:

def generate_json(self, prompt: str,

response_schema: Type[BaseModel] = None,

extract_fn: Callable = None) -> Union[Dict, List]:

gemini-code-assist · 2026-02-19T14:55:17Z

backend/services/ai_service.py

        prompt = get_ppt_page_content_extraction_prompt(markdown_text, language=language)
-        result = self.generate_json(prompt, thinking_budget=1000)
+        result = self.generate_json(


The extract_page_content method takes markdown_text (extracted from a document) and embeds it directly into an LLM prompt. If the document contains malicious instructions (Indirect Prompt Injection), an attacker can manipulate the LLM's output, potentially controlling the extracted slide content (title, points, description). This could lead to the generation of malicious or misleading PPT content without the user's direct intent. Consider sanitizing the input or using more robust delimiters that are escaped in the input.

gemini-code-assist · 2026-02-19T14:55:17Z

backend/services/image_editability/text_attribute_extractors.py

            prompt = get_batch_text_attribute_extraction_prompt(text_elements_json)

-            # 调用 ai_service.generate_json_with_image（带重试机制）
+            # 调用 ai_service.generate_json_with_image（优先结构化输出，回退到重试）
            try:
                result = self.ai_service.generate_json_with_image(


The extract_batch_with_full_image method embeds text content from PPT elements into an LLM prompt using a JSON block. An attacker can provide a PPT with malicious text content designed to break out of the JSON structure (e.g., using ```) and inject arbitrary instructions. This is an Indirect Prompt Injection that allows manipulating the extracted style attributes of the PPT elements. Consider escaping the triple backticks in the input content before embedding it in the prompt.

gemini-code-assist bot reviewed Feb 19, 2026

View reviewed changes

backend/services/ai_service.py Outdated Show resolved Hide resolved

fix: use removeprefix/removesuffix instead of strip for code fences

86ff43c

gemini-code-assist bot reviewed Feb 19, 2026

View reviewed changes

style: address review comments - fix imports and type hints

7614996

gemini-code-assist bot reviewed Feb 19, 2026

View reviewed changes

Comments

Conversation

Anionex commented Feb 19, 2026

Summary

File Changes

E2E Test Coverage

Uh oh!

gemini-code-assist bot commented Feb 19, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Anionex commented Feb 19, 2026

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Anionex commented Feb 19, 2026

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant