feat: use structured JSON output API with fallback#250
Conversation
Replace manual text+parse+retry JSON generation with Google's native structured JSON output API (Pydantic model + response_schema). Models that don't support structured output automatically fall back to the existing retry-based approach.
Summary of ChangesHello @Anionex, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the reliability and efficiency of AI model interactions by transitioning to a structured JSON output approach. By prioritizing native API support for schema-driven JSON generation and providing a robust fallback for unsupported models, the system ensures consistent and valid data processing. This change streamlines the integration of AI responses, reducing the need for error-prone text parsing and improving overall system stability. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request effectively refactors the JSON generation logic to use Google's structured JSON output API, with a robust fallback to the previous text-based generation and parsing method. The introduction of Pydantic models in ai_schemas.py for response validation is a great improvement for type safety and data integrity. The changes in ai_service.py are well-structured, separating the new structured output logic from the fallback mechanism.
I've found a bug in the fallback logic for JSON cleaning. Please see my detailed comments.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This is a great pull request that significantly improves the reliability of JSON generation by adopting the structured output API from Google's GenAI SDK. The introduction of Pydantic models for response schemas is a clean way to enforce and validate the data structures. The fallback mechanism to the existing text-parsing-with-retry approach for providers that don't support structured output is a robust design choice. The refactoring is well-executed across the AIService and providers, and all callers have been updated accordingly. The addition of a new E2E test to verify the flow is also a valuable contribution. I have a few minor suggestions to improve code style and consistency.
backend/services/image_editability/text_attribute_extractors.py
Outdated
Show resolved
Hide resolved
backend/services/image_editability/text_attribute_extractors.py
Outdated
Show resolved
Hide resolved
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request significantly improves the system by replacing manual JSON parsing with a structured output API and a fallback mechanism, utilizing Pydantic schemas in ai_schemas.py and implementing them in the GenAI provider, with refactoring in ai_service.py. This structured approach is a positive security enhancement as it enforces schemas on LLM outputs. However, two instances of Indirect Prompt Injection were identified where untrusted content from documents or PPT elements is directly embedded into LLM prompts without sufficient sanitization or escaping, which could allow an attacker to manipulate the LLM's behavior. Additionally, there is an issue with an unused parameter that makes an API misleading.
| def generate_json(self, prompt: str, thinking_budget: int = 1000, | ||
| response_schema: Type[BaseModel] = None, | ||
| extract_fn: Callable = None) -> Union[Dict, List]: |
There was a problem hiding this comment.
The thinking_budget parameter is declared but its value is ignored within the function. The actual_budget is unconditionally determined by self._get_text_thinking_budget(), which relies on the global application configuration. This is misleading for callers who might expect their provided thinking_budget to be used. This issue also applies to the generate_json_with_image method.
To make the code clearer and less error-prone, I recommend removing the thinking_budget parameter from the signatures of both generate_json and generate_json_with_image, and updating all their call sites accordingly. This will make it explicit that the thinking budget is controlled centrally via configuration.
| def generate_json(self, prompt: str, thinking_budget: int = 1000, | |
| response_schema: Type[BaseModel] = None, | |
| extract_fn: Callable = None) -> Union[Dict, List]: | |
| def generate_json(self, prompt: str, | |
| response_schema: Type[BaseModel] = None, | |
| extract_fn: Callable = None) -> Union[Dict, List]: |
| prompt = get_ppt_page_content_extraction_prompt(markdown_text, language=language) | ||
| result = self.generate_json(prompt, thinking_budget=1000) | ||
| result = self.generate_json( |
There was a problem hiding this comment.
The extract_page_content method takes markdown_text (extracted from a document) and embeds it directly into an LLM prompt. If the document contains malicious instructions (Indirect Prompt Injection), an attacker can manipulate the LLM's output, potentially controlling the extracted slide content (title, points, description). This could lead to the generation of malicious or misleading PPT content without the user's direct intent. Consider sanitizing the input or using more robust delimiters that are escaped in the input.
| prompt = get_batch_text_attribute_extraction_prompt(text_elements_json) | ||
|
|
||
| # 调用 ai_service.generate_json_with_image(带重试机制) | ||
| # 调用 ai_service.generate_json_with_image(优先结构化输出,回退到重试) | ||
| try: | ||
| result = self.ai_service.generate_json_with_image( |
There was a problem hiding this comment.
The extract_batch_with_full_image method embeds text content from PPT elements into an LLM prompt using a JSON block. An attacker can provide a PPT with malicious text content designed to break out of the JSON structure (e.g., using ```) and inject arbitrary instructions. This is an Indirect Prompt Injection that allows manipulating the extracted style attributes of the PPT elements. Consider escaping the triple backticks in the input content before embedding it in the prompt.
Summary
response_schema)generate_jsoncallers and 2generate_json_with_imagecallers updatedFile Changes
backend/services/ai_schemas.py(new) — Pydantic models for all JSON response schemasbackend/services/ai_providers/text/base.py— Addedgenerate_json/generate_json_with_imagedefault methods to TextProviderbackend/services/ai_providers/text/genai_provider.py— Structured output implementation for GenAI providerbackend/services/ai_service.py— Refactoredgenerate_json/generate_json_with_imagewith structured-first + fallback patternbackend/services/image_editability/text_attribute_extractors.py— Pass schemas togenerate_json_with_imagecallsE2E Test Coverage
frontend/e2e/structured-json-output.spec.ts— Mock test verifying outline generation flow works with structured JSON output