@amenegola
Contributor

Please ensure you have read the contribution guide before creating a pull request.

Link to Issue or Description of Change

1. Link to an existing issue (if applicable):

2. Or, if no issue exists, describe the change:

Problem:
The current BigQueryAgentAnalyticsPlugin implementation stores complex event payloads (such as LLM requests, responses, and tool calls) as unstructured, concatenated strings using pipe delimiters. This makes downstream analysis difficult (requiring complex regular expressions), prevents efficient querying of nested fields (like token usage), and relies on a hard character limit for truncation that can result in data loss or broken formatting.

Solution:
This PR refactors the plugin to leverage BigQuery's native JSON data type for the content column.

  • Structured Storage: Schema updated to store content as JSON.
  • Smart Truncation: Implemented _recursive_smart_truncate to safely truncate long string values within the payload without breaking the JSON structure.
  • Payload Mutation: Updated the content_formatter logic to accept and return a dict. This allows users to programmatically redact PII or prune fields before serialization.
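
The smart-truncation idea can be sketched as follows. This is a minimal illustration, not the code from this PR: the function name `smart_truncate`, the `TRUNCATION_SUFFIX` marker, and the `max_len` parameter are assumptions made for the example. It walks the payload, shortens only string leaves, and rebuilds containers so the result still serializes to valid JSON.

```python
import json
from typing import Any

# Hypothetical marker appended to shortened strings; not taken from the PR.
TRUNCATION_SUFFIX = "...[TRUNCATED]"


def smart_truncate(value: Any, max_len: int) -> Any:
    """Return a copy of `value` in which every long string is shortened.

    Containers are rebuilt rather than sliced, so the structure of the
    payload is preserved and only string leaves change.
    """
    if isinstance(value, str):
        if len(value) <= max_len:
            return value
        # Keep the first max_len characters, then add the marker.
        return value[:max_len] + TRUNCATION_SUFFIX
    if isinstance(value, dict):
        return {key: smart_truncate(item, max_len) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        # Tuples become lists, which is how json.dumps encodes them anyway.
        return [smart_truncate(item, max_len) for item in value]
    # Numbers, booleans and None pass through unchanged.
    return value


payload = {
    "prompt": "x" * 100,
    "usage": {"total_tokens": 42},
    "parts": ["short", "y" * 100],
}
truncated = smart_truncate(payload, max_len=10)
serialized = json.dumps(truncated)  # still valid JSON after truncation
```

Truncating each leaf separately bounds the size of individual strings rather than the whole document; a real implementation may also need a cap on the total serialized size.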

Testing Plan

Please describe the tests that you ran to verify your changes. This is required
for all PRs that are not small documentation or typo fixes.

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.

Please include a summary of passed pytest results.

tests/unittests/plugins/test_bigquery_agent_analytics_plugin.py ...........................
======================== 27 passed, 1 warning in 4.25s =========================

Manual End-to-End (E2E) Tests:

Please provide instructions on how to manually test your changes, including any
necessary setup or configuration. Please provide logs or screenshots to help
reviewers better understand the fix.

  1. Setup: Configure the plugin with a Google Cloud Project and Dataset.
  2. Run: Execute an agent that uses tools and generates long text responses.
  3. Verify: Check the BigQuery table schema to confirm the content column is JSON. Query the table using JSON_VALUE(content, '$.tool_name') to verify structured access works.

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

Additional context

⚠️ Breaking Schema Change:
This update changes the BigQuery table schema. Users with existing tables created by previous versions of this plugin will need to delete the table (to allow the plugin to recreate it) or manually migrate the content column from STRING to JSON.

@gemini-code-assist
Contributor

Summary of Changes

Hello @amenegola, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the BigQuery Analytics Plugin to improve the storage and analysis of agent event data. By transitioning the content column to a native BigQuery JSON type, it moves away from unstructured, pipe-delimited strings, enabling more efficient and flexible querying of nested data. The changes also introduce intelligent truncation logic to maintain data integrity and enhance the custom content formatting capabilities, providing users with better control over their logged data.

Highlights

  • Structured Content Storage: The content column in the BigQuery table schema has been changed from STRING to JSON, allowing for structured storage and easier querying of complex event payloads like LLM requests, responses, and tool calls.
  • Smart Truncation: A new recursive truncation mechanism (_recursive_smart_truncate) is implemented to safely truncate long string values within nested JSON objects without breaking the JSON structure. This prevents data loss and malformed entries, and the max_content_length default is increased to 50KB.
  • Enhanced Content Formatting: The content_formatter now accepts and returns a dictionary, enabling programmatic PII redaction or field pruning before JSON serialization, offering greater control and flexibility over logged data. The is_truncated column has been removed as smart truncation ensures valid JSON.
  • Refactored Callbacks: All event callback methods have been updated to construct and pass structured dictionaries as content payloads, streamlining the logging process and leveraging the new JSON storage capabilities.
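
A dict-in, dict-out formatter of the kind described above might look like the sketch below. The function name `redact_pii`, the field names (`tool_args`, `raw_response`, `email`, `phone`), and the `[REDACTED]` placeholder are hypothetical and chosen only for illustration; how the formatter is registered with the plugin is not shown because this conversation does not specify it.

```python
import copy
from typing import Any, Dict

# Hypothetical set of argument names treated as sensitive.
SENSITIVE_KEYS = {"email", "phone"}


def redact_pii(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical content formatter: takes a dict and returns a dict."""
    cleaned = copy.deepcopy(payload)  # leave the caller's dict untouched
    args = cleaned.get("tool_args")
    if isinstance(args, dict):
        for key in args:
            if key in SENSITIVE_KEYS:
                args[key] = "[REDACTED]"
    # Prune a bulky field entirely before serialization.
    cleaned.pop("raw_response", None)
    return cleaned


event = {
    "tool_name": "lookup_user",
    "tool_args": {"email": "a@example.com", "user_id": 7},
    "raw_response": "very long text",
}
formatted = redact_pii(event)
```

Working on a deep copy keeps the redaction from leaking back into objects the agent is still using.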

@adk-bot added the "services [Component]" label (runtime services, e.g. sessions, memory, artifacts) on Nov 26, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is a significant improvement, refactoring the BigQuery analytics plugin to use structured JSON for event payloads instead of pipe-delimited strings. This change will greatly enhance the queryability and analysis of the logged data. The implementation of _recursive_smart_truncate for handling long strings within JSON is a thoughtful addition that prevents data corruption. The corresponding updates to the callback methods and unit tests are thorough and well-executed. I have one high-severity suggestion to address a potential regression in handling different types for system_instruction.

@amenegola force-pushed the feature/bigquery-analytics-plugin branch from 495d377 to a006a0f on November 26, 2025 at 20:59
@amenegola
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is an excellent refactoring that significantly improves the analytics plugin by moving from unstructured string logs to structured JSON. This change will make downstream analysis much easier and more powerful. The implementation of smart truncation to avoid breaking JSON structures is a great addition, and the updates to the content formatter API are well-designed. The accompanying test updates are thorough and improve test stability by replacing asyncio.sleep with await plugin.close(). I have one suggestion to improve the consistency of how missing data is represented in the JSON payload.
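
Why awaiting `close()` is more reliable than sleeping in a test can be shown with a small stand-in. The sketch below does not use the plugin's real classes: `BufferedLogger`, its queue-based worker, and the event payload are all hypothetical. The point is only that `close()` waits until every queued row has been written, whereas a fixed `asyncio.sleep` merely hopes the worker has finished in time.

```python
import asyncio
from typing import List


class BufferedLogger:
    """Hypothetical stand-in for a plugin that writes rows in the background."""

    def __init__(self) -> None:
        self.written: List[dict] = []
        self._queue: asyncio.Queue = asyncio.Queue()
        # Must be constructed inside a running event loop.
        self._worker = asyncio.create_task(self._drain())

    async def _drain(self) -> None:
        while True:
            row = await self._queue.get()
            await asyncio.sleep(0.01)  # simulated write latency
            self.written.append(row)
            self._queue.task_done()

    async def log(self, row: dict) -> None:
        await self._queue.put(row)

    async def close(self) -> None:
        # Block until every queued row has been processed.
        await self._queue.join()
        # Stop the worker and absorb the resulting cancellation.
        self._worker.cancel()
        try:
            await self._worker
        except asyncio.CancelledError:
            pass


async def run_example() -> List[dict]:
    logger = BufferedLogger()
    await logger.log({"event_type": "example_event"})
    await logger.close()  # deterministic: no guessing how long to sleep
    return logger.written


rows = asyncio.run(run_example())
```

With this shape, a test can assert on `rows` immediately after `close()` returns, regardless of how slow the simulated write is.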

@amenegola force-pushed the feature/bigquery-analytics-plugin branch from a006a0f to ef0c1e2 on November 26, 2025 at 21:23
@amenegola
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is a significant improvement, refactoring the BigQuery analytics plugin to use structured JSON for event content instead of pipe-delimited strings. This change greatly enhances the queryability and integrity of the logged data. The implementation of _recursive_smart_truncate for handling long strings within JSON payloads is well-executed, and the updates to the callback logic to produce structured dictionaries are clean and effective. The test suite has also been substantially improved, notably by replacing asyncio.sleep with await plugin.close() for better reliability. I've noted a few minor inconsistencies in how empty fields are represented in the JSON payloads across different callbacks. Standardizing these will further improve the consistency of the logged data. Overall, this is a high-quality and valuable contribution.

@amenegola force-pushed the feature/bigquery-analytics-plugin branch from ef0c1e2 to 422b544 on November 26, 2025 at 21:27
@amenegola
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is a significant improvement, refactoring the BigQuery analytics plugin to use structured JSON for event content instead of pipe-delimited strings. This change will greatly benefit downstream analysis. The implementation, including the new smart truncation logic and the update to the content_formatter to handle dictionary payloads, is well-executed. The unit tests have also been comprehensively updated to match the new functionality. I've identified a few areas for improvement: a regression that causes the loss of event timestamps, an incorrect type hint, and an untested error handling path. Addressing these points will further strengthen this excellent contribution.

@amenegola force-pushed the feature/bigquery-analytics-plugin branch from 422b544 to d42fac8 on November 26, 2025 at 21:31
@amenegola force-pushed the feature/bigquery-analytics-plugin branch from d42fac8 to a3ace99 on November 26, 2025 at 21:32
@amenegola
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is an excellent refactoring that significantly improves the BigQuery Analytics Plugin. Moving from unstructured, pipe-delimited strings to structured JSON for event payloads is a major step forward for queryability and data analysis. The implementation of _recursive_smart_truncate is a clever solution for handling large data while maintaining valid JSON. The change to the content_formatter to allow dictionary-based payload mutation is also a great feature for PII redaction and data pruning. The test suite has been thoroughly updated to reflect these changes, which is fantastic. I have one suggestion to make the schema error detection more robust.

@amenegola
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is a significant improvement, refactoring the BigQuery Analytics Plugin to use structured JSON for event payloads instead of pipe-delimited strings. This change greatly enhances the queryability and robustness of the logged data. The introduction of _recursive_smart_truncate for handling long string values without breaking JSON structure is a thoughtful addition. The updated callback methods to build structured payload dictionaries are much cleaner and more maintainable. The accompanying test suite has also been substantially improved, replacing fragile asyncio.sleep calls with await plugin.close() for more reliable testing. I have one suggestion to improve the error handling for cancelled tasks during shutdown.

@amenegola force-pushed the feature/bigquery-analytics-plugin branch from 750b830 to aa99faa on November 26, 2025 at 21:55
@amenegola
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is an excellent refactoring of the BigQueryAgentAnalyticsPlugin. The move from unstructured, pipe-delimited strings to structured JSON for event payloads is a significant improvement that will greatly enhance downstream analysis and querying capabilities. The implementation of _recursive_smart_truncate for safely truncating long string values within the JSON payload is a clever solution to prevent data loss and formatting issues. The updated content_formatter logic, which now operates on dictionaries, provides more flexibility for users to programmatically modify payloads before serialization. The accompanying test suite has also been substantially improved, with more specific tests for the new structured data and better handling of asynchronous operations using await plugin.close() instead of asyncio.sleep().

I have a couple of suggestions to further improve the robustness of the new implementation. One is to handle tuples in the smart truncation logic, and the other is to add a test case for the error handling of the custom content formatter. Overall, this is a high-quality contribution.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@amenegola
Contributor Author

/gemini review

@amenegola force-pushed the feature/bigquery-analytics-plugin branch from c16d46b to b79ac28 on November 26, 2025 at 22:05

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is an excellent refactoring of the BigQuery analytics plugin. Moving from unstructured, pipe-delimited strings to structured JSON for event payloads is a significant improvement that will greatly enhance data analysis and querying capabilities. The implementation is robust, including the clever _recursive_smart_truncate function to prevent data loss while maintaining valid JSON, and safe serialization. The breaking schema change is clearly communicated in the PR description.

The accompanying test updates are also high-quality. Replacing asyncio.sleep with await plugin.close() makes the tests more reliable, and the assertions have been thoroughly updated to validate the new structured data format.

I've added a couple of suggestions to improve the maintainability of the tests by promoting better test isolation, but overall, this is a very strong contribution.

@amenegola force-pushed the feature/bigquery-analytics-plugin branch from b79ac28 to ec6d377 on November 26, 2025 at 22:14
@amenegola force-pushed the feature/bigquery-analytics-plugin branch from ec6d377 to 892cb16 on November 26, 2025 at 22:17
@amenegola
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is an excellent refactoring that significantly improves the quality and usability of the analytics data logged by the BigQueryAgentAnalyticsPlugin. Moving from unstructured, pipe-delimited strings to structured JSON for event payloads is a major enhancement that will make downstream analysis much easier and more powerful. The implementation of _recursive_smart_truncate is a clever solution to handle long string values without corrupting the JSON structure. The changes to the content_formatter to allow for pre-serialization payload mutation are also a great addition for data privacy and cleaning. The accompanying unit tests have been substantially improved, becoming more robust and comprehensive.

@amenegola
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is an excellent refactoring of the BigQuery Analytics Plugin. Moving from unstructured, pipe-delimited strings to structured JSON for event payloads is a significant improvement that will make the logged data much more useful and easier to query. The implementation is robust, featuring a smart, recursive truncation method to handle large payloads without breaking JSON validity, and improved error handling. The accompanying unit tests have also been thoroughly updated and refactored, increasing their reliability and coverage of the new structured logging format. I have one suggestion to improve the completeness of the logged data.

@amenegola
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is an excellent refactoring that significantly improves the analytics plugin. Moving from unstructured, pipe-delimited strings to structured JSON for event payloads is a major step forward for downstream analysis and queryability. The implementation of _recursive_smart_truncate is a clever solution for handling large payloads without breaking the JSON structure. The change to the content_formatter to operate on dictionaries is also a great enhancement, allowing for more powerful pre-serialization logic like PII redaction. The accompanying test suite has been thoroughly updated to reflect these changes, with notable improvements in test reliability by replacing asyncio.sleep with explicit plugin.close() calls. I have a couple of minor suggestions to further improve the clarity of the tests. Overall, this is a high-quality contribution.

@amenegola
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is an excellent refactoring of the BigQuery Analytics Plugin to use structured JSON for event payloads instead of pipe-delimited strings. This change significantly improves the utility of the logged data for downstream analysis. The implementation is robust, including smart truncation to preserve JSON validity and safe serialization. The accompanying test updates are thorough and greatly improve test reliability by replacing asyncio.sleep with more deterministic waiting. I've identified a few areas for improvement to make the logging even more comprehensive and the code safer, detailed in the comments below.

amenegola and others added 3 commits November 26, 2025 19:47
@amenegola
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is an excellent refactoring of the BigQuery Analytics Plugin. Switching from unstructured, pipe-delimited strings to structured JSON for event content is a significant improvement that will make the logged data much more queryable and useful for analysis. The implementation of _recursive_smart_truncate to handle long string values without breaking JSON structures is well-executed, and allowing programmatic modification of the payload via content_formatter is a great addition for data privacy and pruning. The unit tests have also been thoroughly updated and improved, adopting more robust async testing patterns. Overall, this is a high-quality change that greatly enhances the plugin's capabilities.

Labels

services [Component] This issue is related to runtime services, e.g. sessions, memory, artifacts, etc

Development

Successfully merging this pull request may close these issues.

Refactor BigQuery Analytics Plugin to use Structured JSON

2 participants