-
Notifications
You must be signed in to change notification settings - Fork 2.5k
feat: Refactor BQ Analytics Plugin to use Structured JSON #3729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: Refactor BQ Analytics Plugin to use Structured JSON #3729
Conversation
Summary of ChangesHello @amenegola, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the BigQuery Analytics Plugin to improve the storage and analysis of agent event data. By transitioning the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request is a significant improvement, refactoring the BigQuery analytics plugin to use structured JSON for event payloads instead of pipe-delimited strings. This change will greatly enhance the queryability and analysis of the logged data. The implementation of _recursive_smart_truncate for handling long strings within JSON is a thoughtful addition that prevents data corruption. The corresponding updates to the callback methods and unit tests are thorough and well-executed. I have one high-severity suggestion to address a potential regression in handling different types for system_instruction.
495d377 to
a006a0f
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This is an excellent refactoring that significantly improves the analytics plugin by moving from unstructured string logs to structured JSON. This change will make downstream analysis much easier and more powerful. The implementation of smart truncation to avoid breaking JSON structures is a great addition, and the updates to the content formatter API are well-designed. The accompanying test updates are thorough and improve test stability by replacing asyncio.sleep with await plugin.close(). I have one suggestion to improve the consistency of how missing data is represented in the JSON payload.
a006a0f to
ef0c1e2
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request is a significant improvement, refactoring the BigQuery analytics plugin to use structured JSON for event content instead of pipe-delimited strings. This change greatly enhances the queryability and integrity of the logged data. The implementation of _recursive_smart_truncate for handling long strings within JSON payloads is well-executed, and the updates to the callback logic to produce structured dictionaries are clean and effective. The test suite has also been substantially improved, notably by replacing asyncio.sleep with await plugin.close() for better reliability. I've noted a few minor inconsistencies in how empty fields are represented in the JSON payloads across different callbacks. Standardizing these will further improve the consistency of the logged data. Overall, this is a high-quality and valuable contribution.
ef0c1e2 to
422b544
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request is a significant improvement, refactoring the BigQuery analytics plugin to use structured JSON for event content instead of pipe-delimited strings. This change will greatly benefit downstream analysis. The implementation, including the new smart truncation logic and the update to the content_formatter to handle dictionary payloads, is well-executed. The unit tests have also been comprehensively updated to match the new functionality. I've identified a few areas for improvement: a regression that causes the loss of event timestamps, an incorrect type hint, and an untested error handling path. Addressing these points will further strengthen this excellent contribution.
422b544 to
d42fac8
Compare
d42fac8 to
a3ace99
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This is an excellent refactoring that significantly improves the BigQuery Analytics Plugin. Moving from unstructured, pipe-delimited strings to structured JSON for event payloads is a major step forward for queryability and data analysis. The implementation of _recursive_smart_truncate is a clever solution for handling large data while maintaining valid JSON. The changes to the content_formatter to allow dictionary-based payload mutation is also a great feature for PII redaction and data pruning. The test suite has been thoroughly updated to reflect these changes, which is fantastic. I have one suggestion to make the schema error detection more robust.
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request is a significant improvement, refactoring the BigQuery Analytics Plugin to use structured JSON for event payloads instead of pipe-delimited strings. This change greatly enhances the queryability and robustness of the logged data. The introduction of _recursive_smart_truncate for handling long string values without breaking JSON structure is a thoughtful addition. The updated callback methods to build structured payload dictionaries are much cleaner and more maintainable. The accompanying test suite has also been substantially improved, replacing fragile asyncio.sleep calls with await plugin.close() for more reliable testing. I have one suggestion to improve the error handling for cancelled tasks during shutdown.
750b830 to
aa99faa
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request is an excellent refactoring of the BigQueryAgentAnalyticsPlugin. The move from unstructured, pipe-delimited strings to structured JSON for event payloads is a significant improvement that will greatly enhance downstream analysis and querying capabilities. The implementation of _recursive_smart_truncate for safely truncating long string values within the JSON payload is a clever solution to prevent data loss and formatting issues. The updated content_formatter logic, which now operates on dictionaries, provides more flexibility for users to programmatically modify payloads before serialization. The accompanying test suite has also been substantially improved, with more specific tests for the new structured data and better handling of asynchronous operations using await plugin.close() instead of asyncio.sleep().
I have a couple of suggestions to further improve the robustness of the new implementation. One is to handle tuples in the smart truncation logic, and the other is to add a test case for the error handling of the custom content formatter. Overall, this is a high-quality contribution.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
/gemini review |
c16d46b to
b79ac28
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This is an excellent refactoring of the BigQuery analytics plugin. Moving from unstructured, pipe-delimited strings to structured JSON for event payloads is a significant improvement that will greatly enhance data analysis and querying capabilities. The implementation is robust, including the clever _recursive_smart_truncate function to prevent data loss while maintaining valid JSON, and safe serialization. The breaking schema change is clearly communicated in the PR description.
The accompanying test updates are also high-quality. Replacing asyncio.sleep with await plugin.close() makes the tests more reliable, and the assertions have been thoroughly updated to validate the new structured data format.
I've added a couple of suggestions to improve the maintainability of the tests by promoting better test isolation, but overall, this is a very strong contribution.
tests/unittests/plugins/test_bigquery_agent_analytics_plugin.py
Outdated
Show resolved
Hide resolved
tests/unittests/plugins/test_bigquery_agent_analytics_plugin.py
Outdated
Show resolved
Hide resolved
b79ac28 to
ec6d377
Compare
ec6d377 to
892cb16
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This is an excellent refactoring that significantly improves the quality and usability of the analytics data logged by the BigQueryAgentAnalyticsPlugin. Moving from unstructured, pipe-delimited strings to structured JSON for event payloads is a major enhancement that will make downstream analysis much easier and more powerful. The implementation of _recursive_smart_truncate is a clever solution to handle long string values without corrupting the JSON structure. The changes to the content_formatter to allow for pre-serialization payload mutation are also a great addition for data privacy and cleaning. The accompanying unit tests have been substantially improved, becoming more robust and comprehensive.
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This is an excellent refactoring of the BigQuery Analytics Plugin. Moving from unstructured, pipe-delimited strings to structured JSON for event payloads is a significant improvement that will make the logged data much more useful and easier to query. The implementation is robust, featuring a smart, recursive truncation method to handle large payloads without breaking JSON validity, and improved error handling. The accompanying unit tests have also been thoroughly updated and refactored, increasing their reliability and coverage of the new structured logging format. I have one suggestion to improve the completeness of the logged data.
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This is an excellent refactoring that significantly improves the analytics plugin. Moving from unstructured, pipe-delimited strings to structured JSON for event payloads is a major step forward for downstream analysis and queryability. The implementation of _recursive_smart_truncate is a clever solution for handling large payloads without breaking the JSON structure. The change to the content_formatter to operate on dictionaries is also a great enhancement, allowing for more powerful pre-serialization logic like PII redaction. The accompanying test suite has been thoroughly updated to reflect these changes, with notable improvements in test reliability by replacing asyncio.sleep with explicit plugin.close() calls. I have a couple of minor suggestions to further improve the clarity of the tests. Overall, this is a high-quality contribution.
tests/unittests/plugins/test_bigquery_agent_analytics_plugin.py
Outdated
Show resolved
Hide resolved
tests/unittests/plugins/test_bigquery_agent_analytics_plugin.py
Outdated
Show resolved
Hide resolved
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request is an excellent refactoring of the BigQuery Analytics Plugin to use structured JSON for event payloads instead of pipe-delimited strings. This change significantly improves the utility of the logged data for downstream analysis. The implementation is robust, including smart truncation to preserve JSON validity and safe serialization. The accompanying test updates are thorough and greatly improve test reliability by replacing asyncio.sleep with more deterministic waiting. I've identified a few areas for improvement to make the logging even more comprehensive and the code safer, detailed in the comments below.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…m/amenegola/adk-python into feature/bigquery-analytics-plugin
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This is an excellent refactoring of the BigQuery Analytics Plugin. Switching from unstructured, pipe-delimited strings to structured JSON for event content is a significant improvement that will make the logged data much more queryable and useful for analysis. The implementation of _recursive_smart_truncate to handle long string values without breaking JSON structures is well-executed, and allowing programmatic modification of the payload via content_formatter is a great addition for data privacy and pruning. The unit tests have also been thoroughly updated and improved, adopting more robust async testing patterns. Overall, this is a high-quality change that greatly enhances the plugin's capabilities.
Please ensure you have read the contribution guide before creating a pull request.
Link to Issue or Description of Change
1. Link to an existing issue (if applicable):
2. Or, if no issue exists, describe the change:
Problem:
The current
BigQueryAgentAnalyticsPluginimplementation stores complex event payloads (such as LLM requests, responses, and tool calls) as unstructured, concatenated strings using pipe delimiters. This makes downstream analysis difficult (requiring complex Regex), prevents efficient querying of nested fields (like token usage), and relies on a hard character limit for truncation that can result in data loss or broken formatting.Solution:
This PR refactors the plugin to leverage BigQuery's native
JSONdata type for thecontentcolumn.contentasJSON._recursive_smart_truncateto safely truncate long string values within the payload without breaking the JSON structure.content_formatterlogic to accept and return adict. This allows users to programmatically redact PII or prune fields before serialization.Testing Plan
Please describe the tests that you ran to verify your changes. This is required
for all PRs that are not small documentation or typo fixes.
Unit Tests:
Please include a summary of passed
pytestresults.Manual End-to-End (E2E) Tests:
Please provide instructions on how to manually test your changes, including any
necessary setup or configuration. Please provide logs or screenshots to help
reviewers better understand the fix.
contentcolumn isJSON. Query the table usingJSON_VALUE(content, '$.tool_name')to verify structured access works.Checklist
Additional context
This update changes the BigQuery table schema. Users with existing tables created by previous versions of this plugin will need to delete the table (to allow the plugin to recreate it) or manually migrate the
contentcolumn fromSTRINGtoJSON.