Skip to content

feat: support JSON types, expression language #1146

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 11 commits into from

Conversation

r1b
Copy link

@r1b r1b commented Dec 8, 2024

This builds on the work in #1130

  • Export a JSON type
  • Implement the interface for the base JSON type
    • Support indexing to generate JSONPath expressions
    • Handle json_serializer / json_deserializer dialect args
  • Handle dialect-specific considerations
    • JSON is not a valid query parameter type, use STRING
    • Render PARSE_JSON around JSON-as-STRING bind expressions
    • Comparator support for BQ family of JSON -> SQL conversion functions, incl. LAX variants
    • Support JSONPath modes (preview feature)
  • Tests

TODO:

  • Add docs
  • Maybe support JSONIndexType?
    • I really struggled to get this working / not sure if its necessary - will comment with details
    • This would be necessary to support arbitrary expressions in indexing operations (as described in the docs)
  • Maybe remove the JSONPath mode support?
    • This was just for fun, but maybe we don't want to put this here until the feature is GA

References:


For context, I work on a system that exclusively uses the SQL expression language, so I may have some blind spots around support on the ORM side - let me know if anything is missing there.

Fixes #399 🦕

@r1b r1b requested review from a team as code owners December 8, 2024 16:54
@r1b r1b requested a review from suzmue December 8, 2024 16:54
@product-auto-label product-auto-label bot added the size: l Pull request size is large. label Dec 8, 2024
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. label Dec 8, 2024
@chalmerlowe
Copy link
Collaborator

@r1b

I appreciate your interest in our project and am grateful for you effort to put together this PR.

I see that it attempts to build upon the work that was found in PR #1130.

I also see that we have then attempted to add a number of other elements above and beyond that.

We are currently at 350+ lines of new code. I have not had a chance to dive deeply into this code, so my apologies. I have several questions (some of these are more rhetorical so don't feel as though you need to respond to each of them ... this is mostly me thinking out loud).

  • Does ALL of this have to be in a single PR?
  • Do ALL of these edits create a single atomic unit of change?
  • Could we simplify the process of creating, reviewing, editing, and approving this work by limiting it to small discrete units of change per PR? (I am the sole full-time maintainer of this task and need to try and balance my commitments appropriately).

If we were to break this into small units of change:

  • what are the most important elements? (what are optional)
  • what order makes sense for implementation? (most important OR foundational work first, least important work last).
  • do we have appropriate tests for each unit of change?

What are your thoughts?

@r1b
Copy link
Author

r1b commented Dec 10, 2024

Yep, happy to break this up (more of a conversation-starter vs expecting to get all of this in :)) - I think the right sequencing is something like:

  • Add the JSON type (what we have in Add JSON type compiler support #1130, with added tests)
  • Add the json_serializer / json_deserializer dialect kwargs (expected for all JSON implementors)

This will unblock users who want to reference a JSON column in table definitions / query results w/o indexing

  • Add JSONPath indexing, comparator support

This will allow users to generate queries that reach into the JSON column to pull out subfields + generate appropriate derived types for those expressions

^ For my own use case, these two are satisfactory


  • (Optional) Support JSON query parameters

Optional because this is probably a niche use case / requires workarounds for missing dbapi support - instead, we could just disallow this.

This will allow users to pass a JSON object as a query parameter, with appropriate type conversions when generating SQL

  • (Optional) Support JSONPath modes

Optional because this is a preview feature - OK to wait until this is GA / requested by user

This will allow users to use lax / lax recursive modes in generated JSON path expressions

@r1b
Copy link
Author

r1b commented Dec 10, 2024

Closing in favor of #1147 - I pulled out a smaller PR from the rough sequencing above

@r1b r1b closed this Dec 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. size: l Pull request size is large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for JSON datatype
3 participants