Skip to content

Remove zetasql #272

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

andrewfulton9
Copy link
Contributor

Removes zetaSQL dependency from data-validation.

@andrewfulton9 andrewfulton9 mentioned this pull request Jun 16, 2025
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request removes the zetasql dependency from the data-validation component by eliminating zetasql‐related files, targets, and configuration settings. Key changes include:

  • Removal of custom validation proto and C++ implementation files that indirectly functioned with zetasql.
  • Deletion of BUILD targets and WORKSPACE zetasql archive/imports.
  • Updating various build and configuration files (e.g. .bazelrc, setup.py, and CI workflows) to reflect the removal.

Reviewed Changes

Copilot reviewed 19 out of 137 changed files in this pull request and generated no comments.

Show a summary per file
File Description
tensorflow_data_validation/anomalies/status_util.h Removal of extra trailing newline characters.
tensorflow_data_validation/anomalies/proto/custom_validation_config.proto Entire file removed, reflecting removal of custom validation.
tensorflow_data_validation/anomalies/proto/init.py Minor formatting changes with no functional impact.
tensorflow_data_validation/anomalies/BUILD Removal of custom_validation cc_library and cc_test targets.
tensorflow_data_validation/anomalies/custom_validation.h & .cc Entire files removed as part of the zetasql dependency removal.
tensorflow_data_validation/init.py Updated import grouping for consistency after removal.
setup.py and pyproject.toml Formatting and dependency updates; no functional issues noted.
WORKSPACE and .bazelrc Removal of zetasql imports with updated comments.
CI workflows (.github/workflows/test.yml and ci-lint.yml) Adjustments to installation and linting configurations.
.pre-commit-config.yaml Newly added to enforce code style and linting standards.
Comments suppressed due to low confidence (2)

.bazelrc:2

  • Consider removing these obsolete commented-out zetasql references entirely rather than leaving them in the file to reduce potential confusion in the future.
# Zetasql is removed.

tensorflow_data_validation/anomalies/BUILD:428

  • [nitpick] Ensure that all dependencies and references to the removed custom_validation targets are updated accordingly in the rest of the codebase to avoid any unresolved references.
cc_library(

@peytondmurray
Copy link

Built on top of #267, which will be merged before this. Final tests running now 🤞

Copy link

@peytondmurray peytondmurray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "custom validation", they're discussing validation done at the SQL level, right? As far as I can tell in the docs there's no mention of "custom validation" or anything related to SQL.

For anyone reading this doing future feature development: it would be great to have more code documentation...

@@ -71,18 +71,6 @@ http_archive(
],
)

# Needed by abseil-py by zetasql.
http_archive(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! Can't believe we're still dealing with 2->3 16 years later...

@@ -112,6 +100,16 @@ http_archive(
url = "https://github.com/abseil/abseil-cpp/archive/%s.tar.gz" % COM_GOOGLE_ABSL_COMMIT,
)


# re2 required for google tests
http_archive(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, were tests even running before? Not sure how this wasn't part of the build earlier...

),
per_feature_parent_indices,
)
flattened, value_parent_indices = array_util.flatten_nested(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewfulton9 Is this intended here, or did I mess this up when I merged master?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants