Row count checks on final parquet, _metadata and ancillary files #373

Closed
nevencaplar opened this issue Aug 9, 2024 · 0 comments · Fixed by #428
Labels
enhancement New feature or request

Comments

@nevencaplar
Member

This is one of the verification pipeline tickets and is connected to #344.

Implement row count checks on:

  1. Final Parquet Files
    ● Get row counts from file footers.
    ● Compare total with truth.
    ● Compare per partition with intermediate files.
  2. _metadata File
    ● Get row counts from _metadata file.
    ● Compare total with truth.
    ● Compare per partition with intermediate files.
  3. Ancillary Files
    ● Check the counts reported in all ancillary files.
    ● Total in the README.
    ● Counts per file/partition in the CSV files.
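
The checks listed above could be sketched roughly as follows. This is only an illustration, not the implementation that closed this issue: the function names are hypothetical, the CSV column names are placeholders passed in by the caller, and the Parquet helpers assume `pyarrow` is installed and that `_metadata` was written with a file path for each row group.

```python
import csv
from pathlib import Path


def footer_row_counts(catalog_dir):
    """Map each data file (path relative to catalog_dir) to its footer row count."""
    import pyarrow.parquet as pq  # lazy import: only needed for Parquet I/O

    root = Path(catalog_dir)
    counts = {}
    # "_metadata" has no ".parquet" suffix, so it is not picked up here.
    for path in sorted(root.rglob("*.parquet")):
        key = path.relative_to(root).as_posix()
        counts[key] = pq.read_metadata(str(path)).num_rows
    return counts


def metadata_row_counts(metadata_path):
    """Map each data file referenced in _metadata to its summed row-group counts."""
    import pyarrow.parquet as pq

    meta = pq.read_metadata(str(metadata_path))
    counts = {}
    for i in range(meta.num_row_groups):
        group = meta.row_group(i)
        # Assumption: the writer recorded the data file path on each column chunk.
        file_path = group.column(0).file_path
        counts[file_path] = counts.get(file_path, 0) + group.num_rows
    return counts


def csv_row_counts(lines, key_column, count_column):
    """Read per-file/partition counts from CSV lines (an open text file works)."""
    reader = csv.DictReader(lines)
    return {row[key_column]: int(row[count_column]) for row in reader}


def compare_counts(observed, expected, expected_total):
    """Return a list of mismatch messages; an empty list means all checks pass."""
    problems = []
    total = sum(observed.values())
    if total != expected_total:
        problems.append(f"total: expected {expected_total}, found {total}")
    for key in sorted(set(observed) | set(expected)):
        if observed.get(key) != expected.get(key):
            problems.append(
                f"{key}: expected {expected.get(key)}, found {observed.get(key)}"
            )
    return problems
```

The same `compare_counts` call would serve all three checks: pass it the counts from the footers, from `_metadata`, or from a CSV file, along with the per-partition counts from the intermediate files and the known true total.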
Projects
Status: Done
2 participants