-
Notifications
You must be signed in to change notification settings - Fork 3.9k
GH-47838: [C++][Parquet] Set Variant specification version to 1 to align with the variant spec #47835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format?
or
See also: |
@wgtmac Can you take a look? |
|
a7c4d0d
to
3450f1c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it not the case for our own reader? |
You mean Parquet C++ reader? Currently Parquet C++ reader hasn't implemented Variant read/write yet. The only thing it added is to support Variant logical type and we are not doing the check. Currently Variant spec version is version 1 (and this is the only version) and the other readers may/may not add the check. |
4b65ada
to
38a919d
Compare
Sorry, because I might be missing something obvious, I am not too familiar with this part of the codebase, but if we haven't implemented Parquet C++ Variant write yet, I am not sure I understand how can a user would be able to create Variant files with a logical type annotation that will be incorrect with Parquet C++ if we release without this fix. |
The engines will implement the reader/writer parts but will use the variant type defined in Arrow Parquet. That would cause the engines to write incorrect annotation. That's what I'm seeing internally. |
Rationale for this change
According to the Variant specification, the specification_version field must be set to 1 to indicate Variant encoding version 1. Currently, this field defaults to 0, which violates the specification. Parquet readers that strictly enforce specification version validation will fail to read files containing Variant types.

What changes are included in this PR?
The change includes defaulting the specification version to 1.
Are these changes tested?
The change is covered by unit test.
Are there any user-facing changes?
The Parquet files produced the variant logical type annotation
VARIANT(1)
.