[Parquet][C++] Add experimental VECTOR repetition level for Arrow FixedSizeList #51
rok wants to merge 5 commits into
 * an OPTIONAL parent node. Readers that do not understand VECTOR are expected to
 * reject the file rather than attempting a LIST fallback.
Readers that do not understand VECTOR are expected to
reject the file rather than attempting a LIST fallback.
It's not clear to me what a "LIST fallback" means exactly. And wouldn't any kind of fallback behaviour imply that the reader does understand VECTOR?
Maybe this could simply read: "Readers that do not understand VECTOR are expected to reject the file."
NodePtr element;
RETURN_NOT_OK(
    FieldToNode(value_name, value_field, properties, arrow_properties, &element));
if (value_field->nullable()) {
As mentioned on the previous PR, I think we should always use a 3-level schema structure for vector, regardless of the nullability of the vector and its elements. Non-nullable schema nodes don't contribute to repetition/definition levels, and always using the same structure keeps things consistent and simpler.
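To illustrate why the extra non-nullable nodes are free, here is a minimal sketch of the standard Parquet rule for the maximum definition level of a leaf: every OPTIONAL or REPEATED node on the path adds one, and REQUIRED nodes add nothing. The `Rep` enum and `MaxDefLevel` helper are hypothetical (not `parquet::Repetition` or any Parquet C++ function), and the VECTOR node itself is deliberately left out, since its level contribution is defined by this PR rather than by the existing rules.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a schema node's repetition (not the Parquet C++ API).
enum class Rep { kRequired, kOptional, kRepeated };

// Maximum definition level of a leaf, given the repetition of each node on the
// path from the top-level field down to the leaf (standard Parquet rules).
int MaxDefLevel(const std::vector<Rep>& path) {
  int level = 0;
  for (Rep r : path) {
    // REQUIRED nodes are always present, so they never raise the level;
    // OPTIONAL and REPEATED nodes each add one.
    if (r != Rep::kRequired) ++level;
  }
  return level;
}
```

For example, a non-nullable outer group with a non-nullable element gives `MaxDefLevel({Rep::kRequired, Rep::kRequired}) == 0`, the same as if those nodes were omitted, while making both nullable gives 2.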
// VECTOR is a structural repetition type. No LogicalType::VECTOR annotation is
// required: vector nullability is carried by the optional parent group, while the
While the logical type annotation isn't strictly required, I think it still makes sense to have one, so that it's explicit that the top-level group node represents a vector field rather than, e.g., a single-field struct with a vector-typed child.
This also provides consistency with the list type. A list logical type annotation isn't strictly required either (you could just use plain schema nodes with repetition::repeated), but I think there are good reasons the list logical type was created.
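A small sketch of the ambiguity described above: structurally, a vector field and a one-field struct whose only child is a VECTOR node can look identical, and only an explicit annotation tells them apart. Everything here is a toy model under stated assumptions: the `Node` struct, the `Repetition`/`Annotation` enums, the node names, and the exact 3-level layout are hypothetical and are neither the Parquet C++ schema API nor the layout this PR defines.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Toy schema model for illustration only; all names are hypothetical.
enum class Repetition { kRequired, kOptional, kRepeated, kVector };
enum class Annotation { kList, kVector };

struct Node {
  std::string name;
  Repetition repetition;
  std::optional<Annotation> annotation;  // empty when the node is unannotated
  std::vector<Node> children;
};

// Purely structural check: a group whose single child uses VECTOR repetition.
bool LooksLikeVector(const Node& n) {
  return n.children.size() == 1 &&
         n.children[0].repetition == Repetition::kVector;
}

// With an explicit annotation, the writer's intent is unambiguous.
bool IsVectorField(const Node& n) {
  return n.annotation == Annotation::kVector && LooksLikeVector(n);
}
```

A group annotated as a vector and an unannotated single-field struct wrapping the same VECTOR child both pass `LooksLikeVector`, but only the annotated one passes `IsVectorField`.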
This PR prototypes a new experimental Parquet repetition type, VECTOR, mainly for Arrow's FixedSizeList<T, N>, as proposed in Option B here.
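For context on the Arrow side: a FixedSizeList<T, N> has no offsets buffer, because slot i always covers child values [i*N, (i+1)*N), shifted by the array's own offset when it is a slice. The dependency-free sketch below shows that arithmetic; `SlotRange` and `ChildLength` are hypothetical helpers rather than Arrow functions, and how VECTOR expresses this in Parquet levels is defined by the PR, not here.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Half-open range [begin, end) of child values covered by slot i of a
// fixed-size list whose lists all have list_size elements. The position
// follows from the index alone, so no offsets buffer is needed.
// array_offset models the offset of a sliced array (0 when unsliced).
std::pair<int64_t, int64_t> SlotRange(int64_t array_offset, int64_t i,
                                      int32_t list_size) {
  const int64_t begin = (array_offset + i) * list_size;
  return {begin, begin + list_size};
}

// Number of child values referenced by `length` slots.
int64_t ChildLength(int64_t length, int32_t list_size) {
  return length * list_size;
}
```

For example, with N = 3 the third slot (i = 2) of an unsliced array covers child values [6, 9), and 5 slots reference 15 child values in total.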