
[Parquet][C++] Add experimental VECTOR repetition level for Arrow FixedSizeList#51

Open: rok wants to merge 5 commits into main from vector_repetition_level_2


@rok (Owner) commented May 29, 2026

This PR prototypes a new experimental Parquet repetition type, VECTOR, mainly for Arrow's FixedSizeList<T, N> as proposed in Option B here.
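For context on the Arrow side: a FixedSizeList<T, N> stores no offsets buffer. Every slot, null or not, occupies exactly N child values, so slot i spans child positions [i*N, (i+1)*N). The sketch below models that layout with plain std::vector rather than the Arrow C++ API, and ignores array slicing offsets; `FixedSizeListView` and `Slot` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, dependency-free model of the Arrow FixedSizeList<T, N> layout
// (array slicing offsets are ignored). This is not the Arrow C++ API.
template <typename T>
struct FixedSizeListView {
  std::vector<T> values;       // flattened child values: length() * list_size of them
  std::vector<bool> validity;  // one entry per slot, true = non-null
  std::size_t list_size;       // N, fixed by the type

  std::size_t length() const { return validity.size(); }

  // Offsets are implicit: slot i starts at i * list_size, so no offsets
  // buffer is needed. Null slots still occupy list_size child values.
  std::vector<T> Slot(std::size_t i) const {
    auto begin = values.begin() + static_cast<std::ptrdiff_t>(i * list_size);
    return std::vector<T>(begin, begin + static_cast<std::ptrdiff_t>(list_size));
  }
};
```

For example, `FixedSizeListView<float>{{1.f, 2.f, 0.f, 0.f, 5.f, 6.f}, {true, false, true}, 2}` describes three 2-element slots; the middle slot is null but still occupies two child values.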

@rok force-pushed the vector_repetition_level_2 branch from 31da9d3 to 1940c5d on May 29, 2026 17:57
Comment on lines +199 to +200
* an OPTIONAL parent node. Readers that do not understand VECTOR are expected to
* reject the file rather than attempting a LIST fallback.

Readers that do not understand VECTOR are expected to
reject the file rather than attempting a LIST fallback.

It's not clear to me what falling back to LIST means exactly. And wouldn't any kind of fallback behaviour indicate that the reader does understand VECTOR?

Maybe this could just be left as "Readers that do not understand VECTOR are expected to reject the file".
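The suggested wording amounts to a reader that knows only the existing repetition types and rejects anything else, with no fallback. A minimal sketch, assuming the reader sees the raw Thrift enum value: 0, 1 and 2 are `REQUIRED`, `OPTIONAL` and `REPEATED` in parquet-format's `FieldRepetitionType`, while the number an encoded VECTOR would use is not specified in this thread. `ValidateRepetition` is a hypothetical name, not an existing parquet-cpp function.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// The three repetition values defined by parquet-format's Thrift
// FieldRepetitionType enum, as understood by a reader that predates VECTOR.
enum class Repetition : int32_t { kRequired = 0, kOptional = 1, kRepeated = 2 };

// Hypothetical validation step: returns std::nullopt if the value is
// understood, otherwise an error message. Any other value, such as a newly
// added VECTOR, is rejected instead of being mapped to a fallback type.
std::optional<std::string> ValidateRepetition(int32_t raw) {
  switch (raw) {
    case static_cast<int32_t>(Repetition::kRequired):
    case static_cast<int32_t>(Repetition::kOptional):
    case static_cast<int32_t>(Repetition::kRepeated):
      return std::nullopt;
    default:
      return "unsupported repetition type " + std::to_string(raw) +
             "; refusing to read file";
  }
}
```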

NodePtr element;
RETURN_NOT_OK(
FieldToNode(value_name, value_field, properties, arrow_properties, &element));
if (value_field->nullable()) {

As mentioned on the previous PR, I think we should always use a 3-level schema structure for vector, regardless of vector and element nullability. The schema nodes that are non-nullable won't contribute to rep/def levels, and a consistent schema structure keeps things simpler.
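The claim that non-nullable nodes cost nothing can be checked with a small level computation. The sketch uses the standard Parquet rules (REQUIRED adds no levels, OPTIONAL adds a definition level, REPEATED adds a definition and a repetition level). How a VECTOR node contributes is an assumption here, not something this thread states: it is treated as adding no levels of its own because its length is fixed by the schema. `Node`, `ComputeMaxLevels` and `ThreeLevelVectorPath` are hypothetical names, not the Arrow C++ API.

```cpp
#include <string>
#include <vector>

// Repetition of a schema node, extended with the proposed VECTOR kind.
enum class Repetition { kRequired, kOptional, kRepeated, kVector };

struct Node {
  std::string name;
  Repetition repetition;
};

struct Levels {
  int max_def = 0;
  int max_rep = 0;
};

// Max definition/repetition levels of the leaf at the end of `path`
// (schema root excluded).
Levels ComputeMaxLevels(const std::vector<Node>& path) {
  Levels levels;
  for (const Node& node : path) {
    switch (node.repetition) {
      case Repetition::kRequired:
        break;  // never null, never repeated: no levels
      case Repetition::kOptional:
        ++levels.max_def;
        break;
      case Repetition::kRepeated:
        ++levels.max_def;
        ++levels.max_rep;
        break;
      case Repetition::kVector:
        break;  // ASSUMPTION: fixed length, so no levels of its own
    }
  }
  return levels;
}

// Hypothetical helper for the always-3-level layout suggested above:
// <required|optional> group vec { vector group list { <required|optional> element; } }
std::vector<Node> ThreeLevelVectorPath(bool vector_nullable, bool element_nullable) {
  return {
      {"vec", vector_nullable ? Repetition::kOptional : Repetition::kRequired},
      {"list", Repetition::kVector},
      {"element", element_nullable ? Repetition::kOptional : Repetition::kRequired},
  };
}
```

Under that assumption, a non-nullable vector of non-nullable elements still has maximum definition and repetition levels of 0 with all three nodes present, so the extra nodes only affect the schema, not the encoded levels.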

Comment on lines +220 to +221
// VECTOR is a structural repetition type. No LogicalType::VECTOR annotation is
// required: vector nullability is carried by the optional parent group, while the
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the logical type annotation isn't strictly required, I think it still makes sense to have one so that it's explicit that the top-level group node represents a vector field, rather than, e.g., a single-field struct with a vector-typed child.

This also provides consistency with the list type. A list logical type annotation isn't strictly required either (you could just have plain schema nodes with repetition::repeated), but I think there are good reasons the list logical type was created.
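The ambiguity described in this comment can be sketched as a reader deciding what Arrow type a group node maps to. With an explicit annotation the group itself is the container; without one, this sketch has to fall back to treating it as a struct whose single child is the vector. `GroupNode`, `LogicalAnnotation`, `ArrowKind` and `ClassifyGroup` are hypothetical, simplified types, not the parquet-cpp schema classes, and `kVector` stands for the annotation proposed here.

```cpp
#include <optional>
#include <string>

// Hypothetical, simplified schema model; not the parquet-cpp node types.
enum class LogicalAnnotation { kList, kVector };  // kVector: the proposed annotation

struct GroupNode {
  std::string name;
  std::optional<LogicalAnnotation> annotation;  // absent = plain group
};

enum class ArrowKind { kStruct, kList, kFixedSizeList };

// With an annotation, the group maps directly to the container type.
// Without one, the reader cannot tell a vector field from a single-field
// struct with a vector child, so it maps the group to a struct.
ArrowKind ClassifyGroup(const GroupNode& group) {
  if (group.annotation == LogicalAnnotation::kVector) return ArrowKind::kFixedSizeList;
  if (group.annotation == LogicalAnnotation::kList) return ArrowKind::kList;
  return ArrowKind::kStruct;
}
```

This mirrors the LIST case: the annotation is what lets the reader map the outer group to a single list-like Arrow field instead of a struct wrapper.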
