
Commit 27fdd35

MINOR: [Docs] Clarify struct validity masking with 'hidden data' example (#49554)
### Rationale for this change

The current documentation contains a technical inconsistency in the validity bitmap values for column 4. According to the Arrow specification:

> In Arrow, a dedicated buffer, known as the validity (or “null”) bitmap, is used alongside the data indicating whether each value in the array is null or not: a value of 1 means that the value is not-null (“valid”), whereas a value of 0 indicates that the value is null.

In the existing example, the validity bitmap for the third row of the variable-size binary child array is incorrectly set to 1, despite the row being null. This PR corrects that bit to 0 to align with the fixed-size primitive child array and the overall Arrow memory layout standards.

### What changes are included in this PR?

Updated the struct-diagram.svg

### Are these changes tested?

No

### Are there any user-facing changes?

No

Authored-by: Philipp Ucke <philippucke@googlemail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
1 parent d08d5e6 commit 27fdd35

2 files changed

Lines changed: 10 additions & 3 deletions


docs/source/format/Intro.rst

Lines changed: 7 additions & 0 deletions
@@ -296,6 +296,13 @@ key is the field name and the child array its values. The field (key) is
 saved in the schema and the values of a specific field (key) are saved in
 the child array.
 
+Since child arrays are independent, Arrow does not enforce physical
+consistency between the struct's validity bitmap and those of it's children.
+Logically, a struct row is only valid if both the parent and the child
+bitmaps have a value of 1 for that slot (a logical AND operation).
+This allows for "hidden" data to exist in child arrays at null struct
+positions (see ``alice`` below).
+
 .. figure:: ./images/struct-diagram.svg
    :alt: Diagram is showing the difference between the struct data type
          presented in a Table and the data actually stored in computer
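
The logical AND described in the added paragraph can be sketched in plain Python. This is only an illustrative model, not the Arrow or pyarrow API: the helper names `logical_validity` and `visible_values` are hypothetical, bitmaps are modelled as lists of 0/1 rather than packed bits, and every value except ``alice`` is made up for the example.

```python
def logical_validity(parent_bitmap, child_bitmap):
    """AND the struct's validity bits with one child's validity bits."""
    if len(parent_bitmap) != len(child_bitmap):
        raise ValueError("bitmaps must cover the same number of slots")
    return [p & c for p, c in zip(parent_bitmap, child_bitmap)]


def visible_values(parent_bitmap, child_bitmap, child_values):
    """Return child values as seen through the struct; masked slots become None."""
    valid = logical_validity(parent_bitmap, child_bitmap)
    return [value if bit else None for value, bit in zip(child_values, valid)]


# Hypothetical three-slot struct column with a single string child.
struct_validity = [1, 0, 1]         # slot 1: the struct itself is null
name_validity = [1, 1, 0]           # slot 2: only the child value is null
name_values = ["joe", "alice", ""]  # "alice" is physically stored at slot 1

print(logical_validity(struct_validity, name_validity))
print(visible_values(struct_validity, name_validity, name_values))
```

In this model ``alice`` stays in the child's storage and the child's own validity bit is 1, yet the value is masked because the struct's bit for that slot is 0. That is the "hidden" data case the paragraph describes.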
