[rust] Add end-to-end ROW type#544
Conversation
- Add `Datum::Row(Box<GenericRow>)` variant with `as_row()` accessor - Add `get_row()` to `InternalRow` trait with default error impl - Implement `GenericRow::get_row()` and `CompactedRow::get_row()` delegation - Implement `ColumnarRow::get_row()` with Arrow StructArray extraction + OnceLock caching - Add `InnerValueWriter::Row(RowType)` and write path via nested CompactedRowWriter - Add `DataType::Row` arm in `CompactedRowDeserializer` for eager nested decode - Add `InnerFieldGetter::Row` and hook up FieldGetter/ValueWriter pipeline - Handle `Datum::Row` in `resolve_row_types` (C++ bindings) - Add round-trip tests: simple nesting, deep nesting, nullable fields, ROW as primary key Wire format matches Java: varint-length-prefixed blob of a complete CompactedRow.
d750eb2 to
41289fc
Compare
54c8908 to
4ad3a28
Compare
4ad3a28 to
fb693b6
Compare
|
Nice! thanks for reviving the PR, will take a look tomorrow! 🙏 |
charlesdong1991
left a comment
There was a problem hiding this comment.
Thanks for the great work, overall looks great!! 👍 Honestly i paid a bit more attention on the addition in this revival 😅 and only have a couple minor comments, given the size of PR, we can probably better to have follow-up PRs instead.
In follow-up PRs we can also add corresponding docs to reflect the new change!
| let column = self.record_batch.column(pos); | ||
| // Children of a null parent may carry stale bytes; caller must | ||
| // check is_null_at first rather than rely on what we'd read. | ||
| if column.is_null(self.row_id) { |
There was a problem hiding this comment.
i wonder should user check is_null_at before get_row?
| self.row_id = row_id | ||
| self.row_id = row_id; | ||
| for lock in self.row_caches.iter_mut() { | ||
| *lock = std::sync::OnceLock::new(); |
There was a problem hiding this comment.
you mentioned in PR description that Per-row Vec<OnceLock> allocation killed, looking at this, i wonder if it is still the case that it will allocate OnceLock on every row iteration?
| // TODO: add Map and Row field getter support once their binary forms are implemented. | ||
| // TODO: add Map field getter support once its binary form is implemented. | ||
| InnerFieldGetter::Array { pos } => Datum::Array(row.get_array(*pos)?), | ||
| InnerFieldGetter::Row { pos } => Datum::Row(Box::new(row.get_row(*pos)?.clone())), |
There was a problem hiding this comment.
i wonder how performant this will be since it seems it clones the whole generic row? maybe we can flag awareness on hot scan paths here and revisit if needed later because the PR is huge already
|
Ty for the review @charlesdong1991 |
Continues work from #442 (which went stale).
Original implementation by @hemanthsavasere and this PR rebases on current main, addresses review feedback and adds the follow-ups listed below.
Closes #388
Closes #442
Original work
Datum::Row,InternalRow::get_row, CompactedRow ROW deserializerNestedRowWriter, ROW arm in compacted key encoderAdded in this revival
field_idmachinery (server requires unique ids across nested ROW —matches Java's
ReassignFieldId)ARRAY<ROW>supportVec<OnceLock>allocation killedleft followup #543 to address performance with nested structures, as this is PR is big enough and the gap comes from pre-existing code