Commit eae14b1
feat: eliminate GenericDatum in Avro reader for performance
Replace GenericDatum intermediate layer with direct Avro decoder access
to improve manifest I/O performance.
Changes:
- Add avro_direct_decoder_internal.h with DecodeAvroToBuilder API
- Add avro_direct_decoder.cc implementing direct Avro→Arrow decoding
  - Primitive types: bool, int, long, float, double, string, binary, fixed
  - Temporal types: date, time, timestamp
  - Logical types: uuid, decimal (with validation)
  - Nested types: struct, list, map
  - Union type handling with bounds checking
  - Field skipping with proper multi-block handling for arrays/maps
- Modify avro_reader.cc to use DataFileReaderBase with direct decoder
  - Replace DataFileReader<GenericDatum> with DataFileReaderBase
  - Use decoder.decodeInt(), decodeLong(), etc. directly
  - Remove GenericDatum allocation and extraction overhead
- Update CMakeLists.txt to include new decoder source
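
The "multi-block handling" item refers to how Avro encodes arrays and maps: as a series of blocks, each starting with a long item count, ending with a count of zero. A negative count means the absolute value is the item count and a long byte size follows, so the block can be skipped without decoding its items. The following is a minimal, self-contained sketch of that skipping logic for an array of longs. ByteReader and SkipLongArray are hypothetical names for illustration only; they are not the Avro C++ Decoder API and not the code added in this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical in-memory reader (illustration only, not the Avro C++ Decoder).
class ByteReader {
 public:
  explicit ByteReader(const std::vector<uint8_t>& data) : data_(data) {}

  // Reads one Avro long: base-128 varint bytes holding a zigzag-encoded value.
  int64_t ReadLong() {
    uint64_t raw = 0;
    for (int shift = 0; shift < 70; shift += 7) {
      if (pos_ >= data_.size()) throw std::runtime_error("unexpected end of input");
      const uint8_t byte = data_[pos_++];
      raw |= static_cast<uint64_t>(byte & 0x7F) << shift;
      if ((byte & 0x80) == 0) {
        // Zigzag decode: even values are non-negative, odd values are negative.
        return static_cast<int64_t>(raw >> 1) ^ -static_cast<int64_t>(raw & 1);
      }
    }
    throw std::runtime_error("varint too long");
  }

  // Advances past n bytes without interpreting them.
  void Skip(size_t n) {
    assert(pos_ <= data_.size());
    if (n > data_.size() - pos_) throw std::runtime_error("skip past end of input");
    pos_ += n;
  }

  size_t position() const { return pos_; }

 private:
  const std::vector<uint8_t>& data_;
  size_t pos_ = 0;
};

// Skips a whole Avro array of longs, however many blocks it spans.
void SkipLongArray(ByteReader& reader) {
  while (true) {
    const int64_t count = reader.ReadLong();
    if (count == 0) return;  // a zero count terminates the array
    if (count < 0) {
      // Negative count: a byte size follows, so the block is skipped wholesale.
      const int64_t byte_size = reader.ReadLong();
      if (byte_size < 0) throw std::runtime_error("negative block size");
      reader.Skip(static_cast<size_t>(byte_size));
    } else {
      // Positive count: no byte size is given, so each item is decoded and dropped.
      for (int64_t i = 0; i < count; ++i) reader.ReadLong();
    }
  }
}
```

A skipper that stops after the first block would leave the reader positioned inside the array, which is why the loop continues until the zero count.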
Validation added:
- Union branch bounds checking
- Decimal byte width validation (uses schema fixedSize, not calculated)
- Decimal precision sufficiency validation
- Logical type presence validation
- Type mismatch error handling
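
Two of these checks can be sketched in isolation. The code below is one plausible reading of "union branch bounds checking" and "decimal precision sufficiency", not the commit's actual implementation: the function names are hypothetical, and the precision rule assumed here is the Avro specification's bound for a fixed of n bytes, floor(log10(2^(8n-1) - 1)) digits.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper: returns an error message, or std::nullopt when the
// decoded union branch index is valid for a union with num_branches branches.
std::optional<std::string> CheckUnionBranch(int64_t branch, size_t num_branches) {
  if (branch < 0 || static_cast<uint64_t>(branch) >= num_branches) {
    return "union branch " + std::to_string(branch) + " out of range [0, " +
           std::to_string(num_branches) + ")";
  }
  return std::nullopt;
}

// Largest decimal precision that fits in a two's-complement value of
// fixed_size bytes. Since 2^k is never a power of ten for k >= 1,
// floor(log10(2^k - 1)) equals floor(k * log10(2)).
int MaxDecimalPrecision(size_t fixed_size) {
  assert(fixed_size > 0);
  const double bits = 8.0 * static_cast<double>(fixed_size) - 1.0;
  return static_cast<int>(std::floor(bits * std::log10(2.0)));
}

// Hypothetical helper: checks that the schema's fixed size can hold the
// declared precision.
std::optional<std::string> CheckDecimalFixed(size_t fixed_size, int precision) {
  if (fixed_size == 0) return "decimal fixed size must be positive";
  if (precision <= 0) return "decimal precision must be positive";
  const int max_precision = MaxDecimalPrecision(fixed_size);
  if (precision > max_precision) {
    return "precision " + std::to_string(precision) + " exceeds maximum " +
           std::to_string(max_precision) + " for fixed(" +
           std::to_string(fixed_size) + ")";
  }
  return std::nullopt;
}
```

Returning a message rather than throwing is only a choice made for this sketch; the commit states that error handling behavior is documented in the header.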
Documentation:
- Comprehensive API documentation in header
- Schema evolution handling via SchemaProjection explained
- Error handling behavior documented
- Limitations noted (default values not supported)
Performance improvement:
- Before: Avro binary → GenericDatum → Extract → Arrow (3 steps)
- After: Avro binary → decoder.decodeInt() → Arrow (2 steps)
This matches Java implementation which uses Decoder directly via
ValueReader interface, avoiding intermediate object allocation.
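
The difference between the two paths can be illustrated with a small self-contained sketch. Everything here is a stand-in chosen for illustration: Datum (a std::variant) plays the role of GenericDatum, std::vector<double> plays the role of an Arrow builder, and DecodeDouble is a hypothetical helper, not avro::Decoder. The only format fact relied on is that Avro encodes a double as 8 little-endian IEEE 754 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Stand-in for a generic datum: a tagged union over possible decoded values.
using Datum = std::variant<std::monostate, int64_t, double, std::string>;

// Decodes one Avro double (8 bytes, little-endian) starting at pos.
double DecodeDouble(const std::vector<uint8_t>& buf, size_t& pos) {
  if (pos > buf.size() || buf.size() - pos < 8) {
    throw std::runtime_error("unexpected end of input");
  }
  uint64_t bits = 0;
  for (int i = 0; i < 8; ++i) {
    bits |= static_cast<uint64_t>(buf[pos + i]) << (8 * i);
  }
  pos += 8;
  double value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}

// Indirect path: binary -> Datum -> extract -> column.
void AppendViaDatum(const std::vector<uint8_t>& buf, size_t& pos,
                    std::vector<double>& column) {
  Datum datum{std::in_place_type<double>, DecodeDouble(buf, pos)};
  assert(std::holds_alternative<double>(datum));
  column.push_back(std::get<double>(datum));
}

// Direct path: binary -> decode call -> column, with no intermediate object.
void AppendDirect(const std::vector<uint8_t>& buf, size_t& pos,
                  std::vector<double>& column) {
  column.push_back(DecodeDouble(buf, pos));
}
```

Both functions append the same values; the direct one simply skips constructing and inspecting the intermediate object for every field. This sketch makes no claim about the size of the speedup in the real reader.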
All 173 avro_test cases pass.
Issue: #3321

Parent commit: 9805fae
4 files changed (+687, -10) under src/iceberg, including the avro subdirectory.