[VL] Add lazy per-column deserialization for Columnar Table Cache #12211
jackylee-ch wants to merge 1 commit
@yaooqinn PTAL

Thanks @jackylee-ch, V3 layout is a sensible extension of the cache-stats wire we landed in #12092 / #12196. Several things to discuss before this lands:

1. Benchmark needs to be re-run. The checked-in [...]
2. Do we really need a new SQLConf? V3 functionally supersedes V2 (V3 frames also carry [...]
3. Cross-language test parity vs #12196. V3 has no cpp-side byte-equal golden test; JVM-side tests synthesize their own frames via [...]
4. Smaller items.

Happy to file any of these as separate issues if it helps.
Write V3 per-column cache bytes by default for Velox table cache.

Partition stats now only controls the optional stats/pruning payload: stats off writes a no-stats V3 frame, stats on writes V3 with stats, and older native libraries still fall back to V2 stats or legacy bytes. Add the V3 no-stats JNI/native serializer, JVM parsing for statsLen=0, cross-language golden coverage, and GitHub Actions benchmark execution without committing local benchmark results.

Change-Id: I2a8582f901fafd436cac1a1d16e0367e9330b336
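The format selection and the `statsLen=0` handling described in the commit message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the PR's code: `CacheFormat`, `chooseCacheFormat`, `FrameParts`, and `splitStats` are hypothetical names, the mapping "stats on falls back to V2 stats, stats off falls back to legacy bytes" is an assumed reading of the fallback rule, and the 4-byte little-endian `statsLen` prefix is an assumed layout rather than the actual V3 wire format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical enum: the four byte formats named in this PR.
enum class CacheFormat { kLegacy, kV2WithStats, kV3NoStats, kV3WithStats };

// Pick a format from the partition-stats switch and from whether the loaded
// native library can write V3 (assumed here to be a plain capability flag).
CacheFormat chooseCacheFormat(bool partitionStatsEnabled, bool nativeSupportsV3) {
  if (nativeSupportsV3) {
    return partitionStatsEnabled ? CacheFormat::kV3WithStats
                                 : CacheFormat::kV3NoStats;
  }
  // Older native library: fall back to the pre-V3 formats (assumed mapping).
  return partitionStatsEnabled ? CacheFormat::kV2WithStats
                               : CacheFormat::kLegacy;
}

// Result of splitting a frame into its optional stats payload and the rest.
struct FrameParts {
  std::vector<uint8_t> stats;  // empty when statsLen == 0
  std::size_t bodyOffset = 0;  // where the per-column bytes would start
};

// Split a frame that starts with an ASSUMED 4-byte little-endian statsLen.
// Returns false when the frame is too short or statsLen overruns the buffer.
bool splitStats(const std::vector<uint8_t>& frame, FrameParts& out) {
  constexpr std::size_t kPrefix = 4;
  if (frame.size() < kPrefix) {
    return false;
  }
  const uint32_t statsLen = static_cast<uint32_t>(frame[0]) |
                            (static_cast<uint32_t>(frame[1]) << 8) |
                            (static_cast<uint32_t>(frame[2]) << 16) |
                            (static_cast<uint32_t>(frame[3]) << 24);
  if (statsLen > frame.size() - kPrefix) {
    return false;
  }
  const auto begin = frame.begin() + static_cast<std::ptrdiff_t>(kPrefix);
  out.stats.assign(begin, begin + static_cast<std::ptrdiff_t>(statsLen));
  out.bodyOffset = kPrefix + statsLen;
  return true;
}
```

With these assumptions, a frame whose prefix is all zero bytes yields an empty `stats` vector and a `bodyOffset` of 4, so a reader can skip pruning and go straight to the per-column bytes.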
What changes
This PR makes the Velox table cache write V3 per-column framed bytes by default. Lazy materialization is a base table-cache capability; `spark.gluten.sql.columnar.tableCache.partitionStats.enabled` now only controls the optional stats/pruning payload.

- [...] `spark.gluten.sql.columnar.tableCache.lazy.deserialization.enabled`.
- [...] `statsLen=0`) for the default lazy path.

Performance
4-environment benchmark: eager `V2` vs lazy `V3`, each without and with the optional partition-stats payload.

- `ColumnarTableCacheLazyDeserBenchmark`, GitHub Actions `Velox Backend (x86)` run `26906231294` (branch head `2538fe501`).
- [...] `-XX:MaxRAMPercentage=70`).
- `V2 without stats` = legacy raw Presto (eager, no pruning); `V2 with stats` = framed `SerializeWithStats` (eager + partition-stats pruning); `V3 without stats` = per-column lazy (default); `V3 with stats` = per-column lazy + pruning.

Cache footprint (storage memory)
V3 per-column framing does not increase cache size vs eager V2/legacy for flat (non-dictionary) data, and the stats payload is negligible. This addresses the cache-footprint-regression concern.
Build / write — avg ms over 3 iters (lower is better)
Write time is within ~1% across all four — V3 framing and stats computation add no measurable write overhead (the phase is dominated by range generation + range-repartition shuffle).
Read — avg ms over 3 iters (lower is better)
`sum(c0)` [...]

`LazyVector` wrapping adds no overhead when every column is materialized.

How was this patch tested?
- `./dev/format-scala-code.sh`
- `PATH="/opt/homebrew/opt/llvm@15/bin:$PATH" ./dev/format-cpp-code.sh`
- `git diff --check upstream/main..HEAD`
- `ruby -e 'require "yaml"; YAML.load_file(".github/workflows/velox_backend_x86.yml"); puts "yaml ok"'`
- `./.github/workflows/util/check.sh upstream/main`
- `env CCACHE_DIR=/private/tmp/gluten-ccache ninja -C cpp/build velox/tests/CMakeFiles/velox_operators_test.dir/VeloxColumnarBatchSerializerTest.cc.o`
- `./build/mvn install -pl backends-velox -am -Pspark-3.5 -Pscala-2.12 -Pbackends-velox -DskipTests -Dexec.skip`
- `ColumnarTableCacheLazyDeserBenchmark` with `1000` rows, `4` partitions, `1` iteration, phases `build`, `read1`, `read4`, `readAll`, `filter`.

Was this patch authored or co-authored using generative AI tooling?
Generated-by: Codex GPT-5