@@ -74,24 +74,26 @@ canonical representations of each of the logical data types. The canonical encod
### Compressed Encodings

- Vortex includes a set of compressed encodings that can hold compression in-memory arrays allowing us to defer
- compression. These are:
+ Vortex includes a set of highly data-parallel, vectorized encodings. These encodings each correspond to a compressed
+ in-memory array implementation, allowing us to defer decompression. Currently, these are:

- * BitPacked
+ * Adaptive Lossless Floating Point (ALP)
+ * BitPacked (FastLanes)
* Constant
* Chunked
+ * Delta (FastLanes)
* Dictionary
* Frame-of-Reference
- * Run-end
+ * Run-end Encoding
* RoaringUInt
* RoaringBool
* Sparse
* ZigZag

### Compression

- Vortex's compression scheme is based on
- the [BtrBlocks](https://www.cs.cit.tum.de/fileadmin/w00cfj/dis/papers/btrblocks.pdf) paper.
+ Vortex's top-level compression strategy is based on the
+ [BtrBlocks](https://www.cs.cit.tum.de/fileadmin/w00cfj/dis/papers/btrblocks.pdf) paper.

Roughly, for each chunk of data, a sample of at least ~1% of the data is taken. Compression is then attempted (
recursively) with a set of lightweight encodings. The best-performing combination of encodings is then chosen to encode
@@ -135,13 +137,13 @@ Vortex serde is currently in the design phase. The goals of this implementation
* Forward statistical information (such as sortedness) to consumers.
* To provide a building block for file format authors to store compressed array data.

- ## Vs Apache Arrow
+ ## Integration with Apache Arrow

- It is important to note that Vortex and Arrow have different design goals. As such, it is somewhat
- unfair to make any comparison at all. But given both can be used as array libraries, it is worth noting the differences.
+ Apache Arrow is the de facto standard for interoperating on columnar array data. Naturally, Vortex is designed to
+ be maximally compatible with Apache Arrow. All Arrow arrays can be converted into Vortex arrays with zero-copy,
+ and a Vortex array constructed from an Arrow array can be converted back to Arrow, again with zero-copy.

- Vortex is designed to be maximally compatible with Apache Arrow. All Arrow arrays can be converted into Vortex arrays
- with zero-copy, and a Vortex array constructed from an Arrow array can be converted back to Arrow, again with zero-copy.
+ It is important to note that Vortex and Arrow have different--albeit complementary--goals.

Vortex explicitly separates logical types from physical encodings, distinguishing it from Arrow. This allows
Vortex to model more complex arrays while still exposing a logical interface. For example, Vortex can model a UTF8