Commit 0c00972

Update README.md (#168)
1 parent 5008629 commit 0c00972

1 file changed, with 13 additions and 11 deletions.

README.md

@@ -74,24 +74,26 @@ canonical representations of each of the logical data types. The canonical encod
 
 ### Compressed Encodings
 
-Vortex includes a set of compressed encodings that can hold compression in-memory arrays allowing us to defer
-compression. These are:
+Vortex includes a set of highly data-parallel, vectorized encodings. These encodings each correspond to a compressed
+in-memory array implementation, allowing us to defer decompression. Currently, these are:
 
-* BitPacked
+* Adaptive Lossless Floating Point (ALP)
+* BitPacked (FastLanes)
 * Constant
 * Chunked
+* Delta (FastLanes)
 * Dictionary
 * Frame-of-Reference
-* Run-end
+* Run-end Encoding
 * RoaringUInt
 * RoaringBool
 * Sparse
 * ZigZag
 
 ### Compression
 
-Vortex's compression scheme is based on
-the [BtrBlocks](https://www.cs.cit.tum.de/fileadmin/w00cfj/dis/papers/btrblocks.pdf) paper.
+Vortex's top-level compression strategy is based on the
+[BtrBlocks](https://www.cs.cit.tum.de/fileadmin/w00cfj/dis/papers/btrblocks.pdf) paper.
 
 Roughly, for each chunk of data, a sample of at least ~1% of the data is taken. Compression is then attempted (
 recursively) with a set of lightweight encodings. The best-performing combination of encodings is then chosen to encode
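
The sampling step described in the hunk above can be illustrated with a minimal sketch. It is not Vortex's implementation: every function name, the byte-size estimates, and the choice to sample contiguous runs of 64 values are assumptions made for illustration, and it tries a single level of encodings rather than recursing.

```python
import math
import random


# Rough size estimates in bytes for a sample of 64-bit integers.
# All of these cost models are illustrative assumptions.
def plain_size(values):
    return 8 * len(values)


def constant_size(values):
    # Only viable when every value is identical.
    return 8 if len(set(values)) == 1 else None


def run_end_size(values):
    runs = 1 + sum(1 for a, b in zip(values, values[1:]) if a != b)
    return 16 * runs  # one 8-byte value plus one 8-byte run end per run


def dictionary_size(values):
    distinct = len(set(values))
    code_bits = max(1, (distinct - 1).bit_length())
    return 8 * distinct + math.ceil(len(values) * code_bits / 8)


def for_bitpacked_size(values):
    # Frame-of-reference: subtract the minimum, then bit-pack the residuals.
    width = (max(values) - min(values)).bit_length()
    return 9 + math.ceil(len(values) * width / 8)  # reference + width byte


CANDIDATES = {
    "plain": plain_size,
    "constant": constant_size,
    "run_end": run_end_size,
    "dictionary": dictionary_size,
    "frame_of_reference_bitpacked": for_bitpacked_size,
}


def take_sample(values, fraction=0.01, run_length=64, seed=0):
    """Sample at least `fraction` of the chunk as contiguous runs."""
    target = max(run_length, math.ceil(len(values) * fraction))
    if target >= len(values):
        return list(values)
    rng = random.Random(seed)
    sample = []
    for _ in range(math.ceil(target / run_length)):
        start = rng.randrange(0, len(values) - run_length + 1)
        sample.extend(values[start:start + run_length])
    return sample


def choose_encoding(chunk):
    """Estimate each candidate on the sample and keep the smallest."""
    sample = take_sample(chunk)
    sizes = {name: estimate(sample) for name, estimate in CANDIDATES.items()}
    viable = {name: size for name, size in sizes.items() if size is not None}
    return min(viable, key=viable.get)
```

Sampling contiguous runs rather than isolated values is a deliberate choice in this sketch: run-based encodings would otherwise look far worse on the sample than on the full chunk.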
@@ -135,13 +137,13 @@ Vortex serde is currently in the design phase. The goals of this implementation
 * Forward statistical information (such as sortedness) to consumers.
 * To provide a building block for file format authors to store compressed array data.
 
-## Vs Apache Arrow
+## Integration with Apache Arrow
 
-It is important to note that Vortex and Arrow have different design goals. As such, it is somewhat
-unfair to make any comparison at all. But given both can be used as array libraries, it is worth noting the differences.
+Apache Arrow is the de facto standard for interoperating on columnar array data. Naturally, Vortex is designed to
+be maximally compatible with Apache Arrow. All Arrow arrays can be converted into Vortex arrays with zero-copy,
+and a Vortex array constructed from an Arrow array can be converted back to Arrow, again with zero-copy.
 
-Vortex is designed to be maximally compatible with Apache Arrow. All Arrow arrays can be converted into Vortex arrays
-with zero-copy, and a Vortex array constructed from an Arrow array can be converted back to Arrow, again with zero-copy.
+It is important to note that Vortex and Arrow have different--albeit complementary--goals.
 
 Vortex explicitly separates logical types from physical encodings, distinguishing it from Arrow. This allows
 Vortex to model more complex arrays while still exposing a logical interface. For example, Vortex can model a UTF8
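
The separation of logical types from physical encodings mentioned in the hunk above can be sketched as two physical layouts behind one logical interface. The class and method names below (`PlainArray`, `RunEndArray`, `get`, `to_canonical`) are hypothetical and are not taken from the Vortex API; the sketch only shows how a run-end encoded array can answer element lookups without being decompressed first.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class PlainArray:
    """Uncompressed layout: one stored value per logical element."""
    values: List[int]

    def __len__(self):
        return len(self.values)

    def get(self, i):
        return self.values[i]

    def to_canonical(self):
        return list(self.values)


@dataclass
class RunEndArray:
    """Run-end layout: `ends[k]` is the exclusive end index of run k."""
    ends: List[int]
    values: List[int]

    @classmethod
    def encode(cls, data):
        ends, values = [], []
        for i, v in enumerate(data):
            if values and values[-1] == v:
                ends[-1] = i + 1  # extend the current run
            else:
                values.append(v)  # start a new run
                ends.append(i + 1)
        return cls(ends, values)

    def __len__(self):
        return self.ends[-1] if self.ends else 0

    def get(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Binary search for the run containing index i; no decompression.
        return self.values[bisect_right(self.ends, i)]

    def to_canonical(self):
        out, start = [], 0
        for end, v in zip(self.ends, self.values):
            out.extend([v] * (end - start))
            start = end
        return out


data = [5, 5, 5, 9, 9, 2]
plain = PlainArray(data)
encoded = RunEndArray.encode(data)
```

Both objects describe the same logical integer array, so a consumer written against `get` and `len` does not need to know which physical layout it was handed.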
