-
Notifications
You must be signed in to change notification settings - Fork 0
Compressed Integer
The compressed integer is a way to store a positive-only number in a variable byte length. It is almost identical to LEB128.
It is the same as LEB128
, but with the first bit swapped. zchunk
reinvented LEB128
by mistake.
How to read an compressed int:
- If the first (most significat) bit is 1, this is the last byte to read.
- Take the 7 least significt bits and add them to the nth byte of your array.
- Repeat
The int
or long
value from java is treated as unsigned. That means, that the adopted value may differ from the value which java prints via toString()
conversion.
Example:
Type | Value (Java) | Java uncompressed | Intepreted as | CompressedInt |
---|---|---|---|---|
int |
0 |
0x00000000 |
0 |
0x80 |
int |
2_147_483_647 |
0x7FFFFFFF |
2_147_483_647 |
0xFFFF_FEFE |
int |
-1 |
0xFFFFFFFF |
4_294_967_295 |
0x7F_7F7F_7F8F |
long |
4_294_967_295 |
0x00000000_FFFFFFFF |
4_294_967_295 |
0x7F_7F7F_7F8F |
long |
-1 |
0xFFFFFFFF_FFFFFFFF |
18_446_744_073_709_551_615 |
0x7F7F_7F7F7F7F_7F7F7F81 |
- The java value object "CompressedInt" will only know the compressed byte array after construction.
- Each method is lazy. That means, that the corresponding value will be calculated only once.
- If all methods on a compressed int were called, it would consume the space of byte[10] (maximum) + BigInteger + long + long + int.
Can be written as: 0x80 == 0b10000000
.
0x81 == 0b10000001
In normal notation: 394 == 0b00000000_00000000_00000000_00000000_00000000_00000000_00000001_10001010
. This exceeds one byte obviously. That's how a "normal" integer would be stored. For this example, the leading all-zero-bytes are omitted, as they won't make a difference in this algorithm.
Input: 0b00000001_10001010
.
To compress:
- Take the rightmost (least significant) 7 bytes, which are
0b00001010
. - The most significant byte is a zero by default.
- Unsign shift the byte notation by 7, which makes it
0b00000000_00000011
- Repeat. This means the second byte will be
0b00000011
. - Right shift by 7. Now we get
0b00000000_00000000
. That's 0, so we stop here. - Our result is now an array of two bytes:
0b00001010, 0b00000011
. Please note that the bytes also "swapped places". - The last byte will have the first (most significant) bit of the last byte set to 1. As a final result we get:
0b00001010_10000011
.
To decompress:
- Create a result int, which starts at
0b00000000
. - Create a byte index counter, which starts at 0.
- Read a byte. If the first bit is 1, this is the last step.
- Read the least 7 significant bits, which are:
0b00001010
. - Shift the read byte by 7 * byte index (which is 7 * 0 = 7).
- Read the next byte and increase the byte index by 7.
- Read the next byte, which is
0b10000011
. As this starts with 1, this will be the last byte to read. - Read the 7 least significant bits, which are:
0b00000011
. - Shift the read byte by 7 * byte index (which is 7 * 1 = 7). This will give you
0b0000001_1000000
. - Add them to the result in by using "logical or".
The last operation is:
0b00000000_00001010 (from step 4)
| 0b00000001_10000000 (from step 8)
===========================
0b00000001_10001010