Update Compression content #2664
Allow 10 minutes from last push for the staging site to build. If the link doesn't work, try using incognito mode instead. For internal reviewers, check web-documentation repo actions for staging build status. Link to build for this PR: http://docs-dev.timescale.com/docs-compression-howto-lana
@iroussos This is still in draft, but adding you so you can follow along as I go, and provide advice 💖
Reviewing section "about compression". Will continue with the others.
Some suggestions regarding use of ORDER BY and SEGMENT BY.
Some minor comments, otherwise it looks accurate. I wonder if you want to mention that compression stores at most 1000 elements in a compressed column somewhere. It affects the compression ratio if you have few elements for each segment-by column.
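To make the point about the 1000-element limit concrete, here is a minimal Python sketch. It assumes, as a simplification, that rows are grouped by segment-by value and then packed into batches of at most 1000 values each; the function name and the row counts are hypothetical and not taken from TimescaleDB itself.

```python
import math

# Maximum number of values stored in one compressed column entry,
# as stated in the comment above.
BATCH_LIMIT = 1000

def batches_per_chunk(rows_per_segment):
    """Return (batch_count, average_fill) for a chunk.

    rows_per_segment lists how many rows each segment-by value has.
    Assumption: every segment-by value gets its own batches.
    """
    batch_count = sum(math.ceil(n / BATCH_LIMIT) for n in rows_per_segment)
    total_rows = sum(rows_per_segment)
    return batch_count, total_rows / (batch_count * BATCH_LIMIT)

# Hypothetical chunk of 100,000 rows in both cases.
few_segments = [10_000] * 10     # 10 devices, 10,000 rows each
many_segments = [10] * 10_000    # 10,000 devices, 10 rows each

print(batches_per_chunk(few_segments))   # (100, 1.0)
print(batches_per_chunk(many_segments))  # (10000, 0.01)
```

With few rows per segment-by value, each batch is mostly empty, so the same data ends up in far more compressed rows, which is why the compression ratio suffers.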
A few more suggestions for improvements.
However, using the previous example, if you wanted to query the data based on
device ID, you would have to decompress the device column to decide if that
batch should be part of the query. When you compress your hypertable, you can
It is not entirely clear why the entire column has to be decompressed.
One reason is that every compressed row is likely to contain every device, which means that having a max and a min does not help in deciding whether the row needs to be decompressed: the range test passes for every row, so the column always has to be decompressed.
The other reason is because we cannot have an index on data in a compressed column.
My interpretation was the former: that you don't know if the device is referenced in the column without decompressing it first.
Sorry, I was unclear. Both reasons are valid, but let me elaborate a little.
Typically, device IDs are taken from a separate table containing a serial ID. This means that device IDs typically are integers, not strings. With this in mind, we can compress the device ID column as well, which gives us something like this:
device | _ts_meta_min_1 | _ts_meta_max_1 |
---|---|---|
[0, 1, 2, 1, 2, 3, 0] | 0 | 3 |
[0, 0, 3, 0, 0, 2, 0] | 0 | 3 |
However, as you can see, the range does not allow us to filter out the rows that do not contain 1, since the entire range of device IDs is covered in each row, so we have to scan all rows and decompress the column to see if there is a match.
However, if we had an index on the device ID, we would be able to select all the compressed rows for that device, so by segmenting the data by the device ID, we can easily select the rows that have matching IDs using an index.
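The difference can be sketched in a few lines of Python using the two compressed rows from the table above. This is only an illustration: the function names are hypothetical, and the dictionary stands in for the index on the segment-by column rather than showing how TimescaleDB implements it.

```python
from collections import defaultdict

# The two compressed rows from the table above (device column only).
batches = [
    [0, 1, 2, 1, 2, 3, 0],
    [0, 0, 3, 0, 0, 2, 0],
]

def batches_matching_range(batches, device):
    """Batches that the min/max metadata cannot rule out."""
    return [i for i, values in enumerate(batches)
            if min(values) <= device <= max(values)]

def batches_containing(batches, device):
    """Batches that actually contain the device (requires decompressing)."""
    return [i for i, values in enumerate(batches) if device in values]

def segment_by(batches):
    """Group every value by device ID, recording where it came from."""
    segments = defaultdict(list)
    for batch_no, values in enumerate(batches):
        for pos, device in enumerate(values):
            segments[device].append((batch_no, pos))
    return dict(segments)

print(batches_matching_range(batches, 1))  # [0, 1]
print(batches_containing(batches, 1))      # [0]
print(len(segment_by(batches)[1]))         # 2
```

The range check keeps both rows even though only the first contains device 1, whereas the segmented layout finds the matching values with a single lookup on the device ID.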
Reopening this PR as it seems to have been closed by accident
3357ea8 to 03b14df
03b14df to ed5511e
00a73e2 to d86c2cc
@iroussos Updated to remove the changes to support the new API. PTAL.
Thank you @mkindahl, looks really good; happy to get this out and then keep on iterating!
Description
Updates the existing Compression content, and includes more how to content.
Links
Fixes https://github.com/timescale/docs-private/issues/167
Writing help
For information about style and word usage, see the style guide
Review checklists
Reviewers: use this section to ensure you have checked everything before approving this PR:
Subject matter expert (SME) review checklist
Documentation team review checklist
and have they been implemented?