Update Compression content #2664

Merged

mkindahl merged 7 commits into latest from compression-howto-lana

Nov 1, 2023

Contributor

Loquacity commented Aug 30, 2023

Description

Updates the existing Compression content and adds more how-to content.

Links

Fixes https://github.com/timescale/docs-private/issues/167

Writing help

For information about style and word usage, see the style guide

Review checklists

Reviewers: use this section to ensure you have checked everything before approving this PR:

Subject matter expert (SME) review checklist

Is the content technically accurate?
Is the content complete?
Is the content presented in a logical order?
Does the content use appropriate names for features and products?
Does the content provide relevant links to further information?

Documentation team review checklist

Is the content free from typos?
Does the content use plain English?
Does the content contain clear sections for concepts, tasks, and references?
Have any images been uploaded to the correct location, and are resolvable?
If the page index was updated, are redirects required and have they been implemented?
Have you checked the built version of this content?

github-actions bot commented Aug 30, 2023

Allow 10 minutes from last push for the staging site to build. If the link doesn't work, try using incognito mode instead. For internal reviewers, check web-documentation repo actions for staging build status. Link to build for this PR: http://docs-dev.timescale.com/docs-compression-howto-lana

Loquacity requested a review from iroussos

August 30, 2023 06:45

Contributor Author

Loquacity commented Aug 30, 2023

@iroussos This is still in draft, but adding you so you can follow along as I go, and provide advice 💖

mkindahl reviewed

Contributor

mkindahl left a comment

Reviewing section "about compression". Will continue with the others.

use-timescale/compression/compression-design.md Outdated

use-timescale/compression/compression-design.md Outdated

use-timescale/compression/compression-design.md

use-timescale/compression/compression-design.md Outdated

use-timescale/compression/compression-design.md Outdated

use-timescale/compression/compression-design.md

use-timescale/compression/compression-design.md Outdated

use-timescale/compression/compression-design.md Outdated

use-timescale/compression/compression-design.md Outdated

use-timescale/compression/compression-design.md Outdated

mkindahl reviewed

Contributor

mkindahl left a comment

Some suggestions regarding use of ORDER BY and SEGMENT BY.

use-timescale/compression/compression-design.md

use-timescale/compression/compression-design.md

use-timescale/compression/compression-design.md

use-timescale/compression/compression-design.md

use-timescale/compression/compression-design.md

mkindahl reviewed

Contributor

mkindahl left a comment

Some minor comments; otherwise it looks accurate. I wonder if you want to mention somewhere that compression stores at most 1000 elements in a compressed column. This affects the compression ratio if you have few elements for each segment-by value.
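
A rough sketch of that effect, for illustration only. It assumes the 1000-element limit quoted above and a simplified model in which each segment-by value fills its own batches; `MAX_BATCH` and `batch_stats` are hypothetical names, not part of TimescaleDB.

```python
import math

MAX_BATCH = 1000  # per-column element limit quoted in the comment above


def batch_stats(rows_per_segment, segments):
    """Return (compressed_rows, average_fill) for one chunk.

    Simplified model: every segment-by value gets its own batches,
    and each batch holds at most MAX_BATCH elements.
    """
    batches_per_segment = math.ceil(rows_per_segment / MAX_BATCH)
    compressed_rows = batches_per_segment * segments
    average_fill = rows_per_segment / (batches_per_segment * MAX_BATCH)
    return compressed_rows, average_fill


# The same 10,000 rows, segmented two different ways.
many_small_segments = batch_stats(rows_per_segment=10, segments=1000)
one_large_segment = batch_stats(rows_per_segment=10_000, segments=1)
```

In this model, 1000 segments of 10 rows each produce 1000 compressed rows that are only 1% full, while a single segment of 10,000 rows produces 10 full ones. Underfilled batches give the compression algorithms less data to work with.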

use-timescale/compression/compression-design.md Outdated

use-timescale/compression/compression-design.md Outdated

use-timescale/compression/compression-design.md Outdated

use-timescale/compression/compression-design.md Outdated

use-timescale/compression/compression-design.md Outdated

mkindahl reviewed

use-timescale/compression/compression-design.md Outdated

use-timescale/compression/compression-design.md Outdated

Loquacity requested a review from mkindahl

September 11, 2023 05:31

Loquacity marked this pull request as ready for review

September 11, 2023 05:31

Loquacity requested a review from a team

September 11, 2023 05:31

mkindahl reviewed

Contributor

mkindahl left a comment

A few more suggestions for improvements.

use-timescale/compression/compression-design.md Outdated

use-timescale/compression/compression-design.md Outdated

use-timescale/compression/compression-design.md Outdated

Comment on lines 149 to 151

+              However, using the previous example, if you wanted to query the data based on
+              device ID, you would have to decompress the device column to decide if that
+              batch should be part of the query. When you compress your hypertable, you can

Contributor

mkindahl Sep 11, 2023

It is not entirely clear why the entire column has to be decompressed.

One reason is that every compressed row is likely to contain every device, which means that having a max and a min does not help in deciding whether the row needs to be decompressed: the test indicates that the column always has to be decompressed.

The other reason is that we cannot have an index on data in a compressed column.

Contributor Author

Loquacity Sep 12, 2023

My interpretation was the former: that you don't know if the device is referenced in the column without decompressing it first.

Contributor

mkindahl Sep 12, 2023

Sorry, I was unclear. Both reasons are valid, but let me elaborate a little.

Typically, device IDs are taken from a separate table containing a serial ID. This means that device IDs typically are integers, not strings. With this in mind, we can compress the device ID column as well, which gives us something like this:

| `device` | `_ts_meta_min_1` | `_ts_meta_max_1` |
|---|---|---|
| [0, 1, 2, 1, 2, 3, 0] | 0 | 3 |
| [0, 0, 3, 0, 0, 2, 0] | 0 | 3 |

However, as you can see, the range does not allow us to filter out the rows that do not contain 1, since the entire range of device IDs is covered in each row. We therefore have to scan all rows and decompress the column to see if there is a match.

However, if we had an index on the device ID, we would be able to select all the compressed rows for that device, so by segmenting the data by the device ID, we can easily select the rows that have matching IDs using an index.
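
To make the difference concrete, here is a minimal sketch that uses the two batches from the table above. It is a toy model, not how TimescaleDB stores or scans data: `can_skip` and `segment_by_device` are hypothetical helpers, and a plain dictionary stands in for the index on the segment-by column.

```python
# The two compressed `device` batches from the table above.
batches = [
    [0, 1, 2, 1, 2, 3, 0],
    [0, 0, 3, 0, 0, 2, 0],
]


def can_skip(batch, target):
    """True if min/max metadata alone proves `target` is not in the batch."""
    return target < min(batch) or target > max(batch)


# Both batches span 0..3, so neither can be skipped for device 1,
# even though the second batch does not contain it at all.
skipped = [can_skip(batch, 1) for batch in batches]


def segment_by_device(batches):
    """Regroup row positions so each key holds rows for one device ID."""
    segments = {}
    for batch_no, batch in enumerate(batches):
        for pos, device in enumerate(batch):
            segments.setdefault(device, []).append((batch_no, pos))
    return segments


# With segmenting, a lookup on the device ID finds the matching rows
# directly, without decompressing anything to test for a match.
segments = segment_by_device(batches)
matching = segments.get(1, [])
```

Here `skipped` is `[False, False]`, so the min/max test forces both batches to be decompressed, while `matching` holds only the two rows for device 1, both from the first batch.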

use-timescale/compression/compression-design.md Outdated

Loquacity requested a review from mkindahl

September 12, 2023 10:29

mkindahl removed their request for review

September 14, 2023 12:41

Loquacity closed this

Contributor

iroussos commented Sep 15, 2023

Reopening this PR as it seems to have been closed by accident

iroussos reopened this

jonatas force-pushed the compression-howto-lana branch from 3357ea8 to 03b14df Compare

October 5, 2023 13:57

leeshyan force-pushed the compression-howto-lana branch from 03b14df to ed5511e Compare

October 10, 2023 00:21

leeshyan enabled auto-merge (squash)

October 10, 2023 00:21

Loquacity added 6 commits

November 1, 2023 11:32


          Pull policy section out of about

2b91c21


          add blog post content and edit

df1e845


          add methods page to index

5cd55e7


          compression design

a27ae3f


          Edits per feedback

31b451a


          remove repeated word

d86c2cc

mkindahl force-pushed the compression-howto-lana branch 2 times, most recently from 00a73e2 to d86c2cc Compare

November 1, 2023 11:04

Contributor

mkindahl commented Nov 1, 2023

@iroussos Updated to remove the changes to support the new API. PTAL.


          Use "zeroes" instead of "0s"

mkindahl disabled auto-merge

November 1, 2023 11:10

iroussos approved these changes

Contributor

iroussos left a comment

Thank you @mkindahl, looks really good; happy to get this out and then keep on iterating!

mkindahl merged commit ad3fef4 into latest

1 of 2 checks passed

mkindahl deleted the compression-howto-lana branch

November 1, 2023 11:41
