Update Compression content #2664

Merged
merged 7 commits into from
Nov 1, 2023

Conversation

Loquacity
Contributor

Description

Updates the existing Compression content and adds more how-to content.

Links

Fixes https://github.com/timescale/docs-private/issues/167

Writing help

For information about style and word usage, see the style guide

Review checklists

Reviewers: use this section to ensure you have checked everything before approving this PR:

Subject matter expert (SME) review checklist

  • Is the content technically accurate?
  • Is the content complete?
  • Is the content presented in a logical order?
  • Does the content use appropriate names for features and products?
  • Does the content provide relevant links to further information?

Documentation team review checklist

  • Is the content free from typos?
  • Does the content use plain English?
  • Does the content contain clear sections for concepts, tasks, and references?
  • Have any images been uploaded to the correct location, and are resolvable?
  • If the page index was updated, are redirects required
    and have they been implemented?
  • Have you checked the built version of this content?

@github-actions

Allow 10 minutes from last push for the staging site to build. If the link doesn't work, try using incognito mode instead. For internal reviewers, check web-documentation repo actions for staging build status. Link to build for this PR: http://docs-dev.timescale.com/docs-compression-howto-lana

@Loquacity Loquacity requested a review from iroussos August 30, 2023 06:45
@Loquacity
Contributor Author

@iroussos This is still in draft, but adding you so you can follow along as I go, and provide advice 💖

Contributor

@mkindahl mkindahl left a comment

Reviewing section "about compression". Will continue with the others.

use-timescale/compression/compression-design.md: 8 review threads (outdated, resolved)
Contributor

@mkindahl mkindahl left a comment

Some suggestions regarding use of ORDER BY and SEGMENT BY.

Contributor

@mkindahl mkindahl left a comment

Some minor comments; otherwise it looks accurate. I wonder if you want to mention somewhere that compression stores at most 1000 elements in a compressed column. This affects the compression ratio if you have few elements for each segment-by column value.
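
To illustrate why the 1000-element limit matters, here is a minimal Python sketch. It is a toy model, not TimescaleDB code: the helper `batch_stats` and the row counts are hypothetical, and the only figure taken from the comment above is the limit of 1000 elements per compressed column. It assumes that each segment-by value gets its own batches, so a segment with few rows produces a mostly empty batch.

```python
import math

# Limit quoted in the review comment above; everything else here is illustrative.
MAX_BATCH_ROWS = 1000


def batch_stats(rows_per_segment):
    """Return (batch_count, average_rows_per_batch) for per-segment row counts.

    Assumes each segment is split into batches of at most MAX_BATCH_ROWS rows
    and that rows from different segments never share a batch.
    """
    batch_count = sum(math.ceil(n / MAX_BATCH_ROWS) for n in rows_per_segment if n > 0)
    if batch_count == 0:
        return 0, 0.0
    return batch_count, sum(rows_per_segment) / batch_count


# The same 10,000 rows, segmented two different ways.
one_segment = batch_stats([10_000])      # a single segment-by value
many_segments = batch_stats([20] * 500)  # 500 segment-by values, 20 rows each
```

In this model, the single-segment case fills every batch to the limit, while the 500-segment case produces 500 batches of 20 rows each, which leaves the compression algorithms far less data to work with per batch.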

use-timescale/compression/compression-design.md: 5 review threads (outdated, resolved)
@Loquacity Loquacity requested a review from mkindahl September 11, 2023 05:31
@Loquacity Loquacity marked this pull request as ready for review September 11, 2023 05:31
@Loquacity Loquacity requested a review from a team September 11, 2023 05:31
Contributor

@mkindahl mkindahl left a comment

A few more suggestions for improvements.

use-timescale/compression/compression-design.md: 2 review threads (outdated, resolved)
Comment on lines 149 to 151
However, using the previous example, if you wanted to query the data based on
device ID, you would have to decompress the device column to decide if that
batch should be part of the query. When you compress your hypertable, you can
Contributor

It is not entirely clear why the entire column has to be decompressed.

One reason is that every compressed row is likely to contain every device, so having a max and a min does not help in deciding whether the row needs to be decompressed: the range test always indicates that the column has to be decompressed.

The other reason is that we cannot have an index on data in a compressed column.

Contributor Author

My interpretation was the former: that you don't know if the device is referenced in the column without decompressing it first.

Contributor

Sorry, I was unclear. Both reasons are valid, but let me elaborate a little.

Typically, device IDs are taken from a separate table containing a serial ID. This means that device IDs typically are integers, not strings. With this in mind, we can compress the device ID column as well, which gives us something like this:

device                 _ts_meta_min_1  _ts_meta_max_1
[0, 1, 2, 1, 2, 3, 0]  0               3
[0, 0, 3, 0, 0, 2, 0]  0               3

However, as you can see, the range does not allow us to filter out the rows that do not contain 1, because each row covers the entire range of device IDs. We therefore have to scan all rows and decompress the column to see if there is a match.

If we had an index on the device ID, on the other hand, we would be able to select all the compressed rows for that device. So by segmenting the data by the device ID, we can easily select the rows that have matching IDs using an index.
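
The argument can be checked with a small Python sketch. This is a toy model only: the function names are hypothetical, the two arrays are the example rows from the table in this comment, and a plain dictionary stands in for the index that segmenting by device ID makes possible. It does not reflect how TimescaleDB stores or scans data internally.

```python
# The two compressed rows from the example; each holds an array of device IDs.
batches = [
    [0, 1, 2, 1, 2, 3, 0],
    [0, 0, 3, 0, 0, 2, 0],
]


def could_contain(batch, device_id):
    """Min/max test: True when the range alone cannot rule the batch out."""
    return min(batch) <= device_id <= max(batch)


def batches_to_decompress(batches, device_id):
    """Indexes of batches that must be decompressed to look for device_id."""
    return [i for i, batch in enumerate(batches) if could_contain(batch, device_id)]


def segment_by_device(batches):
    """Group row positions by device ID, so each group holds a single device.

    The dictionary lookup plays the role of the index on the segment-by column.
    """
    segments = {}
    for batch_index, batch in enumerate(batches):
        for pos, device_id in enumerate(batch):
            segments.setdefault(device_id, []).append((batch_index, pos))
    return segments


# Both batches span 0..3, so the range test keeps both for device 1 ...
candidates = batches_to_decompress(batches, 1)
# ... even though only the first batch actually contains device 1.
actual = [i for i, batch in enumerate(batches) if 1 in batch]
# With segmenting, the rows for device 1 are found by a direct lookup.
rows_for_device_1 = segment_by_device(batches)[1]
```

Here `candidates` contains both batches while `actual` contains only the first, which is the wasted decompression described above. The segmented lookup goes straight to the rows for device 1 without touching the other devices.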

use-timescale/compression/compression-design.md: 1 review thread (outdated, resolved)
@Loquacity Loquacity requested a review from mkindahl September 12, 2023 10:29
@mkindahl mkindahl removed their request for review September 14, 2023 12:41
@Loquacity Loquacity closed this Sep 15, 2023
@iroussos
Contributor

Reopening this PR as it seems to have been closed by accident

@iroussos iroussos reopened this Sep 15, 2023
@jonatas jonatas force-pushed the compression-howto-lana branch from 3357ea8 to 03b14df Compare October 5, 2023 13:57
@leeshyan leeshyan force-pushed the compression-howto-lana branch from 03b14df to ed5511e Compare October 10, 2023 00:21
@leeshyan leeshyan enabled auto-merge (squash) October 10, 2023 00:21
@mkindahl mkindahl force-pushed the compression-howto-lana branch 2 times, most recently from 00a73e2 to d86c2cc Compare November 1, 2023 11:04
@mkindahl
Contributor

mkindahl commented Nov 1, 2023

@iroussos Updated to remove the changes to support the new API. PTAL.

@mkindahl mkindahl disabled auto-merge November 1, 2023 11:10
Contributor

@iroussos iroussos left a comment

Thank you @mkindahl, looks really good; happy to get this out and then keep on iterating!

@mkindahl mkindahl merged commit ad3fef4 into latest Nov 1, 2023
1 of 2 checks passed
@mkindahl mkindahl deleted the compression-howto-lana branch November 1, 2023 11:41
3 participants