
Enhanced storage docs #4009


Draft · wants to merge 23 commits into base: latest

23 commits
ac91838
Enhanced storage pricing page updates (#3960)
atovpeko Apr 7, 2025
8f9e62d
Enhanced storage IO Boost update (#3963)
atovpeko Apr 7, 2025
744051e
Enhanced storage Use section updates (#3964)
atovpeko Apr 9, 2025
c096055
Merge branch 'latest' of github.com:timescale/docs into milestone-14-…
atovpeko Apr 9, 2025
0dfb79c
updates on review
atovpeko Apr 11, 2025
331a38a
Merge branch 'latest' into milestone-14-enhanced-storage
atovpeko Apr 11, 2025
30e42a5
updates on review
atovpeko Apr 11, 2025
36c8ed1
Merge branch 'latest' into milestone-14-enhanced-storage
billy-the-fish Apr 11, 2025
a41986b
update on review
atovpeko Apr 16, 2025
06a3f45
change TOC title
atovpeko Apr 16, 2025
7b96fe0
Merge branch 'latest' into milestone-14-enhanced-storage
billy-the-fish Apr 16, 2025
63446e1
update
atovpeko Apr 17, 2025
e92a797
review
atovpeko Apr 21, 2025
68b4a72
Merge branch 'latest' into milestone-14-enhanced-storage
atovpeko Apr 21, 2025
dd7c6f7
add a note about regions
atovpeko Apr 22, 2025
51868cf
Merge branch 'latest' into milestone-14-enhanced-storage
atovpeko Apr 24, 2025
9eb846b
Merge branch 'latest' into milestone-14-enhanced-storage
atovpeko Apr 28, 2025
291d38e
update on review
atovpeko Apr 28, 2025
346e5f3
update regions
atovpeko Apr 28, 2025
b31f536
update regions
atovpeko Apr 28, 2025
eaa0c8c
Merge branch 'latest' of github.com:timescale/docs into milestone-14-…
atovpeko May 12, 2025
6e1486d
resolve conflict
atovpeko May 12, 2025
710fe03
Merge branch 'latest' into milestone-14-enhanced-storage
atovpeko May 16, 2025
2 changes: 1 addition & 1 deletion _partials/_tiered-storage-billing.md
@@ -1 +1 @@
$COMPANY charges only for the storage that your data occupies in S3, regardless of whether it was compressed in $CLOUD_LONG before tiering. There are no additional expenses, such as data transfer or compute.
$COMPANY charges only for the storage that your data occupies in S3 in the Apache Parquet format, regardless of whether it was compressed in $CLOUD_LONG before tiering. There are no additional expenses, such as data transfer or compute.
237 changes: 114 additions & 123 deletions about/pricing-and-account-management.md

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions lambda/redirects.js
@@ -977,6 +977,10 @@ module.exports = [
from: '/api/latest/actions/',
to: 'https://docs.timescale.com/api/latest/jobs-automation/',
},
{
from: '/use-timescale/latest/services/i-o-boost/',
to: 'https://docs.timescale.com/use-timescale/latest/data-tiering/enabling-data-tiering/',
},
{
from: '/use-timescale/latest/metrics-logging/integrations/',
to: 'https://docs.timescale.com/use-timescale/latest/metrics-logging/',
84 changes: 44 additions & 40 deletions use-timescale/data-tiering/about-data-tiering.md
@@ -1,5 +1,5 @@
---
title: About the object storage tier
title: About Timescale Cloud storage tiers
excerpt: Learn how Timescale Cloud helps you save on storage costs. The Timescale Cloud tiered storage architecture includes a high-performance storage tier and a low-cost object storage tier built on Amazon S3
product: [cloud]
keywords: [tiered storage]
@@ -9,45 +9,32 @@ cloud_ui:
- [services, :serviceId, overview]
---

# About the object storage tier
import TieredStorageBilling from "versionContent/_partials/_tiered-storage-billing.mdx";

$COMPANY's tiered storage architecture includes a standard high-performance storage tier and a low-cost object storage tier built on Amazon S3. You can use the standard tier for data that requires quick access, and the object tier for rarely used historical data. Chunks from a single hypertable, including compressed chunks, can stretch across these two storage tiers.
# About storage tiers

In the high-performance storage, your data is stored in the block format. In the object storage, it is stored in [Apache Parquet][parquet]. The original size of the data in your $SERVICE_SHORT, compressed or uncompressed, does not correspond directly to its size in S3. A compressed hypertable may even take more space in S3 than it does in $CLOUD_LONG.
$COMPANY's tiered storage architecture includes a high-performance storage tier and a low-cost object storage tier. You use the high-performance tier for data that requires quick access, and the object tier for rarely used historical data. Tiering policies move older data asynchronously and periodically from high-performance to low-cost storage, sparing you the need to do it manually. Chunks from a single hypertable, including compressed chunks, can stretch across these two storage tiers.

Apache Parquet allows for more efficient scans across longer time periods, and $CLOUD_LONG uses other metadata and query optimizations to reduce the amount of data that needs to be fetched from the object storage tier to satisfy a query.
## High-performance storage

Regardless of where your data is stored, you can still query it with standard SQL. A single SQL query transparently pulls data from the appropriate chunks using the chunk exclusion algorithms. You can `JOIN` against tiered data, build views, and even define continuous aggregates on it. In fact, because the implementation of continuous aggregates also uses hypertables, they can be tiered to low-cost storage as well.
High-performance storage is where your data is stored by default, until you [enable tiered storage][manage-tiering] and [move older data to the low-cost tier][move-data]. In high-performance storage, your data is stored in block format and optimized for frequent querying. The [$HYPERCORE row-columnar storage engine][hypercore] available in this tier is designed specifically for real-time analytics: it enables you to compress data in high-performance storage by up to 90% while improving query performance. Coupled with other optimizations, $CLOUD_LONG high-performance storage makes sure your data is always accessible and your queries run at lightning speed.
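
As a rough sketch of what the compression mentioned above looks like in practice (the `metrics` hypertable and `device_id` column are illustrative, not part of this PR):

```sql
-- Enable columnar compression on a hypothetical hypertable.
ALTER TABLE metrics SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'device_id'
);

-- Compress chunks automatically once they are older than seven days.
SELECT add_compression_policy('metrics', INTERVAL '7 days');
```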

$COMPANY charges only for the storage that your data occupies in S3, regardless of whether it was compressed in $CLOUD_LONG before tiering. There are no additional expenses, such as data transfer or compute.
$CLOUD_LONG high-performance storage comes in the following types:

## Benefits of the object storage tier
- **Standard** (default): based on [AWS EBS gp3][aws-gp3] and designed for general workloads. Provides up to 16 TB of storage and 16,000 IOPS.
- **Enhanced**: based on [EBS io2][ebs-io2] and designed for high-scale, high-throughput workloads. Provides up to 64 TB of storage and 32,000 IOPS.

The object storage tier is more than an archiving solution. It is also:

* **Cost-effective:** store high volumes of data at a lower cost.
You pay only for what you store, with no extra cost for queries.

* **Scalable:** scale past the restrictions imposed by storage that can be attached
directly to a Timescale service (currently 16 TB).

* **Online:** your data is always there and can be [queried when needed][querying-tiered-data].

## Architecture

The tiered storage backend works by periodically and asynchronously moving older chunks from the high-performance storage to the object storage.
There, it's stored in the Apache Parquet format, which is a compressed columnar format well-suited for S3. Within a Parquet file, a set of rows is grouped together to form a row group. Within a row group, values for a single column across multiple rows are stored together.
[See the differences][aws-storage-types] in the underlying AWS storage. You [enable enhanced storage][enable-enhanced] as needed in $CONSOLE.

By default, tiered data is not included when querying from a Timescale service.
However, you can access tiered data by [enabling tiered reads][querying-tiered-data] for a query, a session, or even for all sessions. After you enable tiered reads, when you run regular SQL queries, a behind-the-scenes process transparently pulls data from wherever it's located: the standard high-performance storage tier, the object storage tier, or both.
## Low-cost storage

Various SQL optimizations limit what needs to be read from S3:
Once you [enable tiered storage][manage-tiering], you can start moving rarely used data to the object tier. The object tier is based on AWS S3 and stores your data in the [Apache Parquet][parquet] format. Within a Parquet file, a set of rows is grouped together to form a row group. Within a row group, values for a single column across multiple rows are stored together. The original size of the data in your $SERVICE_SHORT, compressed or uncompressed, does not correspond directly to its size in S3. A compressed hypertable may even take more space in S3 than it does in $CLOUD_LONG.
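
Moving data to the object tier is typically automated with a tiering policy. A minimal sketch, assuming tiered storage is already enabled for the service and a hypertable named `metrics` exists:

```sql
-- Tier chunks whose data is older than 90 days to the low-cost object tier.
SELECT add_tiering_policy('metrics', INTERVAL '90 days');

-- Stop tiering new chunks; data already tiered stays in the object tier.
SELECT remove_tiering_policy('metrics');
```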

* **Chunk pruning** - exclude the chunks that fall outside the query time window.
* **Row group pruning** - identify the row groups within the Parquet object that satisfy the query.
* **Column pruning** - fetch only columns that are requested by the query.
Apache Parquet allows for more efficient scans across longer time periods, and $CLOUD_LONG uses metadata and query optimizations to reduce the amount of data fetched from the object storage tier to satisfy a query:

The result is transparent queries across high-performance storage and object storage, so your queries fetch the same data as before.
- **Chunk skipping**: exclude the chunks that fall outside the query time window.
- **Row group skipping**: identify the row groups within the Parquet object that satisfy the query.
- **Column skipping**: fetch only columns that are requested by the query.

The following query is against a tiered dataset and illustrates the optimizations:

@@ -82,14 +69,24 @@ ime zone))

`EXPLAIN` illustrates which chunks are being pulled in from the object storage tier:

1. Fetch data from chunks 42, 43, and 44 from the object storage tier.
1. Prune row groups and limit the fetch to a subset of the offsets in the
Parquet object that potentially match the query filter. Only fetch the data
for `device_uuid`, `sensor_id`, and `observed_at` as the query needs only these 3 columns.
1. Fetch data from chunks 42, 43, and 44 from the object storage tier.
1. Skip row groups and limit the fetch to a subset of the offsets in the
Parquet object that potentially match the query filter. Only fetch the data
for `device_uuid`, `sensor_id`, and `observed_at` as the query needs only these 3 columns.

The object storage tier is more than an archiving solution. It is also:

- **Cost-effective:** store high volumes of data at a lower cost. You pay only for what you store, with no extra cost for queries.
- **Scalable:** scale past the restrictions of even the enhanced high-performance storage tier.
- **Online:** your data is always there and can be [queried when needed][querying-tiered-data].

By default, tiered data is not included when you query from a $SERVICE_LONG. To access tiered data, you [enable tiered reads][querying-tiered-data] for a query, a session, or even for all sessions. After you enable tiered reads, when you run regular SQL queries, a behind-the-scenes process transparently pulls data from wherever it's located: the standard high-performance storage tier, the object storage tier, or both. You can `JOIN` against tiered data, build views, and even define continuous aggregates on it. In fact, because the implementation of continuous aggregates also uses hypertables, they can be tiered to low-cost storage as well.
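
As a hedged illustration of tiered reads (the GUC name `timescaledb.enable_tiered_reads` is taken from the tiered-reads docs; the `metrics` table and `observed_at` column are illustrative):

```sql
-- Include tiered data for the current session.
SET timescaledb.enable_tiered_reads = true;

-- Or scope it to a single query with a transaction-local setting.
BEGIN;
SET LOCAL timescaledb.enable_tiered_reads = true;
SELECT count(*) FROM metrics WHERE observed_at < now() - INTERVAL '1 year';
COMMIT;
```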

<TieredStorageBilling />

## Limitations
The low-cost storage tier comes with the following limitations:

* **Limited schema modifications.** Some schema modifications are not allowed
- **Limited schema modifications**: some schema modifications are not allowed
on hypertables with tiered chunks.

_Allowed_ modifications include: renaming the hypertable, adding columns
@@ -103,24 +100,31 @@ for `device_uuid`, `sensor_id`, and `observed_at` as the query needs only these
defaults, renaming a column, changing the data type of a
column, and adding a `NOT NULL` constraint to the column.

* **Limited data changes.** You cannot insert data into, update, or delete a
- **Limited data changes**: you cannot insert data into, update, or delete a
tiered chunk. These limitations take effect as soon as the chunk is
scheduled for tiering.

* **Inefficient query planner filtering for non-native data types.** The query
- **Inefficient query planner filtering for non-native data types**: the query
planner speeds up reads from our object storage tier by using metadata
to filter out columns and row groups that don't satisfy the query. This works for all
native data types, but not for non-native types, such as `JSON`, `JSONB`,
and `GIS`.

* **Latency.** S3 has higher access latency than local storage. This can affect the
* **Latency**: S3 has higher access latency than local storage. This can affect the
execution time of queries in latency-sensitive environments, especially
lighter queries.

* **Number of dimensions.** You cannot use tiered storage with hypertables
* **Number of dimensions**: you cannot use tiered storage with hypertables
partitioned on more than one dimension. Make sure your hypertables are
partitioned on time only, before you enable tiered storage.

[blog-data-tiering]: https://www.timescale.com/blog/expanding-the-boundaries-of-postgresql-announcing-a-bottomless-consumption-based-object-storage-layer-built-on-amazon-s3/
[querying-tiered-data]: /use-timescale/:currentVersion:/data-tiering/querying-tiered-data/
[parquet]: https://parquet.apache.org/
[parquet]: https://parquet.apache.org/
[manage-tiering]: /use-timescale/:currentVersion:/data-tiering/enabling-data-tiering/#enable-tiered-storage
[move-data]: /use-timescale/:currentVersion:/data-tiering/enabling-data-tiering/#automate-tiering-with-policies
[hypercore]: /use-timescale/:currentVersion:/hypercore
[aws-gp3]: https://docs.aws.amazon.com/ebs/latest/userguide/general-purpose.html
[ebs-io2]: https://docs.aws.amazon.com/ebs/latest/userguide/provisioned-iops.html#io2-block-express
[enable-enhanced]: /use-timescale/:currentVersion:/data-tiering/enabling-data-tiering/#high-performance-storage-tier
[aws-storage-types]: https://docs.aws.amazon.com/ebs/latest/userguide/ebs-volume-types.html#vol-type-ssd