Skip to content

Commit fdd5779

Browse files
committed
Add compression setup to cryptocurrency tutorial
1 parent 7295e77 commit fdd5779

File tree

2 files changed

+109
-0
lines changed

2 files changed

+109
-0
lines changed
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,107 @@
1+
---
2+
title: Query the Bitcoin blockchain - set up compression
3+
excerpt: Compress the dataset so you can store the Bitcoin blockchain more efficiently
4+
products: [cloud]
5+
keywords: [beginner, crypto, blockchain, Bitcoin, finance, analytics]
6+
layout_components: [next_prev_large]
7+
content_group: Query the Bitcoin blockchain
8+
---
9+
10+
# Set up compression and compress the dataset
11+
12+
This tutorial uses a dataset that contains Bitcoin blockchain data for
13+
the past five days, in a hypertable named `transactions`.
14+
15+
<Collapsible heading="Setting up compression" defaultExpanded={false}>
16+
17+
Storing a lot of blockchain data can be expensive. Using compression, you can store
18+
the same dataset much more efficiently and potentially query it even faster.
19+
20+
## Compression setup
21+
22+
1. Connect to the Timescale database that contains the Bitcoin dataset.
23+
1. At the psql prompt, run the ALTER command to enable compression on the hypertable:
24+
25+
```sql
26+
ALTER TABLE transactions
27+
SET (
28+
timescaledb.compress,
29+
timescaledb.compress_segmentby='block_id',
30+
timescaledb.compress_orderby='time DESC'
31+
);
32+
```
33+
1. Compress all the chunks of the hypertable:
34+
```sql
35+
SELECT compress_chunk(c) from show_chunks('transactions');
36+
```
37+
1. See the dataset size before and after compression:
38+
```sql
39+
SELECT
40+
pg_size_pretty(before_compression_total_bytes) as before,
41+
pg_size_pretty(after_compression_total_bytes) as after,
42+
FROM hypertable_compression_stats('transactions');
43+
```
44+
1. The output should show considerable storage savings, something like this:
45+
```sql
46+
before | after
47+
---------+--------
48+
1307 MB | 237 MB
49+
(1 row)
50+
```
51+
</Collapsible>
52+
53+
<Collapsible heading="Automatic compression" defaultExpanded={false}>
54+
## Automating compression
55+
56+
If this was a production usecase, you could automate compression by adding a policy which would compress data after a certain age:
57+
58+
```sql
59+
SELECT add_compression_policy('transactions', INTERVAL '8 days');
60+
```
61+
62+
Previous command would compress the chunks after they reach 8 days of age.
63+
</Collapsible>
64+
65+
<Collapsible heading="Taking advantage of query speedups" defaultExpanded={false}>
66+
## Faster analytical queries
67+
68+
Previously, we have setup compression to be segmented by block_id column value.
69+
This means fetching data by filtering or grouping on that column will be fetched
70+
more efficiently. Ordering is also set to time descending so if we run queries
71+
which try to order data with that ordering, we should see performance benefits.
72+
73+
For instance, if we run the query example from previous section:
74+
```sql
75+
WITH recent_blocks AS (
76+
SELECT block_id FROM transactions
77+
WHERE is_coinbase IS TRUE
78+
ORDER BY time DESC
79+
LIMIT 5
80+
)
81+
SELECT
82+
t.block_id, count(*) AS transaction_count,
83+
SUM(weight) AS block_weight,
84+
SUM(output_total_usd) AS block_value_usd
85+
FROM transactions t
86+
INNER JOIN recent_blocks b ON b.block_id = t.block_id
87+
WHERE is_coinbase IS NOT TRUE
88+
GROUP BY t.block_id;
89+
```
90+
91+
We should see a decent performance difference when the dataset is compressed and
92+
when is decompressed. Try it yourself by running the previous query, decompressing
93+
the dataset and running it again while timing the execution time.
94+
95+
You can decompress the whole dataset by running:
96+
```sql
97+
SELECT decompress_chunk(c) from show_chunks('transactions');
98+
```
99+
100+
On an example machine, we have observed a speedup of around two order of magnitude
101+
(15ms when compressed vs 1 second when decompressed).
102+
103+
Try it yourself and see what you get!
104+
</Collapsible>
105+
106+
[satoshi-def]: https://www.pcmag.com/encyclopedia/term/satoshi
107+
[coinbase-def]: https://www.pcmag.com/encyclopedia/term/coinbase-transaction

tutorials/blockchain-query/index.md

+2
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,7 @@ This tutorial covers:
3030

3131
1. [Setting up your dataset][blockchain-dataset]
3232
1. [Querying your dataset][blockchain-query]
33+
1. [Bonus: Store data efficiently][blockchain-compress]
3334

3435
## About querying the Bitcoin blockchain with Timescale
3536

@@ -53,6 +54,7 @@ analyze the blockchain data using Timescale hyperfunctions.
5354
[cloud-install]: /getting-started/:currentVersion:/#create-your-timescale-account
5455
[blockchain-dataset]: /tutorials/:currentVersion:/blockchain-query/blockchain-dataset/
5556
[blockchain-query]: /tutorials/:currentVersion:/blockchain-query/beginner-blockchain-query/
57+
[blockchain-compress]: /tutorials/:currentVersion:/blockchain-query/blockchain-compress/
5658
[blockchain-def]: https://www.pcmag.com/encyclopedia/term/blockchain
5759
[transactions-def]: https://www.pcmag.com/encyclopedia/term/bitcoin-transaction
5860
[analyze-blockchain]: /tutorials/:currentVersion:/blockchain-analyze/

0 commit comments

Comments
 (0)