Commit 4eaf432

Update website guide for ingester scaling
Signed-off-by: Daniel Deluiggi <[email protected]>
1 parent 5b7b4f5 commit 4eaf432

1 file changed

+58
-3
lines changed

docs/guides/ingesters-scaling-up-and-down.md

Lines changed: 58 additions & 3 deletions
@@ -22,11 +22,66 @@ no special care is required to take when scaling up ingesters.
2222

2323
## Scaling down
2424

25-
A running ingester holds several hours of time series data in memory before they're flushed to the long-term storage. When an ingester shuts down because of a scale down operation, the in-memory data must not be discarded in order to avoid any data loss.
25+
A running ingester holds several hours of time series data in memory before they’re flushed to the long-term storage. When an ingester shuts down because of a scale down operation, the in-memory data must not be discarded in order to avoid any data loss.
2626

27-
Ingesters don't flush series to blocks at shutdown by default. However, Cortex ingesters expose an API endpoint [`/shutdown`](../api/_index.md#shutdown) that can be called to flush series to blocks and upload blocks to the long-term storage before the ingester terminates.
27+
Ingesters don’t flush series to blocks at shutdown by default. However, Cortex ingesters expose an API endpoint [`/shutdown`](../api/_index.md#shutdown) that can be called to flush series to blocks and upload blocks to the long-term storage before the ingester terminates.
2828

29-
Even if ingester blocks are compacted and shipped to the storage at shutdown, it takes some time for queriers and store-gateways to discover the newly uploaded blocks. This is due to the fact that the blocks storage runs a periodic scanning of the storage bucket to discover blocks. If two or more ingesters are scaled down in a short period of time, queriers may miss some data at query time due to series that were stored in the terminated ingesters but their blocks haven't been discovered yet.
29+
Even if ingester blocks are compacted and shipped to the storage at shutdown, it takes some time for queriers and store-gateways to discover the newly uploaded blocks. This is due to the fact that the blocks storage runs a periodic scanning of the storage bucket to discover blocks. If two or more ingesters are scaled down in a short period of time, queriers may miss some data at query time due to series that were stored in the terminated ingesters but their blocks haven’t been discovered yet.
30+
31+
### New Gradual Scaling Approach (Recommended)
32+
33+
Starting with Cortex 1.19.0, a new **READONLY** state for ingesters was introduced that enables gradual, safe scaling down without data loss or performance impact. This approach eliminates the need for complex configuration changes and allows for more flexible scaling operations.
34+
35+
#### How the READONLY State Works
36+
37+
The READONLY state allows ingesters to:
38+
- **Stop accepting new writes** - Push requests will be rejected and redistributed to other ingesters
39+
- **Continue serving queries** - Existing data remains available for queries, maintaining performance
40+
- **Gradually age out data** - As time passes, data naturally ages out according to your retention settings
41+
- **Be safely removed** - Once data has aged out, ingesters can be terminated without any impact
42+
43+
#### Step-by-Step Scaling Process
44+
45+
1. **Set ingesters to READONLY mode**
46+
```bash
47+
# Transition ingester to READONLY state
48+
curl -X POST http://ingester-1:8080/ingester/mode -d '{"mode": "READONLY"}'
49+
curl -X POST http://ingester-2:8080/ingester/mode -d '{"mode": "READONLY"}'
50+
curl -X POST http://ingester-3:8080/ingester/mode -d '{"mode": "READONLY"}'
51+
```
52+
53+
2. **Monitor data aging** (Optional but recommended)
54+
```bash
55+
# Check user statistics and loaded blocks on the ingester
56+
curl http://ingester-1:8080/ingester/all_user_stats
57+
```
58+
59+
3. **Wait for safe removal window**
60+
- **Immediate removal** (after step 1): Safe once queries no longer need the ingester's data
61+
- **Conservative approach**: Wait for `querier.query-ingesters-within` duration (e.g., 5 hours)
62+
- **Complete data aging**: Wait for full retention period to ensure all blocks are removed
63+
64+
4. **Remove ingesters**
65+
```bash
66+
# Terminate the ingester processes
67+
kubectl delete pod ingester-1 ingester-2 ingester-3
68+
```
69+
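The per-ingester requests in step 1 can be wrapped in a small helper when several ingesters are scaled down together. The following is a minimal sketch, not part of the commit: the function name `set_readonly` and the `DRY_RUN` variable are hypothetical, and the port, path, and request body are copied unchanged from the example in step 1, so verify them against the API reference for your Cortex version before use. By default the helper only prints the commands it would run.

```bash
# set_readonly HOST...
# Hypothetical helper: sends the step-1 request to each ingester host.
# Port 8080, the /ingester/mode path and the request body are taken from
# the guide's example and are assumptions about your deployment.
# DRY_RUN (hypothetical, default 1) prints the commands instead of running them.
set_readonly() {
  local host url
  for host in "$@"; do
    url="http://${host}:8080/ingester/mode"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Dry run: show the exact command that would be executed.
      printf "curl -X POST %s -d '%s'\n" "$url" '{"mode": "READONLY"}'
    else
      # --fail makes curl return non-zero on an HTTP error status.
      curl --fail --silent --show-error -X POST "$url" -d '{"mode": "READONLY"}'
    fi
  done
}
```

Calling `set_readonly ingester-1 ingester-2 ingester-3` prints the three commands from step 1; setting `DRY_RUN=0` sends them.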
70+
#### Timeline Example
71+
72+
For a cluster with `querier.query-ingesters-within=5h`:
73+
74+
- **T0**: Set ingesters 5, 6, 7 to READONLY state
75+
- **T1**: Ingesters stop receiving new data but continue serving queries
76+
- **T2 (T0 + 5h)**: Ingesters no longer receive query requests (safe to remove)
77+
- **T3 (T0 + retention_period)**: All blocks naturally removed from ingesters
78+
- **T4**: Remove ingesters from cluster
79+
80+
**Any time after T2 is safe for removal without service impact.**
81+
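The T2 point in the timeline above is simple arithmetic: T0 plus the `querier.query-ingesters-within` window. The sketch below illustrates that calculation under simplifying assumptions; the function names are hypothetical, times are Unix epoch seconds, and the window is given in whole hours. It models only the timeline described here and does not query Cortex.

```bash
# earliest_safe_removal T0_EPOCH WITHIN_HOURS
# Prints T2 as epoch seconds: the moment the ingesters were set to
# READONLY (T0) plus the querier.query-ingesters-within window.
earliest_safe_removal() {
  local t0_epoch="$1" within_hours="$2"
  echo $(( t0_epoch + within_hours * 3600 ))
}

# is_safe_to_remove T0_EPOCH WITHIN_HOURS [NOW_EPOCH]
# Succeeds once the current time (or NOW_EPOCH, if given) is at or past T2.
is_safe_to_remove() {
  local now="${3:-$(date +%s)}"
  [ "$now" -ge "$(earliest_safe_removal "$1" "$2")" ]
}
```

For example, with T0 recorded as `t0=$(date +%s)` when the READONLY requests were sent, `is_safe_to_remove "$t0" 5` starts succeeding five hours later.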
82+
### Legacy Approach (For Older Versions)
83+
84+
If you're running an older version of Cortex that doesn't support the READONLY state, you'll need to follow the legacy approach.
3085

3186
The ingesters scale down is deemed an infrequent operation and no automation is currently provided. However, if you need to scale down ingesters, please be aware of the following:
3287
