Implement series cache invalidation #529

JamesGuthrie · 2022-10-04T08:02:52Z

Description

In principle, the cache invalidation mechanism works as follows:

In the database, we track two values: `current_epoch` and
`delete_epoch`. These are Unix timestamps (stored in a BIGINT field for
reasons of backwards compatibility) that are updated every time rows in
the series table are deleted. `current_epoch` is set from `now()`, and
`delete_epoch` is set from `now() - epoch_duration`, where
`epoch_duration` is a configurable parameter.

When a series row is to be deleted, instead of immediately deleting it,
we set the `delete_epoch` column of the series row to the
`current_epoch` timestamp (the time at which we decided that it will be
deleted). Once `epoch_duration` has elapsed, the row is removed.
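
To illustrate the two-phase deletion described above, here is a minimal Go sketch that works on an in-memory slice rather than the real series tables. The `seriesRow` type and the `markForDeletion` and `sweep` functions are hypothetical names that do not appear in this PR, and the inclusive comparison against `now - epoch_duration` is an assumption about the boundary case.

```go
package main

import "fmt"

// seriesRow is a hypothetical stand-in for a row in a series table.
// A nil deleteEpoch means the row is not marked for deletion.
type seriesRow struct {
	id          int64
	deleteEpoch *int64 // Unix seconds at which deletion was decided
}

// markForDeletion stamps currentEpoch on the listed rows that are not
// already marked, instead of deleting them immediately.
func markForDeletion(rows []seriesRow, ids map[int64]bool, currentEpoch int64) {
	for i := range rows {
		if ids[rows[i].id] && rows[i].deleteEpoch == nil {
			epoch := currentEpoch
			rows[i].deleteEpoch = &epoch
		}
	}
}

// sweep returns the rows that survive: a marked row is dropped once at
// least epochDuration seconds have passed since it was marked.
// The inclusive boundary is an assumption made for this sketch.
func sweep(rows []seriesRow, now, epochDuration int64) []seriesRow {
	cutoff := now - epochDuration
	kept := make([]seriesRow, 0, len(rows))
	for _, r := range rows {
		if r.deleteEpoch != nil && *r.deleteEpoch <= cutoff {
			continue
		}
		kept = append(kept, r)
	}
	return kept
}

func main() {
	const epochDuration = 600 // seconds, illustrative value only
	rows := []seriesRow{{id: 1}, {id: 2}, {id: 3}}
	markForDeletion(rows, map[int64]bool{2: true}, 1000)

	fmt.Println(len(sweep(rows, 1300, epochDuration))) // too early, row 2 stays
	fmt.Println(len(sweep(rows, 1600, epochDuration))) // row 2 is removed
}
```

The delay between marking and removal is what gives the connector time to notice the mark and evict the series from its cache before the row disappears.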

When the connector starts, it reads `current_epoch` from the database
and stores this value with the series cache as `cache_current_epoch`.
The connector periodically fetches the ids of series rows where
`delete_epoch` is not null, together with `current_epoch`. It removes
these entries from the cache and updates `cache_current_epoch` to the
value of `current_epoch` that was fetched from the database.
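
The periodic refresh described above could look roughly like the following Go sketch. The `seriesCache` type, its fields, and the `refresh` method are hypothetical and simplified; the real connector cache is not shown in this PR, and the database query that produces `markedIDs` and `dbCurrentEpoch` is left out.

```go
package main

import (
	"fmt"
	"sync"
)

// seriesCache is a hypothetical, simplified stand-in for the connector's
// series cache. It maps a label-set key to a series id.
type seriesCache struct {
	mu           sync.RWMutex
	entries      map[string]int64
	currentEpoch int64 // cache_current_epoch, Unix seconds
}

// newSeriesCache seeds the cache with the current_epoch read at startup.
func newSeriesCache(dbCurrentEpoch int64) *seriesCache {
	return &seriesCache{entries: make(map[string]int64), currentEpoch: dbCurrentEpoch}
}

// refresh applies one polling result: it evicts only the series whose ids
// are marked for deletion, then adopts the current_epoch that was fetched
// together with those ids.
func (c *seriesCache) refresh(markedIDs []int64, dbCurrentEpoch int64) {
	marked := make(map[int64]struct{}, len(markedIDs))
	for _, id := range markedIDs {
		marked[id] = struct{}{}
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	for key, id := range c.entries {
		if _, ok := marked[id]; ok {
			delete(c.entries, key) // deleting while ranging over a map is safe in Go
		}
	}
	c.currentEpoch = dbCurrentEpoch
}

func main() {
	c := newSeriesCache(100)
	c.entries["cpu{host=a}"] = 1
	c.entries["cpu{host=b}"] = 2

	c.refresh([]int64{2}, 160)
	fmt.Println(len(c.entries), c.currentEpoch)
}
```

Evicting individual ids keeps the rest of the cache warm, in contrast to dropping every entry whenever the epoch changes.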

As the connector receives samples to be inserted, it tracks the smallest
value of `cache_current_epoch` that it saw for that batch. When it
inserts the samples of a batch into the database, it asserts (in the
database) that the cache it read from was not stale. This is expressed
as: `cache_current_epoch > delete_epoch`. If the assertion does not
hold, the insert is aborted.
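
As a rough illustration of the staleness check above, the following Go sketch tracks the smallest cache epoch seen for a batch and evaluates the same condition in plain Go. In the PR the assertion runs inside the database; the `batch` type, `checkNotStale`, and `errStaleCache` are hypothetical names used only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// errStaleCache is a hypothetical error for an aborted insert.
var errStaleCache = errors.New("series cache is stale: insert aborted")

// batch tracks the smallest cache_current_epoch observed while samples
// were being added to it.
type batch struct {
	samples       int
	minCacheEpoch int64
}

// add records one sample together with the cache epoch that was current
// when its series id was looked up.
func (b *batch) add(cacheCurrentEpoch int64) {
	if b.samples == 0 || cacheCurrentEpoch < b.minCacheEpoch {
		b.minCacheEpoch = cacheCurrentEpoch
	}
	b.samples++
}

// checkNotStale mirrors the condition cache_current_epoch > delete_epoch.
// It returns nil when the insert may proceed.
func checkNotStale(minCacheEpoch, dbDeleteEpoch int64) error {
	if minCacheEpoch > dbDeleteEpoch {
		return nil
	}
	return errStaleCache
}

func main() {
	b := &batch{}
	for _, epoch := range []int64{120, 100, 110} {
		b.add(epoch)
	}

	fmt.Println(b.minCacheEpoch)
	fmt.Println(checkNotStale(b.minCacheEpoch, 90))  // cache is fresh enough
	fmt.Println(checkNotStale(b.minCacheEpoch, 100)) // cache is stale
}
```

Using the minimum over the batch is the conservative choice: if any sample was resolved against an old view of the cache, the whole batch is treated as being that old.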

Merge requirements

Please take into account the following non-code changes that you may need to make with your PR:

- CHANGELOG entry for user-facing changes
- Updated the relevant documentation

This change implements invalidation of the series cache, and mechanisms to prevent the ingestion of data based on stale cache information. This is a companion change to [1], which implements the database-side logic required for cache invalidation.

[1]: timescale/promscale_extension#529

sumerman

_prom_catalog.delete_series_catalog_row is what concerns me the most; the rest is debatable or just questions.

I'm also not 100% sure about epoch initialization. I'll need to spend more time looking at the connector's code.

sumerman · 2022-11-11T17:15:27Z

migration/idempotent/001-base.sql

@@ -866,7 +866,7 @@ AS
 $$
 BEGIN
    EXECUTE FORMAT(
-        'UPDATE prom_data_series.%1$I SET delete_epoch = current_epoch+1 FROM _prom_catalog.ids_epoch WHERE delete_epoch IS NULL AND id = ANY($1)',
+        'UPDATE prom_data_series.%1$I s SET delete_epoch = current_epoch FROM _prom_catalog.ids_epoch WHERE s.delete_epoch IS NULL AND id = ANY($1)',


Is it possible to rename the s.delete_epoch column? I would prefer to if there are no technical issues with it.

I don't think this is correct w.r.t. our locking strategy. Same as in the mark function, this needs at least a SELECT ... FOR SHARE on ids_epoch.

I suppose it would be possible. We used mark_for_deletion_epoch in some documentation; would you propose to use that?

I have changed this.

Yeah, that would be the best, in my opinion.

👍🏼

So this is possible, but I'm not sure that the effort is worth it. I've squashed my changes and pushed this change as a commit on top of the other changes. PTAL.

@sumerman I've hit an issue: the upgrade tests are not happy, because the index definition is not the same on the "fresh install" and "update" paths. The reason is that the attribute which belongs to the index is not renamed. I could reach into pg_attribute and rename this, but I'm wondering if it's worth it. Can you take a look at what I have already and let me know?

I wish it was easier, but TBH I don't think a little quality-of-life thing that can be solved with code comments is worth this extra effort and obscurity in the migration.

Therefore I retract my request.

antekresic · 2022-11-21T13:09:47Z

migration/idempotent/001-base.sql

-                    WHERE series_exists.labels && ARRAY[label_id]
-                    LIMIT 1
-                )
+    --jit interacts poorly why the multi-partition query below


Indentation seems to be wrong here.

sumerman

Just a couple of nits about comments. Otherwise LGTM. I've played a little with the model and now feel confident about the suggested epoch initialization mechanism.

sumerman · 2022-11-24T10:32:35Z

migration/idempotent/001-base.sql

+    SET LOCAL search_path = pg_catalog, pg_temp;
+
+    -- Now we recheck the delete conditions, and delete series.
+    -- This corresponds to the ActuallyDeleteTx in our model.


Also the Resurrect label.

sumerman · 2022-11-24T10:33:12Z

migration/idempotent/001-base.sql

+    -- This corresponds to the ActuallyDeleteTx in our model.
+    CALL _prom_catalog._actually_delete_series_and_labels(metric_schema, metric_table, metric_series_table, deletion_epoch);
+    -- Now we check if there are any labels which we can remove.
+    -- This is not reflected in the model.


These comment lines look out of place. I believe they belong somewhere inside _actually_delete_series_and_labels.

sumerman

I believe this PR needs to be against the feature/series-cache-invalidation branch to be in sync with the connector repo.

niksajakovljevic · 2022-12-02T14:20:45Z

I didn't get into the implementation; I mostly read the description. So, if I understand correctly, this implementation should enable us to remove only the specific series ids (from the connector cache) that have been deleted? One of the bigger problems with the existing implementation is that on epoch change we just reset the whole cache in the connector, which is really bad from the memory perspective.

JamesGuthrie force-pushed the jg/logical-to-time-epoch-2 branch 2 times, most recently from 36be495 to bbeabb0 Compare October 11, 2022 11:46

JamesGuthrie force-pushed the jg/logical-to-time-epoch-2 branch from e95cbca to d63bc29 Compare October 25, 2022 10:18

JamesGuthrie mentioned this pull request Nov 4, 2022

Implement series cache invalidation timescale/promscale#1742

Closed

2 tasks

sumerman suggested changes Nov 11, 2022

JamesGuthrie mentioned this pull request Nov 14, 2022

Implement series cache invalidation timescale/promscale#1752

Merged

2 tasks

JamesGuthrie force-pushed the jg/logical-to-time-epoch-2 branch from 04f79b3 to 6995f2c Compare November 15, 2022 10:15

JamesGuthrie requested a review from antekresic November 16, 2022 16:34

antekresic reviewed Nov 21, 2022

JamesGuthrie force-pushed the jg/logical-to-time-epoch-2 branch from 76459a3 to b17ca85 Compare November 23, 2022 09:35

JamesGuthrie requested review from sumerman and antekresic November 23, 2022 09:35

JamesGuthrie force-pushed the jg/logical-to-time-epoch-2 branch from 4992a5c to f425b28 Compare November 23, 2022 15:45

JamesGuthrie force-pushed the jg/logical-to-time-epoch-2 branch from b94876b to 07057ad Compare November 23, 2022 17:07

sumerman approved these changes Nov 28, 2022

JamesGuthrie marked this pull request as ready for review November 30, 2022 12:38

JamesGuthrie requested a review from a team as a code owner November 30, 2022 12:38

disable rust cache

410b631

JamesGuthrie changed the title ~~WIP: logical to time epoch round 2~~ Implement series cache invalidation Nov 30, 2022

sumerman suggested changes Nov 30, 2022

JamesGuthrie mentioned this pull request Nov 30, 2022

Switch from logical to time-based epoch #512

Closed

2 tasks

sumerman approved these changes Dec 2, 2022

sumerman suggested changes Dec 5, 2022

sumerman self-requested a review December 5, 2022 11:50

sumerman approved these changes Dec 5, 2022
