
ct: dl_stm mvcc snapshot #23960

Open · wants to merge 1 commit into base: dev

Conversation

@nvartolomei (Contributor) commented Oct 30, 2024

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

  • none

@vbotbuildovich (Collaborator) commented Nov 5, 2024

the below tests from https://buildkite.com/redpanda/redpanda/builds/57612#0192fd1e-999f-4f92-bbba-a801e042a7e9 have failed and will be retried

catalog_schema_manager_rpunit

the below tests from https://buildkite.com/redpanda/redpanda/builds/57805#01930b34-b146-4c2a-8a1a-1899f0b9b050 have failed and will be retried

storage_e2e_single_thread_rpunit

the below tests from https://buildkite.com/redpanda/redpanda/builds/57951#01932068-9731-4bd9-a765-c2f38c1c431d have failed and will be retried

gtest_raft_rpunit

@Lazin (Contributor) commented Nov 12, 2024

@nvartolomei Please add some description to the PR and to the commit message. The PR has one commit with 455 lines of code and a one-line commit message.

This is intended to be used to implement recovery from cloud storage.
The detailed design can be found in the shared Redpanda gdrive under the
name "Shadow Topics: Recovery RFC".

The dl_stm_api provides 3 new methods:

- v start_snapshot() - a metadata-only operation
- snap read_snapshot(v) - the actual snapshot payload generation
- remove_snapshots_before(v) - cleanup of snapshots older than v

The design assumes a single user of MVCC snapshots: the cloud topics
recovery subsystem. It creates a logical snapshot first, then reads
and backs up its contents to cloud storage. When done, the cloud topics
recovery subsystem is responsible for cleaning up older snapshots by calling
`remove_snapshots_before(v)`.

After a v' = start_snapshot() call, it is guaranteed that any cloud storage
references contained in the result of read_snapshot(v') will remain
available in cloud storage, and thus recoverable, until a call to
`remove_snapshots_before(v'')` where v'' > v'.
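
For illustration, a sketch of how the recovery subsystem might drive these calls (the read_snapshot() return type, the version accessor and the upload helper are assumptions, not part of this PR):

```
// Hypothetical driver in the cloud topics recovery subsystem; error handling
// and helper names are illustrative only.
ss::future<> backup_partition_state(dl_stm_api& api) {
    // 1. v' = start_snapshot(): metadata-only, pins a version.
    auto id = co_await api.start_snapshot();
    if (id.has_error()) {
        throw std::runtime_error("failed to start dl_stm snapshot");
    }

    // 2. snap = read_snapshot(v'): materialize the payload and back it up.
    //    Everything it references stays in cloud storage until step 3.
    auto snap = api.read_snapshot(id.value());
    co_await upload_to_cloud_storage(std::move(snap)); // assumed helper

    // 3. Allow GC of anything only reachable from versions older than v'.
    co_await api.remove_snapshots_before(id.value().version); // accessor assumed
}
```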

We don't have a garbage collection mechanism implemented yet, so there is
no code related to the property described above. If we don't garbage
collect anything, the property trivially holds. When we add garbage
collection, it will be able to respect the property with logic as simple
as:

```
bool overlay_eligible_for_gc(dl_version v, dl_overlay o) {
  // The overlay was logically removed at or before the GC version v...
  bool is_removed = o.dl_removed_at <= v;
  // ...and it is not pinned by the oldest retained snapshot.
  bool referenced = !_snapshots.empty() && o.added_at >= _snapshots.front().version;
  return is_removed && !referenced;
}
```
@nvartolomei (Contributor, PR author)

@Lazin updated commit message. PTAL.

```
@@ -68,4 +70,81 @@ std::optional<dl_overlay> dl_stm_api::lower_bound(kafka::offset offset) const {
    return _stm->_state.lower_bound(offset);
}

ss::future<checked<dl_snapshot_id, dl_stm_api_errc>>
dl_stm_api::start_snapshot() {
    model::term_id term = _stm->_raft->term();
```
Contributor:

This is very similar to command_builder that archival_metadata_stm uses. Maybe it makes sense to add get_api method to the dl_stm which would return an instance of this class?

Right now the dl_stm_api is constructed separately but it's also a friend of the dl_stm and is the only way to interact with it. So IMO it makes sense to make this more apparent by allowing dl_stm to construct api objects.
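
A rough sketch of the suggested factory (class layout and constructor are assumptions):

```
// Hypothetical: dl_stm hands out its api object instead of the api being
// constructed separately; dl_stm_api remains a friend.
class dl_stm {
public:
    // The only sanctioned way to interact with the STM.
    dl_stm_api get_api() { return dl_stm_api(this); }
    // ... existing dl_stm members ...
private:
    friend class dl_stm_api;
};
```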

```
auto expected_id = dl_snapshot_id(dl_version(res.value().last_offset));

auto applied = co_await _stm->wait_no_throw(
  res.value().last_offset, model::timeout_clock::now() + 30s);
```
Contributor:

Consider using model::no_timeout, otherwise the command could be applied concurrently with the code that invokes start_snapshot.
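
The suggested variant could look like this (assuming the wait_no_throw overload accepts model::no_timeout as a deadline):

```
// Wait until the command is actually applied instead of giving up after 30s
// and racing a late apply.
auto applied = co_await _stm->wait_no_throw(
  res.value().last_offset, model::no_timeout);
```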

Contributor Author:

Why would that be a problem? PS. there are no locks taken here.

Contributor:

Agree, but imagine a situation where this fails. The caller retries the operation and replicates a new command, then the original command is applied, and then the retried command is applied. I guess that currently there is an implicit assumption that this is OK because the command itself is empty; it doesn't have any state of its own. If this is correct, then it should at least be mentioned in a comment. Also, this may not always be the case: eventually, new fields could be added to the command and this implicit assumption may no longer be true.

```
/// to avoid the state growing indefinitely.
ss::future<checked<void, dl_stm_api_errc>>
remove_snapshots_before(dl_version last_version_to_keep);
```

Contributor:

This class needs a gate and a shutdown code path if the object is not transient (i.e. not created only to replicate one message).
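
A minimal sketch of such a shutdown path, assuming dl_stm_api gains a ss::gate member `_gate` and a `stop()` method (both are assumptions):

```
// Hypothetical shutdown support for dl_stm_api.
ss::future<> dl_stm_api::stop() {
    // Awaits in-flight operations; new ones will fail to enter the gate.
    co_await _gate.close();
}

// Each public coroutine would then hold the gate for its lifetime:
//   auto holder = _gate.hold(); // throws ss::gate_closed_exception after stop()
```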

```
      fmt::format("Failed to replicate remove snapshots: {}", res.error()));
}

co_await _stm->wait_no_throw(
```
Contributor:

this workflow (replicate and then wait until the command is applied) can be extracted and reused
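
A sketch of such a helper (the name, return type and error mapping are assumptions; the replicate/wait calls mirror the ones in this diff):

```
// Hypothetical: replicate a batch and wait until the STM has applied it.
ss::future<checked<model::offset, dl_stm_api_errc>>
dl_stm_api::replicate_and_wait(model::record_batch_reader reader) {
    auto term = _stm->_raft->term();
    raft::replicate_options opts(raft::consistency_level::quorum_ack);
    opts.set_force_flush();

    auto res = co_await _stm->_raft->replicate(term, std::move(reader), opts);
    if (res.has_error()) {
        co_return dl_stm_api_errc::not_leader; // simplified error mapping
    }

    co_await _stm->wait_no_throw(res.value().last_offset, model::no_timeout);
    co_return res.value().last_offset;
}
```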

```
opts.set_force_flush();
auto res = co_await _stm->_raft->replicate(term, std::move(reader), opts);

if (res.has_error()) {
```
Contributor:

It probably makes sense to return a not_leader error code from here. dl_stm_api_errc has two values, timeout and not_leader. The user will expect the latter to be returned when replication fails due to lost leadership, and 99% of the time res.error() will be equal to raft::errc::not_leader.
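
For illustration, the suggested mapping (exact dl_stm_api_errc enumerator names are assumptions based on the description above):

```
if (res.has_error()) {
    // Replication most commonly fails because leadership moved away.
    if (res.error() == raft::errc::not_leader) {
        co_return dl_stm_api_errc::not_leader;
    }
    co_return dl_stm_api_errc::timeout;
}
```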

```
}

// Ensure that the expected snapshot was created.
vassert(_stm->_state.snapshot_exists(expected_id), "Snapshot not found");
```
Contributor:

Maybe exception?

Contributor Author:

Sounds good.
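
A possible replacement for the vassert (message text is illustrative):

```
// Surface the inconsistency to the caller instead of terminating the process.
if (!_stm->_state.snapshot_exists(expected_id)) {
    throw std::runtime_error("dl_stm: expected snapshot not found after apply");
}
```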

```
auto res = co_await _stm->_raft->replicate(term, std::move(reader), opts);

if (res.has_error()) {
    throw std::runtime_error(
```
Contributor:

same as previous

```
ASSERT_EQ(snapshot3->overlays[1], overlay2) << snapshot3->overlays;
}

TEST(dl_stm_state, remove_snapshots_before) {
```
Contributor:

do we need a test case that tries to remove all snapshots?

Contributor Author:

Can you share a bit more about what you have in mind? Currently it would be impossible to remove all snapshots except due to a bug.

Contributor:

If it should be impossible to remove everything, then we should probably have a test that tries to remove everything.

Contributor Author (nvartolomei, Nov 12, 2024):

How, given that there are no API calls for that and, by design, you can't have a dl_snapshot_id without creating one first? I feel like you want me to test a specific case but I'm not sure which one.

Contributor:

the case is

```
state.start_snapshot();
state.start_snapshot();
auto id = state.start_snapshot();
...
state.remove_snapshots_before(id + 1); // should fail
```

does it make sense?

@Lazin (Contributor) commented Nov 12, 2024

does dl_stm need some additional concurrency control mechanism?

@nvartolomei (Contributor, PR author)

> does dl_stm need some additional concurrency control mechanism?

do you see a problem somewhere?

@Lazin (Contributor) commented Nov 12, 2024

> does dl_stm need some additional concurrency control mechanism?

> do you see a problem somewhere?

This is a question: are we guaranteed that there is no concurrency? The version check only checks the version itself; it doesn't fence concurrent updates. But maybe that's OK, I don't know.

@vbotbuildovich (Collaborator)

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/57951#019320c4-643a-44ad-901c-2bb6e5a76c18:

"rptest.tests.cloud_storage_scrubber_test.CloudStorageScrubberTest.test_scrubber.cloud_storage_type=CloudStorageType.S3"

@vbotbuildovich (Collaborator)

Retry command for Build#57951

please wait until all jobs are finished before running the slash command

```
/ci-repeat 1
tests/rptest/tests/cloud_storage_scrubber_test.py::CloudStorageScrubberTest.test_scrubber@{"cloud_storage_type":1}
```
