cst: Use hashes in scrubber for segment-exists checks #22810
Conversation
Force-pushed from 28fbfa6 to 661d102.
Hi @abhijat. Should we review the properties for docs now, or do you prefer that we review when it's not in draft? Thanks
@Deflaimun this PR is actually based off #21576; the properties have been reviewed by the docs team already. Once the base PR is merged and I rebase, this PR won't contain any configuration-related changes.
Force-pushed from 661d102 to cd0dfba.
/cdt
/ci-repeat
Force-pushed from 3afde3c to 5cae3b3.
/ci-repeat
new failures in https://buildkite.com/redpanda/redpanda/builds/53889#0191b29a-3ee8-4ab4-9594-a8bcb095a2df:
new failures in https://buildkite.com/redpanda/redpanda/builds/53889#0191b29a-3eea-4c5b-bcc5-d6276e182346:
new failures in https://buildkite.com/redpanda/redpanda/builds/53889#0191b2c4-a837-449d-8b60-94b6f4fc90fa:
new failures in https://buildkite.com/redpanda/redpanda/builds/53889#0191b2c4-a832-4831-bf55-559e60ac226e:
new failures in https://buildkite.com/redpanda/redpanda/builds/54087#0191c767-9877-4a2a-bc1b-18f9a5851e30:
new failures in https://buildkite.com/redpanda/redpanda/builds/54087#0191c780-f5d1-4485-844a-14fd899b4862:
new failures in https://buildkite.com/redpanda/redpanda/builds/54819#01920f30-2f29-4cf4-aaf9-0aa79c47eaab:
/ci-repeat
Force-pushed from 8edf665 to c129b25.
/ci-repeat
Force-pushed from c129b25 to aafbbbd.
/ci-repeat
Force-pushed from aafbbbd to 3323f9a.
/ci-repeat
Force-pushed from f11dbe3 to 3e563b0.
Nothing too drastic, just some nits and test suggestions
@@ -343,6 +343,8 @@ struct anomalies
    uint32_t num_discarded_missing_segments{0};
    uint32_t num_discarded_metadata_anomalies{0};

    bool segment_existence_checked{false};
nit: can you add a comment explaining how this is expected to be used? Once this is set to true, does that mean further scrubs aren't needed?
This is informational for the person reviewing scrub results. So if this is true, we know that the segment existence checks were performed as part of that scrub. I added this mostly because as a result of this PR, the scrub may run but not check for segments (because inv. data was missing). This flag lets a user determine if that happened. I will add a comment.
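A possible shape for that comment, based on the explanation above (the wording is a sketch, not the final text in the PR):

struct anomalies {
    // ...
    // Informational flag for whoever reviews scrub results: true when this
    // scrub actually performed segment-existence checks (via inventory hashes
    // and/or HTTP HEAD requests). False means the scrub ran but the checks
    // were skipped, e.g. because inventory data was missing on disk. It does
    // not mean that further scrubs are unnecessary.
    bool segment_existence_checked{false};
};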
bool existence_query_context::should_check_cloud_storage() const {
    if (is_http_force_override_enabled) {
nit: maybe rename to force_check_enabled or something? It isn't obvious what HTTP this is referring to in this context
I meant to imply that HTTP API calls are forced to always happen, but that was hard to encode in the name. I will try to think of a better one.
if (
  query_ctx.lookup_inventory_data(segment_path)
    != cloud_storage::inventory::lookup_result::exists
  && query_ctx.should_check_cloud_storage()) {
nit: I found it a bit difficult to map the query_ctx members to these methods. I think it would be easier to follow if we merge these into the same method, like:
bool should_check_cloud_storage(remote_segment_path p) {
    if (is_inv_data_available) {
        return hashes->exists(p) == missing;
    }
    if (force_without_inv_data ||
        // The regular, non-inventory based scrub checks every segment.
        !is_inv_scrub_enabled) {
        return true;
    }
    return false;
}
Yeah, this looks simpler. I tried to combine the two methods in the latest changes.
  {xxhash_64(test_path.data(), test_path.size())},
  0)
  .get();
q.load_from_disk().get();
nit: do we ever expect to not load_from_disk()? Wondering if it makes sense for existence_query_context to have a static futurized constructor that loads from disk, something like:
static ss::future<existence_query_context> existence_query_context::load(bool, ntp) {
    existence_query_context q{};
    co_await q.load_from_disk();
    co_return q;
}
WDYT?
Changed. The main issue here is with the design of ntp_hashes. I originally added a gate in there, but now realize it is just a set of hashmaps loaded from disk; it doesn't have a coroutine-based API other than the loading, so ideally it doesn't need a gate. It should have been one free function to load the data, called with a gate held by the caller; then ntp_hashes would just be a set of hashmaps. I will try to clean it up in a future PR.
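A rough sketch of that cleanup, assuming ntp_hashes becomes plain data and loading moves to a free function; the names load_ntp_hashes and hash_sets are illustrative, not the current API:

// Plain data: the per-NTP hash sets loaded from disk, with no gate or rtc node.
struct ntp_hashes {
    std::vector<absl::flat_hash_set<uint64_t>> hash_sets;
};

// Coroutine-based loading as a free function. The caller holds its own gate:
//   auto holder = _gate.hold();
//   auto hashes = co_await load_ntp_hashes(root, ntp);
ss::future<ntp_hashes>
load_ntp_hashes(std::filesystem::path root, model::ntp ntp);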
// Writes hashes for a single NTP to a data file. The data file is
// located in path: namespace/topic/partition_id/{seq}. The parent
// directories are created if missing. A new data file is created for each
// flush operation.
ss::future<> flush_ntp_hashes(
  std::filesystem::path root,
  model::ntp ntp,
  fragmented_vector<uint64_t> hashes,
  uint64_t file_name);

ss::future<>
write_hashes_to_file(ss::file& f, fragmented_vector<uint64_t> hashes);

ss::future<> write_hashes_to_file(
  ss::output_stream<char>& stream, fragmented_vector<uint64_t> hashes);

} // namespace cloud_storage::inventory
nit: these all seem to be in service of the single task of writing hashes. Would it make sense to wrap them in some hash_writer class that wraps a ss::file?
Or are these separate because they're used individually in tests?
Or maybe the write_hashes_to_file() can just be in an anonymous namespace in the cc file?
IIRC these were member functions of the inventory consumer that were kept separate just for readability. Now that they have moved to an independent compilation unit, only the flush function needs to be public, so I'll make the other two private.
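One possible layout for that, combining the suggestion above with this reply (the file names are assumptions; only the declarations from the diff are taken as given):

// inv_consumer_utils.h -- only the entry point remains public.
namespace cloud_storage::inventory {
ss::future<> flush_ntp_hashes(
  std::filesystem::path root,
  model::ntp ntp,
  fragmented_vector<uint64_t> hashes,
  uint64_t file_name);
} // namespace cloud_storage::inventory

// inv_consumer_utils.cc -- the write helpers stay visible only to this
// translation unit.
namespace {
ss::future<> write_hashes_to_file(
  ss::output_stream<char>& stream, fragmented_vector<uint64_t> hashes) {
    for (auto h : hashes) {
        co_await stream.write(reinterpret_cast<const char*>(&h), sizeof(h));
    }
    co_await stream.flush();
}

ss::future<>
write_hashes_to_file(ss::file& f, fragmented_vector<uint64_t> hashes) {
    auto stream = co_await ss::make_file_output_stream(f);
    co_await write_hashes_to_file(stream, std::move(hashes));
    co_await stream.close();
}
} // namespace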
# skip adding the segment, because we still want the data set to be present on disk, so that the
# scrubber loads the data and performs segment checks. The presence of the changed segment name
# causes the NTP hash dir to be created on disk.
# TODO add more assertions once metrics for scrubber run are added
+1
Can we have a version of this test that doesn't change the name?
It'd also be nice to have a test that removes segments, generates the report, and asserts that there are anomalies based on the inventory report
Yeah, with metrics for the scrub process added it will be easier to make some assertions. Right now what happens in the scrubber is a bit of a black box for ducktape, especially when asserting why the scrubber took certain actions.
@@ -637,6 +678,10 @@ def test_scrubber(self, cloud_storage_type):
        self._produce()
        self._assert_no_anomalies()

        # Add a randomly generated inv. report to the mix. This will have some keys missing, but scrubbing
nit: it doesn't seem to be randomly generated?
Yeah, the randomness didn't work with the test. I will update the comment.
Force-pushed from 1cf6165 to 66874ca.
Looks like the refactor broke some tests.
The consumer of the object should be able to query if the hashes have been loaded from disk.
The ntp_hashes class mainly contains movable fields, except for the retry chain node. Since the node is used only to set up the logger in the object, a move-ctor is added which moves everything except the RTC node. Crucially, the gate of the hashes object is also moved, so we do not stop the moved-from hashes object.
A new boolean field denotes if a segment existence check was performed.
When checking for segments within a manifest, the anomaly detector first tries to load NTP hashes which could have been placed there by the inventory service. Once loaded, this data is used as follows:
* for each segment path, first check the inventory data
* if the path exists, move on; no request is consumed
* if the path is missing or collides, make an HTTP call
* if the path is still missing, mark the segment as missing in anomalies
The hash data set is not checked for manifests, as these have to be downloaded for the scrub. If the hash data set was not loaded, the segment-exists checks are skipped entirely; this makes sure we do not issue a lot of HEAD requests when the data set is missing. There are two caveats. First, if scrub is enabled but inventory-based scrub is disabled, we always make HTTP calls, because that config is deemed explicit intent that such calls should be made. Second, a flag to force HTTP calls is provided in the anomaly detector's run method; if set, HTTP calls are always made even if the data set is missing. This is for the benefit of the topic recovery validator, where we may want to always make HTTP calls even if there is no inventory data.
The ntp hashes query object wraps several booleans which decide whether api calls should be made, as well as the hashes object itself. The tests added here exercise the combinations of these booleans derived from the cluster config as well as input arguments.
The hash writer logic is extracted to a utils compilation unit so that the scrubber/anomaly detector unit tests can use the same logic to prepare fixtures for testing.
The anomaly detector now expects hashes to be loaded from disk. Helper functions are added to write a set of segment-meta to disk, while being able to skip some of these to introduce anomalies.
Two unit tests are added which assert that inventory data is used from disk, and that when segments are missing from the data, HTTP calls are made to check for them.
A similar change was previously made to the S3 client but was not reflected in the ABS client. The caller can now directly pass a binary payload and skip wrapping it in bytes() during put-object.
An inventory report is added to the bucket during the scrubber test. This report fudges the segment names before adding them to the report, for two reasons:
* At some point during the test we remove a segment and expect an anomaly. Since we do not know which segment will be removed when generating the report, we have to change the name before adding the key; otherwise the key is found in the report and the bucket is never checked, which breaks the test.
* We cannot skip adding the segment to the report altogether, because then no hash file is generated, and as a safety measure the scrubber does not check for segments at all, to avoid making many HEAD requests.
In its current state the addition to the test is simply a sanity check. As more metrics are added, this test can be expanded to include more useful assertions.
Force-pushed from 66874ca to 532515f.
ducktape failure was #16202
LGTM thanks for the cosmetic changes!
When checking for segments within a manifest, the anomaly detector first tries to load NTP hashes which could have been placed there by the inventory service.
Once loaded, this data is used as follows:
* for each segment path, first check the inventory data
* if the path exists, move on; no request is consumed
* if the path is missing or collides, make an HTTP call
* if the path is still missing, mark the segment as missing in anomalies
The hash data set is not checked for manifests, as these have to be downloaded for the scrub eventually.
The segment-existence check is performed under these circumstances:
* inventory data is available on disk
* inventory-based scrub is disabled (the regular scrub checks every segment)
* HTTP calls are explicitly forced via the flag passed to the anomaly detector's run method
The check is not performed if inventory-based scrub is enabled but the data is not available on disk.
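A sketch of the per-segment flow described above, run inside the detector's coroutine (segment_paths, segment_exists_in_bucket, and missing are illustrative names; the query_ctx methods and lookup_result come from the diff shown earlier):

for (const auto& segment_path : segment_paths) {
    if (
      query_ctx.lookup_inventory_data(segment_path)
      == cloud_storage::inventory::lookup_result::exists) {
        continue; // present in inventory data, no HTTP request consumed
    }
    if (!query_ctx.should_check_cloud_storage()) {
        continue; // check skipped, e.g. inventory data missing and no force flag
    }
    // Hash miss or collision: fall back to an HTTP HEAD request.
    if (!co_await segment_exists_in_bucket(segment_path)) { // illustrative helper
        missing.push_back(segment_path); // recorded as an anomaly
    }
}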
Note: the scrub result is still marked as full even if segment-existence checks are not performed. The semantics of the scrub result (whether it is full or partial) are mainly used to drive the combination of scrub results.
Partial results are merged together, while full results replace previous values. In this context, marking the result as partial because segment existence was not checked (possibly due to missing inventory data) does not make much sense: the scrub is still performed over the full offset range in the bucket; we just turned off one check because the data was not available. Such a result should still replace the previous one.
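A minimal sketch of that combination rule, assuming an anomalies value plus a full/partial status flag (the names scrub_status and merge_into are illustrative, not the exact implementation):

anomalies combine(anomalies previous, anomalies latest, scrub_status status) {
    if (status == scrub_status::full) {
        // A full scrub replaces whatever was recorded previously.
        return latest;
    }
    // Partial scrubs only add to the accumulated result.
    merge_into(previous, std::move(latest)); // illustrative merge helper
    return previous;
}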
Backports Required
Release Notes