cst: Use hashes in scrubber for segment-exists checks #22810
Commits on Sep 21, 2024
cst/inv: Expose loaded field in ntp hashes
The consumer of the object should be able to query if the hashes have been loaded from disk.
Commit 52a19a5
cst/inv: Enable moving NTP hashes
The ntp hashes class contains mostly movable fields; the exception is the retry chain node. Since the node is used only to set up the logger in the object, a move constructor is added which moves everything except the rtc node. Crucially, the gate of the hashes object is also moved, so we do not stop the moved-from hashes object.
Commit a1ccf75
cst: Add field to anomalies result for segment-exists check
A new boolean field denotes if a segment existence check was performed.
Commit 9e01ede
cst: Use ntp hashes in scrubber/anomaly detector
When checking for segments within a manifest, the anomaly detector first tries to load the ntp hashes, which the inventory service may have written to disk. Once loaded, this data is used as follows:
* for each segment path, first check the inventory data
* if the path exists, move on; no operation is consumed
* if the path is missing or there is a collision, make an HTTP call
* if the path is still missing, mark the segment as missing in the anomalies
The hash data set is not checked for manifests, as these have to be downloaded for the scrub anyway.
If the hash data set was not loaded, the segment-exists checks are skipped entirely. This ensures that we do not make a large number of HEAD requests when the data set is missing.
There are two caveats. First, if scrub is enabled but inventory-based scrub is disabled, we always make HTTP calls, because this configuration is taken as explicit intent that such calls should be made. Second, a flag to force HTTP calls is provided in the anomaly-detector::run method. If it is set, HTTP calls are always made, even if the data set is missing. This is for the benefit of the topic recovery validator, where we may want to always make HTTP calls even when there is no inventory data.
Commit da12bd0
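The segment-exists flow described in da12bd0 can be modelled roughly as below. This is a hypothetical Python sketch, not the C++ implementation: `NtpHashes`, `Lookup`, `Anomalies`, `check_segments`, `head_exists`, the CRC32 stand-in hash and the `segment_existence_checked` field name are all assumptions made for illustration. It shows the three outcomes: inventory hits skip the API call, misses and possible collisions fall back to an HTTP check, and a missing data set skips the checks unless inventory-based scrub is disabled or HTTP calls are forced.

```python
import zlib
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Set


class Lookup(Enum):
    """Hypothetical outcome of looking up one path in the inventory hash set."""
    EXISTS = 1
    MISSING = 2
    POSSIBLE_COLLISION = 3


def path_hash(path: str) -> int:
    # Stand-in hash (CRC32); the real hash function is not described in the PR.
    return zlib.crc32(path.encode("utf-8"))


@dataclass
class NtpHashes:
    """Hypothetical stand-in for the per-NTP hash data set."""
    loaded: bool = False
    hashes: Set[int] = field(default_factory=set)
    collisions: Set[int] = field(default_factory=set)

    def add(self, path: str) -> None:
        self.hashes.add(path_hash(path))

    def lookup(self, path: str) -> Lookup:
        h = path_hash(path)
        if h in self.collisions:
            return Lookup.POSSIBLE_COLLISION
        return Lookup.EXISTS if h in self.hashes else Lookup.MISSING


@dataclass
class Anomalies:
    missing_segments: List[str] = field(default_factory=list)
    segment_existence_checked: bool = False  # hypothetical name for the new flag
    http_calls: int = 0  # counts HEAD-style requests, for illustration only


def check_segments(
    paths: List[str],
    hashes: NtpHashes,
    head_exists: Callable[[str], bool],
    inv_scrub_enabled: bool = True,
    force_http: bool = False,
) -> Anomalies:
    result = Anomalies()
    if inv_scrub_enabled and not hashes.loaded and not force_http:
        # No hash data set and no override: skip existence checks entirely
        # rather than issuing one HEAD request per segment.
        return result
    result.segment_existence_checked = True
    use_inventory = inv_scrub_enabled and hashes.loaded
    for path in paths:
        if use_inventory and hashes.lookup(path) is Lookup.EXISTS:
            continue  # found in inventory data: no API call needed
        # Missing from the data set, a possible collision, or no usable data: ask the bucket.
        result.http_calls += 1
        if not head_exists(path):
            result.missing_segments.append(path)
    return result
```

In this model a collision is treated exactly like a miss, which matches the commit message: only a definite hit lets the detector skip the HTTP call.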
cst: Unit tests for ntp hashes query object
The ntp hashes query object wraps several booleans which decide whether API calls should be made, as well as the hashes object itself. The tests added here exercise the combinations of these booleans derived from the cluster config as well as input arguments.
Commit e38f323
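A combinatorial test over these booleans can be sketched as a decision table. The three flags below (`inv_scrub_enabled`, `hashes_loaded`, `force_http`) and the functions `consult_hashes` and `check_existence` are hypothetical names chosen to mirror the rules stated in da12bd0; the real query object's fields and config keys are not shown in this PR page.

```python
from itertools import product
from typing import Dict, NamedTuple, Tuple


class QueryFlags(NamedTuple):
    inv_scrub_enabled: bool  # hypothetical: derived from cluster config
    hashes_loaded: bool      # whether the hash data set was loaded from disk
    force_http: bool         # input argument, e.g. set by the topic recovery validator


def consult_hashes(f: QueryFlags) -> bool:
    """Inventory data is consulted only if inventory-based scrub is on and data is loaded."""
    return f.inv_scrub_enabled and f.hashes_loaded


def check_existence(f: QueryFlags) -> bool:
    """Existence checks are skipped only when inventory-based scrub is enabled,
    no data set was loaded, and HTTP calls are not forced."""
    return not f.inv_scrub_enabled or f.hashes_loaded or f.force_http


def decision_table() -> Dict[QueryFlags, Tuple[bool, bool]]:
    """Enumerate all 2**3 flag combinations, as a combinatorial unit test would."""
    table = {}
    for bits in product([False, True], repeat=3):
        flags = QueryFlags(*bits)
        table[flags] = (consult_hashes(flags), check_existence(flags))
    return table
```

Enumerating every combination with `itertools.product` makes it easy to assert that exactly one combination (inventory scrub on, no data, no force) skips the existence checks.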
cst/inv: Extract hash writer fns to utils
The hash writer logic is extracted into a utils compilation unit, so that the scrubber/anomaly detector unit tests can use the same logic to prepare test fixtures.
Commit 1ce4293
cst: Add tooling to write hashes during unit tests
The anomaly detector now expects hashes to be loaded from disk. Helper functions are added to write a set of segment metadata to disk, optionally skipping some entries to introduce anomalies.
Commit fd95e28
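The fixture idea (write hashes for a set of segments, deliberately leaving some out) can be sketched as follows. This assumes a JSON file named `hashes.json` and a CRC32 stand-in hash, and uses plain path strings instead of segment metadata; the real on-disk format, file layout and helper names (`write_hashes`, `load_hashes` here are hypothetical) are not described in the PR.

```python
import json
import tempfile
import zlib
from pathlib import Path
from typing import AbstractSet, Iterable, Set


def path_hash(path: str) -> int:
    # Stand-in hash (CRC32); the real hash function is not described in the PR.
    return zlib.crc32(path.encode("utf-8"))


def write_hashes(directory: Path, segment_paths: Iterable[str],
                 skip: AbstractSet[int] = frozenset()) -> Path:
    """Write hashes for every segment path except those whose index is in `skip`.

    Skipped entries simulate segments missing from the inventory data.
    """
    hashes = sorted(path_hash(p) for i, p in enumerate(segment_paths) if i not in skip)
    out = directory / "hashes.json"  # hypothetical file name and format
    out.write_text(json.dumps(hashes))
    return out


def load_hashes(file: Path) -> Set[int]:
    return set(json.loads(file.read_text()))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = [f"seg-{i}.log" for i in range(5)]
        f = write_hashes(Path(d), paths, skip={1, 3})
        print(len(load_hashes(f)))  # 3: indices 1 and 3 were skipped
```

Skipping by index keeps the fixture deterministic: a test knows exactly which segments should trigger a fallback HTTP check.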
cst: Add tests for anomaly detection with inv. data
Two unit tests are added which assert that inventory data is read from disk and that, when segments are missing from the data, HTTP calls are made to check for them.
Commit cee6cef
Commit 7f48c6f
ducktape: Allow ABS client to accept binary payload
A similar change was previously made to the S3 client but was not mirrored in the ABS client. The caller can now pass a binary payload directly and skip wrapping it in bytes() during put-object.
Commit 44b24fa
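One way a put-object helper can accept both text and binary payloads is to normalize the input before upload. The helper name `to_bytes` is hypothetical and is not the ducktape ABS client's actual code. Note that in Python `bytes(b"...", "utf-8")` raises `TypeError`, so binary input must be passed through rather than re-encoded.

```python
from typing import Union

Payload = Union[str, bytes, bytearray]


def to_bytes(data: Payload) -> bytes:
    """Normalize a put-object payload so callers may pass text or binary data.

    Note: bytes(b"...", "utf-8") raises TypeError, so binary input must not
    be re-wrapped with an encoding.
    """
    if isinstance(data, (bytes, bytearray)):
        return bytes(data)  # already binary: pass through unchanged
    if isinstance(data, str):
        return data.encode("utf-8")
    raise TypeError(f"unsupported payload type: {type(data).__name__}")
```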
ducktape: Add inv. report parsing to scrubber test
An inventory report is added to the bucket during the scrubber test. This report fudges the segment names before adding them to the report, for two reasons:
* At some point during the test we remove a segment and expect an anomaly. Since we do not know which segment will be removed when generating the report, we have to change the name before adding the key; otherwise the key is found in the report, the bucket is never checked, and the test breaks.
* We cannot skip adding the segment to the report altogether, because then no hash file is generated, and as a safety measure the scrubber does not check for segments, to avoid making many HEAD requests.
In its current state, the addition to the test is simply a sanity check. As more metrics are added, this test can be expanded to include more useful assertions.
Commit 532515f
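The fudging trick can be sketched as a small report builder. It assumes a minimal `bucket,key` CSV layout, a `.fudged` suffix, and a caller-supplied `is_segment` predicate, all of which are hypothetical rather than taken from the test. Fudged segment keys still produce hash entries, so a hash file is written, but none of them match a real segment path, so every real segment falls back to an HTTP existence check and the removed segment is reported as an anomaly.

```python
import csv
import io
from typing import Callable, Iterable


def fudge(key: str) -> str:
    # Hypothetical renaming: any suffix works as long as it never matches a real key.
    return key + ".fudged"


def build_inventory_csv(bucket: str, keys: Iterable[str],
                        is_segment: Callable[[str], bool]) -> str:
    """Build a minimal bucket,key CSV inventory report with segment keys fudged."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for key in keys:
        # Non-segment keys (e.g. manifests) are written unchanged.
        writer.writerow([bucket, fudge(key) if is_segment(key) else key])
    return buf.getvalue()
```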