cst: Use hashes in scrubber for segment-exists checks #22810

Merged

Commits on Sep 21, 2024

  1. cst/inv: Expose loaded field in ntp hashes

    The consumer of the object should be able to query whether the hashes
    have been loaded from disk.
    abhijat committed Sep 21, 2024
    52a19a5
  2. cst/inv: Enable moving NTP hashes

    The NTP hashes class contains mostly movable fields, the exception being
    the retry chain node. Since that node is used only to set up the
    object's logger, a move constructor is added which moves everything
    except the rtc node. Crucially, the gate of the hashes object is also
    moved, so we do not stop the moved-from hashes object.
    abhijat committed Sep 21, 2024
    a1ccf75
  3. cst: Add field to anomalies result for segment-exists check

    A new boolean field denotes whether a segment existence check was performed.
    abhijat committed Sep 21, 2024
    9e01ede
  4. cst: Use ntp hashes in scrubber/anomaly detector

    When checking for segments within a manifest, the anomaly detector first
    tries to load NTP hashes, which may have been written to disk by the
    inventory service.
    
    Once loaded, this data is used as follows:
    
    * for each segment path, first check the inventory data
    * if the path exists, move on; no operation is consumed
    * if the path is missing or there is a hash collision, make an HTTP call
    * if the path is still missing, mark the segment as missing in anomalies
    
    The hash data set is not checked for manifests, as these have to be
    downloaded for the scrub anyway.
    
    If the hash data set was not loaded, the segment-exists checks are
    skipped entirely. This ensures that we do not make a large number of
    HEAD requests when the data set is missing.
    
    There are two caveats. First, if scrub is enabled but inventory-based
    scrub is disabled, we always make HTTP calls, because this config is
    deemed to be explicit intent that such calls should be made. Second, a
    flag to force HTTP calls is provided in the anomaly-detector::run
    method. If set, HTTP calls are always made even if the data set is
    missing. This is for the benefit of the topic recovery validator, where
    we may want to always make HTTP calls even if there is no inventory data.
    abhijat committed Sep 21, 2024
    da12bd0
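The per-segment flow and the skip/force rules above can be sketched in Python. This is a minimal sketch under stated assumptions, not the Redpanda implementation (which is C++): `Lookup`, `should_check_segments`, `find_missing_segments`, and the `head` callback standing in for an HTTP HEAD request are all hypothetical names, and the three-way lookup result assumes the hash data set can report a possible collision.

```python
from enum import Enum, auto
from typing import Callable, Iterable, Optional, Set


class Lookup(Enum):
    # Hypothetical result of querying the inventory hash data set.
    EXISTS = auto()
    MISSING = auto()
    POSSIBLE_COLLISION = auto()


def should_check_segments(hashes_loaded: bool, inv_scrub_enabled: bool,
                          force_http: bool) -> bool:
    """Decide whether segment-exists checks run at all."""
    if force_http or not inv_scrub_enabled:
        # Explicit intent to make HTTP calls.
        return True
    # Inventory-based scrub: skip the checks entirely when no data set was
    # loaded, to avoid issuing a flood of HEAD requests.
    return hashes_loaded


def find_missing_segments(paths: Iterable[str],
                          lookup: Optional[Callable[[str], Lookup]],
                          head: Callable[[str], bool],
                          inv_scrub_enabled: bool,
                          force_http: bool = False) -> Set[str]:
    """Return the segment paths that should be reported as missing.

    `lookup` is None when no hashes were loaded; `head` returns True if the
    object exists in the bucket (a stand-in for an HTTP HEAD request).
    """
    missing: Set[str] = set()
    if not should_check_segments(lookup is not None, inv_scrub_enabled,
                                 force_http):
        return missing
    for path in paths:
        if (inv_scrub_enabled and lookup is not None
                and lookup(path) is Lookup.EXISTS):
            continue  # found in inventory data: no HTTP call is made
        # Missing from the data set, a possible collision, or no usable
        # inventory data: fall back to asking the object store.
        if not head(path):
            missing.add(path)
    return missing
```

With inventory data loaded, only paths that are absent or ambiguous in the data set reach `head`, so a fully covered manifest costs no requests.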
  5. cst: Unit tests for ntp hashes query object

    The ntp hashes query object wraps several booleans which decide whether
    API calls should be made, as well as the hashes object itself.
    
    The tests added here exercise the combinations of these booleans derived
    from the cluster config as well as input arguments.
    abhijat committed Sep 21, 2024
    e38f323
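A table-driven test over every combination of these booleans might look like the following sketch. `HashesQuery`, its fields, and `check_all_combinations` are hypothetical stand-ins for the ntp hashes query object and its tests; the expected values assume the rules from the anomaly-detector commit, where API calls are suppressed only when inventory-based scrub is enabled, calls are not forced, and no hashes were loaded.

```python
import itertools
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class HashesQuery:
    """Hypothetical stand-in for the ntp hashes query object."""

    inv_scrub_enabled: bool            # derived from cluster config
    force_api_calls: bool              # input argument to the detector run
    hashes: Optional[FrozenSet[int]]   # None if nothing was loaded from disk

    @property
    def hashes_loaded(self) -> bool:
        return self.hashes is not None

    def use_hashes(self) -> bool:
        # Hashes are consulted only for inventory-based scrub.
        return self.inv_scrub_enabled and self.hashes_loaded

    def may_make_api_calls(self) -> bool:
        # Forced calls, or scrub without inventory, always permit HTTP calls.
        if self.force_api_calls or not self.inv_scrub_enabled:
            return True
        # With inventory-based scrub, calls are made only if hashes loaded.
        return self.hashes_loaded


def check_all_combinations() -> int:
    """Exercise all eight input combinations; return how many were checked."""
    checked = 0
    for inv, force, loaded in itertools.product((False, True), repeat=3):
        q = HashesQuery(inv, force, frozenset() if loaded else None)
        # The single combination that suppresses API calls.
        suppressed = inv and not force and not loaded
        assert q.may_make_api_calls() == (not suppressed), (inv, force, loaded)
        assert q.use_hashes() == (inv and loaded), (inv, force, loaded)
        checked += 1
    return checked
```

Enumerating the full product, rather than hand-picking cases, guards against a later change silently altering one of the less common combinations.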
  6. cst/inv: Extract hash writer fns to utils

    The hash writer logic is extracted to a utils compilation unit. This way,
    the scrubber/anomaly detector unit tests can use the same logic to
    prepare fixtures for testing.
    abhijat committed Sep 21, 2024
    1ce4293
  7. cst: Add tooling to write hashes during unit tests

    The anomaly detector now expects hashes to be loaded from disk. Helper
    functions are added to write a set of segment-meta to disk, with the
    option to skip some entries to introduce anomalies.
    abhijat committed Sep 21, 2024
    fd95e28
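The idea of writing hashes for a set of segments while deliberately omitting some can be sketched as follows. The on-disk format (one decimal hash per line) and the hashing scheme (first 8 bytes of SHA-256) are assumptions for illustration only, not the format used by the inventory service; `path_hash`, `write_hashes`, and `load_hashes` are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List, Set


def path_hash(path: str) -> int:
    # Hypothetical hashing scheme: first 8 bytes of SHA-256 as an integer.
    return int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "little")


def write_hashes(out: Path, segment_paths: List[str],
                 skip: Iterable[int] = ()) -> List[str]:
    """Write one hash per line for segment_paths, omitting indices in `skip`.

    Returns the omitted paths so a test knows which anomalies to expect.
    """
    skip_set: Set[int] = set(skip)
    skipped: List[str] = []
    with out.open("w") as f:
        for i, p in enumerate(segment_paths):
            if i in skip_set:
                skipped.append(p)  # left out to simulate a missing segment
                continue
            f.write(f"{path_hash(p)}\n")
    return skipped


def load_hashes(path: Path) -> Set[int]:
    """Read the hash file back into a set for membership checks."""
    return {int(line) for line in path.read_text().split()}
```

Returning the skipped paths keeps the fixture and the test's expected anomalies derived from the same source, so they cannot drift apart.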
  8. cst: Add tests for anomaly detection with inv. data

    Two unit tests are added which assert that inventory data is read from
    disk and that, when segments are missing from the data, HTTP calls are
    made to check for those segments.
    abhijat committed Sep 21, 2024
    cee6cef
  9. cst: Remove unused imports

    abhijat committed Sep 21, 2024
    7f48c6f
  10. ducktape: Allow ABS client to accept binary payload

    A similar change was previously made to the S3 client but not to the
    ABS client. The caller can now pass a binary payload directly and skip
    wrapping it in bytes() during put-object.
    abhijat committed Sep 21, 2024
    44b24fa
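The payload handling this commit describes can be illustrated with a small in-memory stand-in. `InMemoryBlobClient` and its `put_object` signature are hypothetical and do not reproduce the ducktape ABS client's actual API; the sketch only shows the idea of accepting either text or an already-binary payload.

```python
from typing import Dict, Tuple, Union


class InMemoryBlobClient:
    """Hypothetical in-memory stand-in for an ABS client, for illustration."""

    def __init__(self) -> None:
        self.blobs: Dict[Tuple[str, str], bytes] = {}

    def put_object(self, container: str, key: str,
                   data: Union[str, bytes]) -> None:
        # Binary payloads are stored as-is; text is encoded as UTF-8, so
        # callers never need to wrap their payload in bytes() themselves.
        payload = data if isinstance(data, bytes) else data.encode("utf-8")
        self.blobs[(container, key)] = payload
```

A caller uploading a serialized report can pass the raw bytes, e.g. `client.put_object("ctr", "report.csv.gz", compressed)`, without a conversion step.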
  11. ducktape: Add inv. report parsing to scrubber test

    An inventory report is added to the bucket during the scrubber test. The
    test fudges the segment names before adding them to the report, for two
    reasons:
    
    * At some point during the test, we remove a segment and expect an
      anomaly. Since we do not know which segment will be removed when
      generating the report, we have to change the name before adding the
      key; otherwise the key is found in the report, the bucket is never
      checked, and the test breaks.
    * We cannot skip adding the segment to the report altogether, because
      then no hash file is generated, and as a safety measure the scrubber
      does not check for segments, to avoid making many HEAD requests.
    
    In its current state, the addition to the test is simply a sanity check.
    As more metrics are added, this test can be expanded to include more
    useful assertions.
    abhijat committed Sep 21, 2024
    532515f
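The fudging trick can be sketched as a report generator that still emits one row per segment, so a hash file gets produced, while guaranteeing that no real key matches. The CSV layout (bucket, key) and the `.fudged` suffix are assumptions for illustration; the commit does not show the test's actual renaming scheme, and `build_inventory_report` is a hypothetical name.

```python
import csv
import io
from typing import Iterable

# Hypothetical marker; the scrubber test's real renaming scheme is not shown.
FUDGE_SUFFIX = ".fudged"


def build_inventory_report(bucket: str, segment_keys: Iterable[str]) -> str:
    """Return a CSV inventory report listing every segment under a fudged name.

    Every segment still contributes a row (so a hash file is generated), but
    no real key appears in the report, so lookups for real segment paths miss
    and the scrubber falls back to checking the bucket directly.
    """
    out = io.StringIO()
    writer = csv.writer(out)
    for key in segment_keys:
        writer.writerow([bucket, key + FUDGE_SUFFIX])
    return out.getvalue()
```

Because the removed segment is never present under its real name, the test's expected anomaly comes from the bucket check rather than from the report.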