Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

admin: add cloud_storage_cache_size check to config_multi_property_validation() #23337

Merged
merged 3 commits into from
Sep 18, 2024

Conversation

WillemKauf
Copy link
Contributor

To prevent the invalid configuration of cloud_storage_cache_size and cloud_storage_cache_size_percent both being equal to 0, add a new check to the admin server validation post patch_cluster_config_handler() call.

The following configurations are hereby invalid:

  1. cloud_storage_cache_size and cloud_storage_cache_size_percent cannot both be 0
  2. cloud_storage_cache_size cannot be 0 while cloud_storage_cache_size_percent is nil (std::nullopt)

Relevant slack thread: https://redpandadata.slack.com/archives/C02BDN76HUK/p1724373318548299

Relevant docs PR: redpanda-data/docs#696 (comment)

Backports to versions v24.1.x and v24.2.x are optional and left to the reviewers to decide if necessary.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

  • none

@WillemKauf
Copy link
Contributor Author

Force push to:

  • Add more asserts in test_validate_cloud_storage_cache_size_config

src/v/redpanda/admin/server.cc Show resolved Hide resolved
src/v/redpanda/admin/server.cc Outdated Show resolved Hide resolved
tests/rptest/tests/cluster_config_test.py Outdated Show resolved Hide resolved
@WillemKauf WillemKauf force-pushed the cloud_cache_config_fix branch from 72ed807 to 62570f1 Compare September 17, 2024 13:19
Copy link
Contributor

@jcipar jcipar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

src/v/redpanda/admin/server.cc Outdated Show resolved Hide resolved
src/v/redpanda/admin/server.cc Outdated Show resolved Hide resolved
To encapsulate the logic for validating a `cluster` level configuration
for the properties `cloud_storage_cache_size` and `cloud_storage_cache_size_percent`.

The following configurations are determined to be invalid:
   1. `cloud_storage_cache_size` and `cloud_storage_cache_size_percent`
      cannot both be 0
   2. `cloud_storage_cache_size` cannot be 0 while
      `cloud_storage_cache_size_percent` is nil (`std::nullopt`)
To prevent the invalid configuration of `cloud_storage_cache_size`
and `cloud_storage_cache_size_percent` both equal to 0, add a new check
to the `admin` server validation post `patch_cluster_config_handler()` call.

Validation is performed at the admin endpoint level to prevent users from
configuring the cluster settings to one of these invalid states, which would
result in the cloud cache calculating its max size as 0.
@WillemKauf
Copy link
Contributor Author

Force push to:

  • Alter commit messages per requests.
  • Add cloud_storage::cache::validate_cache_config() to encapsulate validation logic in cloud_storage rather than in admin/server.cc.

@vbotbuildovich
Copy link
Collaborator

new failures in https://buildkite.com/redpanda/redpanda/builds/54588#019200b7-1e9a-4b4e-ac1e-8809f15858b2:

"rptest.tests.e2e_shadow_indexing_test.EndToEndShadowIndexingTestWithDisruptions.test_write_with_node_failures.cloud_storage_type=CloudStorageType.ABS"

andrwng
andrwng previously approved these changes Sep 17, 2024
Copy link
Contributor

@andrwng andrwng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though I think the number ducktape tests are a bit much. Seems like the kind of think a unit test should cover.

tests/rptest/tests/cluster_config_test.py Outdated Show resolved Hide resolved
@WillemKauf
Copy link
Contributor Author

Force push to:

  • Reduce @cluster(num_nodes=) from 3 to 1 for test_validate_cloud_storage_cache_size_config

@WillemKauf WillemKauf force-pushed the cloud_cache_config_fix branch from f91a706 to 0df8fdd Compare September 17, 2024 21:56
@WillemKauf WillemKauf merged commit d416647 into redpanda-data:dev Sep 18, 2024
18 checks passed
@vbotbuildovich
Copy link
Collaborator

/backport v24.2.x

@vbotbuildovich
Copy link
Collaborator

/backport v24.1.x

@vbotbuildovich
Copy link
Collaborator

Failed to create a backport PR to v24.1.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-23337-v24.1.x-651 remotes/upstream/v24.1.x
git cherry-pick -x ff73a966aecec459328db7b056ebcb81dfc2ba95 d4674d6974d8bd661819a42ffa7149008ba0c20b 0df8fddd9e16e6a19d4021f488702961206f5b02

Workflow run logs.

Comment on lines +1982 to +1986
using cache_size_pct_type = double;
static constexpr auto epsilon
= std::numeric_limits<cache_size_pct_type>::epsilon();

if ((cache_size_pct < epsilon) && (cloud_storage_cache_size() == 0)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a commit that adds a comment explaining this. i read the docs for epsilon, but i don't see how it fits into this calculation

Copy link
Contributor Author

@WillemKauf WillemKauf Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For double floating-point comparison to 0- is there a different way you prefer doing this?

I can update the commit history to be clear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can update the commit history to be clear.
this already merged, so i think it'll have to be a comment?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants