# Make segment-cache size configurable and use emptyDir for it #306
"on disk" -> is this an externally provided PV? |
## Description

Run the demo with `stackablectl --additional-demos-file demos/demos-v1.yaml --additional-stacks-file stacks/stacks-v1.yaml demo install nifi-kafka-druid-water-level-data`. Tested the demo with 2.500.000.000 records.

Hi all, here is a short summary of the observations from the water-level demo:

NiFi uses the content-repo PVC but keeps it at ~50% usage => should be fine forever.
Actions:
* Increase the content-repo from 5 to 10 GB, better safe than sorry. I was able to crash it by using large queues and stalling processors.

Kafka uses a PVC (currently 15 GB) => should work fine for ~1 week.
Actions:
* Look into retention settings (low priority, as it should work for ~1 week) so that it works forever.

Druid uses S3 for deep storage (S3 has 15 GB). But currently it also caches *everything* locally at the historical, because we set `druid.segmentCache.locations=[{"path"\:"/stackable/var/druid/segment-cache","maxSize"\:"300g"}]` (hardcoded in https://github.com/stackabletech/druid-operator/blob/45525033f5f3f52e0997a9b4d79ebe9090e9e0a0/deploy/config-spec/properties.yaml#L725). This does *not* really affect the demo, as 100.000.000 records (let's call it data of ~1 week) take ~400 MB.

I think the main problem with the demo is that queries take > 5 minutes to complete and Superset shows timeouts. The historical pod suspiciously uses exactly one core of CPU, and the queries are really slow for a "big data" system IMHO. This could be because either Druid is only using a single core, or because we don't set any resources (yet!) and the node does not have more cores available. Going to research that.

Actions:
* Created stackabletech/druid-operator#306
* In the meantime, configure an overwrite in the demo: `druid.segmentCache.locations=[{"path"\:"/stackable/var/druid/segment-cache","maxSize"\:"3g","freeSpacePercent":"5.0"}]` (see the sketch after this list)
* Research the slow query performance
* Have a look at the queries the Superset dashboard executes and optimize them
* Maybe we should bump the druid-operator version in the demo (e.g. create a release 22.09-druid which is basically 22.09 with a newer druid-operator version). That way we get stable resources.
* Enable Druid auto compaction to reduce the number of segments
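For the overwrite mentioned in the actions above, here is a minimal sketch of how such a property overwrite could be attached to the historical role via the operator's `configOverrides` mechanism. The field layout (and that `runtime.properties` entries can be overridden this way) is an assumption not confirmed by this thread; only the property value itself is taken from the comment above.

```yaml
# Hypothetical DruidCluster excerpt (other required fields omitted);
# the configOverrides layout shown here is an assumption.
apiVersion: druid.stackable.tech/v1alpha1
kind: DruidCluster
metadata:
  name: druid
spec:
  historicals:
    configOverrides:
      runtime.properties:
        # cap the segment cache at 3g instead of the hardcoded 300g
        druid.segmentCache.locations: '[{"path":"/stackable/var/druid/segment-cache","maxSize":"3g","freeSpacePercent":"5.0"}]'
    roleGroups:
      default:
        replicas: 1
```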
Nope, it's an emptyDir. Normally it's a spinning disk or SSD on the k8s node. As it's a cache, there is no point saving it via a PVC.
The integration test for this failed, so it should be investigated some more.
Maybe run it on AWS EKS 1.22 (the nightly runs on that) instead of IONOS 1.24.
I've unassigned myself, since this will go into a bigger phase of "In Progress" again.
Blocked by: stackabletech/operator-rs#497
Part of: #306

This PR has been extracted from #320, which will be closed. The part that was left out is the actual configuration of the segment cache size. That will be implemented in a future PR and will require a new operator-rs release.

:green_circle: CI: https://ci.stackable.tech/view/02%20Operator%20Tests%20(custom)/job/druid-operator-it-custom/34/

Co-authored-by: Sebastian Bernauer <[email protected]>
# Description

This doesn't add or change any functionality.

Fixes #335
Required for #306

This is based on #333 and has to be merged after that.

:green_circle: CI: https://ci.stackable.tech/view/02%20Operator%20Tests%20(custom)/job/druid-operator-it-custom/39/

## Review Checklist
- [x] Code contains useful comments
- [x] CRD change approved (or not applicable)
- [x] (Integration-)Test cases added (or not applicable)
- [x] Documentation added (or not applicable)
- [x] Changelog updated (or not applicable)
- [x] Cargo.toml only contains references to git tags (not specific commits or branches)
- [x] Helm chart can be installed and deployed operator works (or not applicable)

Once the review is done, comment `bors r+` (or `bors merge`) to merge. [Further information](https://bors.tech/documentation/getting-started/#reviewing-pull-requests)
Currently we have the segment-cache location and size hardcoded to 300 GB:
`druid-operator/deploy/config-spec/properties.yaml`, line 725 at commit 4552503
Also, `/stackable/var/druid/segment-cache` is not mounted but instead belongs to the container's root filesystem.
We could either put the cache on a disk or in a ramdisk (by using the `Memory` medium for the emptyDir).
My suggestion is putting it on disk, as this matches the Druid docs.
So we need an emptyDir without setting an explicit medium (using disk). We should also set the `sizeLimit` to the cache size.
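As a rough sketch of what that could look like at the pod level (an illustrative excerpt, not the operator's actual generated spec):

```yaml
# Illustrative pod spec excerpt: an emptyDir without an explicit medium
# is backed by node-local disk; `medium: Memory` would turn it into a ramdisk.
volumes:
  - name: segment-cache
    emptyDir:
      sizeLimit: 3Gi    # should match the maxSize in druid.segmentCache.locations
containers:
  - name: druid
    volumeMounts:
      - name: segment-cache
        mountPath: /stackable/var/druid/segment-cache
```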
We should also consider the `freeSpacePercent` Druid attribute.

## CRD proposal
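The proposal text itself is not preserved in this extract. Purely as a hypothetical illustration of the enum-style storage choice described in the update below (every field name here is an assumption, not the actual proposal):

```yaml
# Hypothetical sketch only, not the actual CRD proposal.
spec:
  historicals:
    roleGroups:
      default:
        config:
          storage:
            segmentCache:
              # one of the two variants would be chosen per cluster;
              # supporting both is what requires enum merging in operator-rs
              emptyDir:
                sizeLimit: 3Gi
              # pvc:
              #   capacity: 3Gi
              #   storageClass: standard
```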
UPDATE: 04.11.12
Change of plan: since the operator framework doesn't currently support merging `enum` types, the solution above cannot be implemented. In agreement with others, a new temporary solution is proposed: an implementation with support for `emptyDir` storage will be made in this repository only. Later, when the framework is updated with the `enum` merging support, the complete solution from above will be implemented. This proposal is forward compatible with the one above from the user's perspective. The manifest will look just like this (note the missing PVC configuration):
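The manifest snippet itself is missing from this extract; a minimal hypothetical sketch of the temporary, emptyDir-only variant (field names are assumptions):

```yaml
# Hypothetical sketch of the emptyDir-only configuration;
# note the absence of any PVC variant.
spec:
  historicals:
    roleGroups:
      default:
        config:
          storage:
            segmentCache:
              emptyDir:
                medium: ""      # default medium, i.e. node-local disk
                sizeLimit: 3Gi  # should match the configured cache size
```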