haumea/zrepl: reduce snapshot count #447
Conversation
@@ -33,23 +33,27 @@
  };
  pruning = {
    keep_sender = [
      { type = "not_replicated"; }
Oops, I expect we need the `regex =` part here as well.
Or maybe not. Their example set doesn't have it:
https://zrepl.github.io/configuration/prune.html#pruning-policies
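For reference, a minimal sketch of the distinction in this config's style, going by the linked docs (the grid value and regex below are illustrative, not from this PR):

```nix
pruning = {
  keep_sender = [
    # not_replicated matches by replication state, so it takes no regex.
    { type = "not_replicated"; }
    # Rules that match snapshots by name do take one, e.g. a grid rule:
    { type = "grid"; grid = "1x1h(keep=all) | 24x1h | 14x1d"; regex = "^zrepl_"; }
  ];
};
```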
Maybe we should drop this line anyway: if the remote end is down, we probably still want to keep pruning the sender, to reduce the risk of running out of space. It might also reduce the time to sync up when the receiver becomes reachable again.
Let me dump a bit of reasoning about why. The problematic situations that we see have lots of data unique to (some) in-between snapshots, i.e. dropping some of those snapshots (manually) could release lots of space. Consequently:

EDIT:
🤔 As for the backup location(s), it feels wasteful to keep every week uniformly for a year. Can you see any reason for it? I'd intuitively again go for some exponentially increasing spacing. I assume we can afford more space than on Haumea itself, so e.g. this slower one?

"2x1h"
"2x2h"
"2x4h"
"4x8h"
# At this point the grid spans 2 days (-2h) by 10 snapshots.
# (See the note above about 8h -> 24h.)
"2x1d"
"2x2d"
"2x4d"
"2x8d"
"2x16d"
"2x32d"
"2x64d"
"2x128d"
# At this point we keep 26 snapshots spanning 384--512 days (depending on
# the moment), with exponentially increasing spacing (almost).

Perhaps worth noting that, per their docs, the specified intervals do not overlap: all the intervals are stacked in the specified order and multiplicity, forming a fixed grid.
"1x2h" | ||
"1x4h" | ||
# "grid" acts weird if an interval isn't a whole-number multiple | ||
# of the previous one, so we jump from 8h to 24h |
Not sure if it's worth trying to explain the weirdness. I base it not on actual experience but on the definition in their docs – and on how such a model then behaves when running continuously.
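To make the rule of thumb concrete, here is a hypothetical sanity check, not part of the config, that each interval length evenly divides the next, so the fixed grid's bucket edges line up (the hour values correspond to the 1h..8h and 1d..128d intervals proposed above):

```nix
let
  lib = (import <nixpkgs> { }).lib;
  # Interval lengths in hours: 1h 2h 4h 8h, then 1d, 2d, ... 128d.
  hours = [ 1 2 4 8 24 48 96 192 384 768 1536 3072 ];
  # For each adjacent pair, check that the earlier length divides the later.
  divisible = lib.zipListsWith (a: b: lib.mod b a == 0)
    (lib.init hours) (builtins.tail hours);
in
builtins.all (x: x) divisible # => true: no 8h -> 12h style jumps
```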
We repeatedly run out of space on Haumea.
1/100 of the defaults seemed excessive and was suspected to cause issues; changed to 1/10 of the defaults.
Taking snapshots more often should be fine now, as we have a faster connection to the receiver, and the churn doesn't seem as significant anymore anyway. With the new remote/receiver, let's do the pruning a bit differently, exponentially spaced now.
LGTM!
Reduce snapshot count. We repeatedly run out of space on Haumea.