haumea/zrepl: reduce snapshot count #447
Conversation
@@ -33,23 +33,27 @@
  };
  pruning = {
    keep_sender = [
      { type = "not_replicated"; }
Oops, I expect we need the `regex =` part here as well.
Or maybe not. Their example set doesn't have it:
https://zrepl.github.io/configuration/prune.html#pruning-policies
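For reference, a minimal sketch of the distinction in this config's style, going by the linked docs (the grid value and regex below are illustrative, not from this PR):

```nix
pruning = {
  keep_sender = [
    # not_replicated matches by replication state, so it takes no regex.
    { type = "not_replicated"; }
    # Rules that match snapshots by name do take one, e.g. a grid rule:
    { type = "grid"; grid = "1x1h(keep=all) | 24x1h | 14x1d"; regex = "^zrepl_"; }
  ];
};
```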
Maybe we should drop this line anyway: if the remote end is down, we probably still want to keep pruning the sender, to reduce the risk of running out of space. It might also reduce the time to sync up when the receiver becomes reachable again.
Let me dump a bit of reasoning about why. The problematic situations that we see have lots of data unique to (some) in-between snapshots, i.e. dropping some of those snapshots (manually) could release lots of space. Consequently:

EDIT:
🤔 As for the backup location(s), it feels wasteful to keep every week uniformly for a year. Can you see any reason for it? I'd intuitively again go for some exponentially increasing spacing. I assume we can afford more space than on Haumea itself, so e.g. this slower one?

"2x1h"
"2x2h"
"2x4h"
"4x8h"
# At this point the grid spans 2 days (-2h) by 10 snapshots.
# (See the note above about 8h -> 24h.)
"2x1d"
"2x2d"
"2x4d"
"2x8d"
"2x16d"
"2x32d"
"2x64d"
"2x128d"
# At this point we keep 26 snapshots spanning 384--512 days (depending on
# the moment), with exponentially increasing spacing (almost).

Perhaps worth noting that, per their docs, the specified intervals do not overlap: all the intervals are stacked in the specified order and multiplicity, forming a fixed grid.
"1x2h" | ||
"1x4h" | ||
# "grid" acts weird if an interval isn't a whole-number multiple | ||
# of the previous one, so we jump from 8h to 24h |
Not sure if it's worth trying to explain the weirdness. I base it not on actual experience but on the definition in their docs – and on how such a model then behaves when running continuously.
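To make the rule of thumb concrete, here is a hypothetical sanity check, not part of the config, that each interval length evenly divides the next, so the fixed grid's bucket edges line up (the hour values correspond to the 1h..8h and 1d..128d intervals proposed above):

```nix
let
  lib = (import <nixpkgs> { }).lib;
  # Interval lengths in hours: 1h 2h 4h 8h, then 1d, 2d, ... 128d.
  hours = [ 1 2 4 8 24 48 96 192 384 768 1536 3072 ];
  # For each adjacent pair, check that the earlier length divides the later.
  divisible = lib.zipListsWith (a: b: lib.mod b a == 0)
    (lib.init hours) (builtins.tail hours);
in
builtins.all (x: x) divisible # => true: no 8h -> 12h style jumps
```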
We repeatedly run out of space on Haumea.
1/100 of the defaults seemed excessive and was suspected to cause issues; changed to 1/10 of the defaults.
Taking snapshots more often should be fine now, as we have a faster connection to the receiver, and the churn doesn't seem as significant anymore anyway. With the new remote/receiver, let's do the pruning a bit differently, exponentially spaced now.
LGTM!
Reduce snapshot count. We repeatedly run out of space on Haumea.