haumea/zrepl: reduce snapshot count #447

Merged 4 commits on Jul 22, 2024
build/haumea/postgresql.nix: 2 additions, 2 deletions
@@ -83,8 +83,8 @@
# benefit from frequent vacuums, so this should
# help. In particular, I'm thinking the jobsets
# pages.
-autovacuum_vacuum_scale_factor = 0.002;
-autovacuum_analyze_scale_factor = 0.001;
+autovacuum_vacuum_scale_factor = 0.02;
+autovacuum_analyze_scale_factor = 0.01;

shared_preload_libraries = "pg_stat_statements";
compute_query_id = "on";
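For scale: per the PostgreSQL documentation, autovacuum vacuums a table once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × (number of tuples), and analyzes it once the rows changed since the last analyze exceed the analogous autovacuum_analyze_threshold + autovacuum_analyze_scale_factor × (number of tuples). Assuming the default base thresholds of 50 (this config does not set them here) and a hypothetical table of 10^6 rows, the change raises both trigger points roughly tenfold:

$$
\begin{aligned}
\text{vacuum:}\quad & 50 + 0.002 \times 10^6 = 2050 \;\to\; 50 + 0.02 \times 10^6 = 20050 \\
\text{analyze:}\quad & 50 + 0.001 \times 10^6 = 1050 \;\to\; 50 + 0.01 \times 10^6 = 10050
\end{aligned}
$$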

build/haumea/zrepl.nix: 54 additions, 20 deletions
@@ -22,7 +22,7 @@
filesystems."rpool/safe<" = true;
snapshotting = {
type = "periodic";
-interval = "15m";
+interval = "30m";
prefix = "zrepl_snap_";
hooks = [ {
# https://zrepl.github.io/master/configuration/snapshotting.html#postgres-checkpoint-hook
@@ -31,27 +31,52 @@
filesystems."rpool/safe/postgres" = true;
} ];
};

+# The current pruning setup is an exponentially growing scheme, at both sides.
pruning = {
keep_sender = [
+{ type = "not_replicated"; }

Review comment (Member Author):
Oops, I expect we need to have the regex = part here as well.

Review comment (Member Author):
Or maybe not. Their example set doesn't have it:
https://zrepl.github.io/configuration/prune.html#pruning-policies

Review comment, vcunat (Member Author), Jun 27, 2024:
Maybe we should drop this line anyway. If the remote end is down, we probably want to keep pruning the sender to reduce the risk of running out of space. It might also reduce the time to sync up once the receiver becomes reachable again.

{
type = "grid";
regex = "^zrepl_snap_.*";
grid = lib.concatStringsSep " | " [
-"4x15m"
-"24x1h"
-"4x1d"
-"3x1w"
+"1x1h(keep=all)"
+"1x1h"
+"1x2h"
+"1x4h"
+# "grid" acts weird if an interval isn't a whole-number multiple
+# of the previous one, so we jump from 8h to 24h

Review comment (Member Author):
I'm not sure it's worth trying to explain the weirdness. I base it not on actual experience but on the definition in their docs, and on how such a model then behaves when running continuously.

+"2x8h"
+"1x1d"
+"1x2d"
+"1x4d"
+"1x8d"
+# At this point we keep ~10 snapshots spanning 8--16 days (depends on moment),
+# with exponentially increasing spacing (almost).
];
}
];
keep_receiver = [
{ type = "grid";
regex = "^zrepl_snap_.*";
grid = lib.concatStringsSep " | " [
-"96x1h"
-"12x4h"
-"7x1d"
-"52x1w"
+"2x1h(keep=all)"
+"2x1h"
+"2x2h"
+"2x4h"
+"4x8h"
+# At this point the grid spans 2 days by ~13 snapshots.
+# (See note above about 8h -> 24h.)
+"2x1d"
+"2x2d"
+"2x4d"
+"2x8d"
+"2x16d"
+"2x32d"
+"2x64d"
+"2x128d"
+# At this point we keep ~29 snapshots spanning 384--512 days (depends on moment),
+# with exponentially increasing spacing (almost).
];
}
];
@@ -71,18 +96,26 @@
};

jobs = [
-# XXX: Broken since 2024-01-10?
-# (defaultBackupJob // {
-# name = "rsyncnet";
-# connect = {
-# identity_file = "/root/.ssh/id_ed25519";
-# type = "ssh+stdinserver";
-# host = "zh2543b.rsync.net";
-# user = "root";
-# port = 22;
-# };
-# })
+# Covers 20240629+
+(defaultBackupJob // {
+name = "rsyncnet";
+connect = {
+identity_file = "/root/.ssh/id_ed25519";
+type = "ssh+stdinserver";
+host = "zh4461b.rsync.net";
+user = "root";
+port = 22;
+};
+})
+/* rsync.net provides a VM with FreeBSD
+- almost nothing is preserved on upgrades except this "data1" zpool
+$ scp ./zrepl.yml [email protected]:/usr/local/etc/zrepl/zrepl.yml
+# pkg install zrepl
+# service zrepl enable
+# service zrepl start
+*/

+/* Covered 2024: 0212 -- 0629
(defaultBackupJob // {
name = "hexa";
connect = {
@@ -93,6 +126,7 @@
port = 22;
};
})
+*/
];
};
};
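The snapshot counts and spans stated in the Nix comments can be sanity-checked with simple arithmetic. Below is a rough Python sketch (not zrepl code, and not how zrepl itself evaluates a grid) that parses the same " | "-joined grid strings and estimates how many snapshots each side keeps. It assumes one snapshot every 30 minutes (the new interval), that keep=all buckets are full, and that every other bucket keeps exactly one snapshot; parse_grid, estimate, SENDER_GRID and RECEIVER_GRID are hypothetical names introduced only for this illustration.

```python
import re

# Minutes per duration unit. Only a subset of zrepl's duration suffixes is
# handled here (an assumption of this sketch, not zrepl's full grammar).
UNIT_MINUTES = {"m": 1, "h": 60, "d": 24 * 60, "w": 7 * 24 * 60}

# One grid term such as "2x8h" or "1x1h(keep=all)"; keep=N is not handled.
TERM = re.compile(r"^(\d+)x(\d+)([mhdw])(\(keep=all\))?$")

# Same strings the Nix config builds with lib.concatStringsSep " | ".
SENDER_GRID = "1x1h(keep=all) | 1x1h | 1x2h | 1x4h | 2x8h | 1x1d | 1x2d | 1x4d | 1x8d"
RECEIVER_GRID = (
    "2x1h(keep=all) | 2x1h | 2x2h | 2x4h | 4x8h | 2x1d | 2x2d | 2x4d"
    " | 2x8d | 2x16d | 2x32d | 2x64d | 2x128d"
)


def parse_grid(spec):
    """Split a grid string into (count, minutes, keep_all) tuples."""
    terms = []
    for raw in spec.split("|"):
        match = TERM.match(raw.strip())
        if match is None:
            raise ValueError(f"unsupported grid term: {raw.strip()!r}")
        count, length, unit, keep_all = match.groups()
        terms.append((int(count), int(length) * UNIT_MINUTES[unit], keep_all is not None))
    return terms


def estimate(spec, snapshot_every_min=30):
    """Return (approximate snapshots kept, total span in days).

    Rough model: keep=all buckets are assumed full of snapshots taken every
    snapshot_every_min minutes, and every other bucket keeps exactly one.
    It ignores where "now" falls relative to bucket boundaries.
    """
    kept = 0
    span_min = 0
    for count, minutes, keep_all in parse_grid(spec):
        span_min += count * minutes
        kept += count * (minutes // snapshot_every_min) if keep_all else count
    return kept, span_min / (24 * 60)


if __name__ == "__main__":
    for name, spec in (("sender", SENDER_GRID), ("receiver", RECEIVER_GRID)):
        kept, days = estimate(spec)
        print(f"{name}: ~{kept} snapshots spanning {days:g} days")
```

Run as a script, this reports about 11 snapshots over 16 days for the sender and about 30 over 512 days for the receiver, within one snapshot of the ~10 and ~29 in the Nix comments. By the same estimate, the removed receiver grid (96x1h | 12x4h | 7x1d | 52x1w) keeps 167 snapshots over 377 days.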

build/haumea/zrepl.yml: 24 additions, 0 deletions
@@ -0,0 +1,24 @@
# [email protected]:/usr/local/etc/zrepl/zrepl.yml
# zrepl main configuration file.
# For documentation, refer to https://zrepl.github.io/
#
global:
  logging:
    - type: "stdout"
      level: "error"
      format: "human"
    - type: "syslog"
      level: "info"
      format: "logfmt"

# mostly from https://blog.lenny.ninja/zrepl-on-rsync-net.html
jobs:
  - name: sink
    type: sink
    serve:
      type: stdinserver
      client_identities: [ haumea ]
    recv:
      placeholder:
        encryption: off
    root_fs: "data1"