Allow reconciliation if single dataset is missing snapshots #54

Open: wants to merge 7 commits into base: main

tschettervictor
Collaborator

#53

@tschettervictor
Collaborator Author

If you do like this, you'll have to adjust the tests.

Owner

@aaronhurt aaronhurt left a comment

Overall, I'm good with this if we can make existing tests pass - meaning it doesn't alter existing expectations and is a non-breaking change. It may be time to move snapshot validation out into its own function.

@tschettervictor
Collaborator Author

tschettervictor commented Apr 17, 2025

@aaronhurt

I've merged the loops.

This is working with the following tests:

1. Run the script a few times.
2. Create a snapshot of a dataset and zfs send/recv it to another nested location.
3. Delete the snapshot you just created.
4. Run with ALLOW_RECONCILIATION=0 and make sure it skips.
5. Run with ALLOW_RECONCILIATION=1 and make sure it completes.

I had to change the loop to iterate over the source snaps rather than the destination snaps, because the source is the authoritative side of the set.
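For readers following along, a source-first check might look roughly like the sketch below. It is not the code from this PR: the function name `check_dataset`, its arguments, and the printed keywords are assumptions made for illustration. Only the `ALLOW_RECONCILIATION` setting comes from the discussion.

```shell
#!/bin/sh
# Hypothetical sketch, not the code from this PR.
# $1: source snapshot names for one dataset, newline-separated
# $2: destination snapshot names for the same dataset, newline-separated
# Returns 0 if replication may proceed, 1 if the dataset should be skipped.
check_dataset() {
  src_snaps="$1"
  dst_snaps="$2"
  common=""
  # Walk the source list, since the source is treated as authoritative.
  while IFS= read -r snap; do
    [ -n "$snap" ] || continue
    # -F fixed string, -x whole-line match, -q quiet
    if printf '%s\n' "$dst_snaps" | grep -Fqx -- "$snap"; then
      common="$snap"   # remember the last source snapshot also on the destination
    fi
  done <<EOF
$src_snaps
EOF
  if [ -n "$common" ]; then
    echo "base=$common"
    return 0
  fi
  # No shared snapshot: only continue when reconciliation is allowed.
  if [ "${ALLOW_RECONCILIATION:-0}" = "1" ]; then
    echo "reconcile"
    return 0
  fi
  echo "skip"
  return 1
}
```

With `ALLOW_RECONCILIATION=0`, a dataset that shares no snapshot with the destination prints `skip` and returns non-zero. With `ALLOW_RECONCILIATION=1`, the same input prints `reconcile`.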

@tschettervictor
Collaborator Author

I'm now sorting snaps in reverse because -r removes them recursively, which causes an error when the script later tries to destroy the deeper ones.
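One way to get a "children before parents" order is to sort by dataset depth. The comment does not say how the PR actually sorts, so treat this as a sketch of the idea under that assumption. The helper name `deepest_first` is made up for the example.

```shell
#!/bin/sh
# Hypothetical sketch: order snapshot names so the most deeply nested
# datasets come first. Reads names on stdin, one per line.
deepest_first() {
  # Prefix each line with its depth (number of "/"-separated parts),
  # sort numerically descending on that prefix, then strip it again.
  awk -F/ '{ print NF "\t" $0 }' | sort -k1,1nr | cut -f2-
}

printf '%s\n' 'tank@s' 'tank/media/movies@s' 'tank/media@s' | deepest_first
```

With this ordering, `tank/media/movies@s` is handled before `tank/media@s` and `tank@s`, so a recursive destroy on a parent never runs ahead of an explicit destroy on one of its children.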

I tried to see what you were doing with the tests, but I can't quite figure it out.

@tschettervictor
Collaborator Author

Been running this now over a couple days, doing cloning, send/recv etc...

Works great.

@tschettervictor
Collaborator Author

@aaronhurt What do you think?

It's been running fine for a month now. I'm honestly not even sure if it's due to the nohup command, but I suppose I could pull that out and leave it for a while more to see if the issue comes back up?

@tschettervictor
Collaborator Author

Aaaaand it failed again this morning. Looks like it was due to a network hiccup.

I'm not sure what the best approach here would be.

Perhaps a cleanup of sorts? Trap?
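A trap-based cleanup could be sketched as below. This is only an illustration of the idea: `with_lock` is a hypothetical helper, not something zfs-replicate.sh provides, and the lock handling in the real script may differ.

```shell
#!/bin/sh
# Hypothetical helper, not taken from zfs-replicate.sh.
# Usage: with_lock LOCKFILE COMMAND [ARGS...]
with_lock() {
  lock="$1"
  shift
  (
    if [ -e "$lock" ]; then
      echo "lock file $lock already present" >&2
      exit 2
    fi
    : > "$lock"   # simple, non-atomic creation; enough for a sketch
    # Remove the lock whenever this subshell exits, whatever the reason.
    trap 'rm -f "$lock"' EXIT
    # Turn common signals into a normal exit so the EXIT trap runs.
    trap 'exit 1' HUP INT TERM
    "$@"
    exit $?
  )
}
```

A call such as `with_lock /tmp/.replicate.send.lock some_command` would then drop the lock even if `some_command` fails partway through, for example after a network hiccup. It would not help if the process is killed with SIGKILL or the host loses power, so a stale lock is still possible in those cases.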

But as it stands I might have to search for another solution until I can get a better idea of why it fails, and how to properly solve it.

I'll share the logs later today...

@tschettervictor
Collaborator Author

May 25 01:00:00 zfs-replicate.sh[7761]: creating lockfile /tmp/.replicate.snapshot.lock
May 25 01:00:00 zfs-replicate.sh[7761]: checking host cmd=ping -c1 -q -W2 192.168.1.132
May 25 01:00:00 zfs-replicate.sh[7761]: checking dataset cmd=/usr/bin/ssh 192.168.1.132 /sbin/zfs list -H -o name tank
May 25 01:00:00 zfs-replicate.sh[7761]: tank
May 25 01:00:00 zfs-replicate.sh[7761]: checking dataset cmd=/sbin/zfs list -H -o name backup/tank
May 25 01:00:00 zfs-replicate.sh[7761]: backup/tank
May 25 01:00:00 zfs-replicate.sh[7761]: listing snapshots cmd=/usr/bin/ssh 192.168.1.132 /sbin/zfs list -Hr -o name -s creation -t snapshot -d 1 tank
May 25 01:00:00 zfs-replicate.sh[7761]: listing snapshots cmd=/usr/bin/ssh 192.168.1.132 /sbin/zfs list -Hr -o name -s creation -t snapshot tank
May 25 01:00:03 zfs-replicate.sh[7761]: listing snapshots cmd=/sbin/zfs list -Hr -o name -s creation -t snapshot backup/tank
May 25 01:02:30 zfs-replicate.sh[7761]: found old snapshot tank@autorep-05212025_1747810800
May 25 01:02:30 zfs-replicate.sh[7761]: destroying snapshot cmd=/usr/bin/ssh 192.168.1.132 nohup  /sbin/zfs destroy -r tank@autorep-05212025_1747810800 & pid=$!; wait $pid
May 25 01:02:31 zfs-replicate.sh[7761]: creating snapshot cmd=/usr/bin/ssh 192.168.1.132 /sbin/zfs snapshot -r tank@autorep-05252025_1748156400
May 25 01:02:31 zfs-replicate.sh[7761]: creating lockfile /tmp/.replicate.send.lock
May 25 01:02:31 zfs-replicate.sh[7761]: sending snapshot cmd=/usr/bin/ssh 192.168.1.132 /sbin/zfs send -Rs -I tank@autorep-05242025_1748070000 tank@autorep-05252025_1748156400 | /sbin/zfs receive -vFd backup/tank
May 25 01:02:45 zfs-replicate.sh[7761]: attempting destroy backup/tank@auto-20250510-000000
May 25 01:02:45 zfs-replicate.sh[7761]: success
May 25 01:02:45 zfs-replicate.sh[7761]: attempting destroy backup/tank@autorep-05212025_1747810800
May 25 01:02:45 zfs-replicate.sh[7761]: success
May 25 01:02:45 zfs-replicate.sh[7761]: attempting destroy backup/tank/mytest@auto-20250510-000000
May 25 01:02:45 zfs-replicate.sh[7761]: success
May 25 01:02:45 zfs-replicate.sh[7761]: attempting destroy backup/tank/mytest@autorep-05212025_1747810800
May 25 01:02:45 zfs-replicate.sh[7761]: success
May 25 01:02:45 zfs-replicate.sh[7761]: attempting destroy backup/tank/media@auto-20250510-000000
May 25 01:02:45 zfs-replicate.sh[7761]: success
May 25 01:02:45 zfs-replicate.sh[7761]: attempting destroy backup/tank/media@autorep-05212025_1747810800
May 25 01:02:45 zfs-replicate.sh[7761]: success
May 25 01:02:45 zfs-replicate.sh[7761]: attempting destroy backup/tank/media/movies@auto-20250510-000000
May 25 01:02:45 zfs-replicate.sh[7761]: success
May 25 01:02:45 zfs-replicate.sh[7761]: attempting destroy backup/tank/media/movies@autorep-05212025_1747810800
May 25 01:02:45 zfs-replicate.sh[7761]: success
May 25 01:02:45 zfs-replicate.sh[7761]: attempting destroy backup/tank/media/raw@auto-20250510-000000
May 25 01:02:45 zfs-replicate.sh[7761]: success
May 25 01:02:45 zfs-replicate.sh[7761]: attempting destroy backup/tank/media/raw@autorep-05212025_1747810800
May 25 01:02:45 zfs-replicate.sh[7761]: success
May 25 01:02:45 zfs-replicate.sh[7761]: attempting destroy backup/tank/scripts@auto-20250510-000000
May 25 01:02:45 zfs-replicate.sh[7761]: success
May 25 01:02:45 zfs-replicate.sh[7761]: attempting destroy backup/tank/scripts@autorep-05212025_1747810800
May 25 01:02:45 zfs-replicate.sh[7761]: success
May 25 01:02:45 zfs-replicate.sh[7761]: attempting destroy backup/tank/apps@auto-20250510-000000
May 25 01:02:45 zfs-replicate.sh[7761]: success
May 25 01:02:45 zfs-replicate.sh[7761]: attempting destroy backup/tank/apps@autorep-05212025_1747810800
May 25 01:02:45 zfs-replicate.sh[7761]: success
May 25 01:02:45 zfs-replicate.sh[7761]: attempting destroy backup/tank/apps/meshcentral@auto-20250510-000000

That's the full log from when it failed. Running the script again fails because the lock file is still present. Running it once more after that fails because snapshots are present on the destination but not on the source.
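To see which destination snapshots have no counterpart on the source after a failed run like this, a small diagnostic along these lines might help. The function name `dest_only_snapshots` and its file-based interface are assumptions for the example. The two input lists would come from `zfs list -Hr -o name -t snapshot` on each side, as in the log above.

```shell
#!/bin/sh
# Hypothetical diagnostic: print snapshots that exist on the destination
# but not on the source.
# $1: file with source snapshot names, one per line (e.g. tank/media@snap)
# $2: file with destination snapshot names (e.g. backup/tank/media@snap)
# $3: destination prefix to strip so both lists use the same names (e.g. backup/)
dest_only_snapshots() {
  src_sorted=$(mktemp) || return 1
  dst_sorted=$(mktemp) || return 1
  LC_ALL=C sort -u "$1" > "$src_sorted"
  # Drop the leading prefix from each destination name before comparing.
  awk -v p="$3" 'index($0, p) == 1 { $0 = substr($0, length(p) + 1) } { print }' "$2" \
    | LC_ALL=C sort -u > "$dst_sorted"
  # comm -13 keeps only lines unique to the second input.
  LC_ALL=C comm -13 "$src_sorted" "$dst_sorted"
  rm -f "$src_sorted" "$dst_sorted"
}
```

Any names printed are candidates for manual review before re-running with ALLOW_RECONCILIATION=1.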
