@rankaiyx rankaiyx commented Jun 8, 2025

Pull Request: Resolving Pacman Lock File Issues After Rollback

Problem Description

When using snapper-rollback with Arch Linux's recommended btrfs layout, rolling back to snapshots created by snap-pac causes pacman to stop working. This occurs because pacman's lock mechanism doesn't interact well with snapshot hooks:

Problematic Sequence:

  1. Pacman locks database 🔒
  2. Pre-snapshot hook runs 📸
  3. Post-snapshot hook runs 📸
  4. Pacman unlocks database 🔓

Ideal Sequence:

  1. Pre-snapshot hook 📸
  2. Pacman locks 🔒
  3. Pacman unlocks 🔓
  4. Post-snapshot hook 📸

➜ ~ sudo snapper list
...
19 │ single │ │ Sat 07 Jun 2025 08:04:42 PM │ root │ │ test2 │
20 │ pre │ │ Sat 07 Jun 2025 08:04:51 PM │ root │ number │ pacman -S iperf3 │
21 │ post │ 20 │ Sat 07 Jun 2025 08:04:52 PM │ root │ number │ iperf3 lksctp-tools │
22 │ single │ │ Sat 07 Jun 2025 08:04:58 PM │ root │ │ test3 │
➜ ~ sudo snapper status 19..20
c..... /home/abc/.zsh_history
+..... /var/lib/pacman/db.lck
➜ ~ sudo snapper status 21..22
c..... /home/abc/.zsh_history
-..... /var/lib/pacman/db.lck

When rolling back to snapshots containing the db.lck file, pacman incorrectly assumes the database is still locked, preventing further operations.

Solution

This PR adds an optional feature to automatically remove pacman's lock file after rollback:

  1. New Configuration Option:

    # Optional: Uncomment and set directory to mount new subvolume for post-rollback operations
    # Enables automatic removal of pacman lock file in new snapshots
    # Example: 
    # mountpoint_newsubvol = /mnt/snapper-rollback/btrfs_newsubvol

    Users enable the feature by uncommenting and configuring this option

  2. Automatic Post-Rollback Processing:

    • Mounts newly created subvolume in RW mode to temporary directory
    • Checks for and deletes /var/lib/pacman/db.lck if present
    • Unmounts temporary mount point
    • Full dry-run mode support
  3. Safe Implementation:

    • Error handling: Catches and logs all operation exceptions
    • Resource cleanup: Ensures temporary mount point is always unmounted
    • Graceful degradation: Skips operation if option not configured

Tested btrfs Layout

├── @
├── @snapshots
└── @var_log

(Matches Arch Wiki recommended layout)

User Benefits

  • Fixes pacman unavailability after rollback
  • Maintains snapper-rollback's elegance and efficiency
  • Optional feature doesn't disrupt existing workflows
  • Clear logging provides operation status feedback

This feature seamlessly resolves compatibility issues between snapshot rollback and pacman's lock mechanism while preserving snapper-rollback's simplicity and reliability.

@jrabinow
Owner

jrabinow commented Jun 8, 2025

Hey @rankaiyx thank you for the contribution and for the very clear description of both the problem and the solution you came up with.

I previously got a request for this here which I rejected on the grounds of script robustness and simplicity. Since you're the second person to request this exact feature, let's take a closer look.

  1. this script is still meant to be robust through simplicity
  2. this script also runs on other OSes than arch, see spiral-linux based on Debian, it also got some interest from gecko-linux at one point (which is based on openSuse). The script should keep supporting them - which it will given that the feature needs to be explicitly enabled 🙇 👍
  3. rather than adding file-specific hardcoded support in the script, I think it would be better to not include the lock in the snapshot when it's created. It looks like excluding /var is not recommended
  4. better would be if snap-pac could create the snapshot before the lockfile is in place (would require changes to pacman, and those people know what they're doing) or it could delete the lockfile right after creating the snapshot. I would suggest making a feature request or a PR on snap-pac since the root cause lies with the timing of its snapshot creation
  5. my instinct is that the simplest solution here from an engineering perspective is for people with specific needs to write their own wrapper script which calls snapper-rollback, then cleans up whatever is needed, since this may vary across distros. A good place for the wrapper script could be /usr/local/bin
  6. simplicity from an engineering perspective is not always simplicity from an end-user's perspective

I'd therefore like to suggest the following tradeoff:

  • we add a new section to the config file, as follows:
[sanitize]
enabled = false
paths =
    /var/lib/pacman/db.lck
    /var/cache/foo/temp.lock

The entire section is optional. Assuming it is present, the enabled field is optional as well.

  • we add a --sanitize flag to ensure that sanitized files do NOT get deleted by default, unless the enabled field is set to true
  • users can place whatever paths they want to auto-delete from their snapshots into the config file, and the script automatically deletes them post rollback - but only if sanitization is explicitly requested

This means that to auto-delete pacman lockfiles post-rollback:

  • users would put an entry into the config file
  • either the user sets enabled to true in the config file, or they call the script with the --sanitize flag

How does that sound?


 # Directory to which your btrfs root is mounted.
-mountpoint = /btrfsroot
+mountpoint = /mnt/snapper-rollback/btrfs_root
Owner

let's switch this back. Everyone has their own setting. I was thinking of moving this to somewhere under $TMPDIR or /tmp if that's unset, but this seems like it will break a fair few configs when people upgrade so it requires a proper migration plan

Author

Ok, I just wanted to keep both mount directories in the same folder and keep the root directory clean, but it's not mandatory. Considering compatibility, maybe it really shouldn't be changed.

Comment on lines 168 to 172
try:
    mountpoint_newsubvol = config.get("root", "mountpoint_newsubvol")
except configparser.NoOptionError:
    mountpoint_newsubvol = None
    LOG.info("mountpoint_newsubvol not configured, skipping pacman db.lck removal")
Owner

rather than doing a try/except here, I think it would be better to add a new --sanitize flag. If a failure occurs, the script fails fast and loud

Comment on lines 196 to 203
if mountpoint_newsubvol:
    subvol_name = config.get("root", "subvol_main")
    mount_and_remove_db_lck(
        subvol_name,
        mountpoint_newsubvol,
        dev,
        dry_run=args.dry_run
    )
Owner

I think that we should avoid mounting a second time. Once we've rolled back, we can remove the lock file directly from the mounted snapshot - unless you had a specific reason for wanting to mount things separately

Author

good idea, mountpoint_newsubvol is also no longer needed.

@rankaiyx
Author

rankaiyx commented Jun 8, 2025

First, I tried deleting the db.lck file right after snap-pac takes a snapshot: make the read-only snapshot writable, delete the file, then set the snapshot back to read-only. However, this broke the strict correspondence between the snapshot metadata created by snapper and the btrfs read-only snapshot, causing some inexplicable problems.

Another idea was to make the directory containing db.lck its own subvolume, but then pacman's other files in that directory would not be snapshotted, so it didn't work.

I also tried making db.lck a dangling symbolic link pointing into another subvolume that is not snapshotted, but the presence of a dangling symlink also prevents pacman from working.

@rankaiyx
Author

rankaiyx commented Jun 8, 2025

> Hey @rankaiyx thank you for the contribution and for the very clear description of both the problem and the solution you came up with.
>
> I previously got a request for this here which I rejected on the grounds of script robustness and simplicity. Since you're the second person to request this exact feature, let's take a closer look.
>
> 1. this script is still meant to be robust through simplicity
>
> 2. this script also runs on other OSes than arch, see [spiral-linux](https://spirallinux.github.io/) based on Debian, it also got some interest from [gecko-linux](https://github.com/jrabinow/snapper-rollback/issues/4) at one point (which is based on openSuse). The script should keep supporting them - which it will given that the feature needs to be explicitly enabled 🙇 👍
>
> 3. rather than adding file-specific hardcoded support in the script, I think it would be better to not include the lock in the snapshot when it's created. It looks like excluding `/var` is [not recommended](https://www.reddit.com/r/archlinux/comments/sacp6g/comment/htsyqk3/)
>
> 4. better would be if snap-pac could create the snapshot before the lockfile is in place (would require changes to pacman, and those people know what they're doing) or it could delete the lockfile right after creating the snapshot. I would suggest making a feature request or a PR on [snap-pac](https://github.com/wesbarnett/snap-pac) since the root cause lies with the timing of its snapshot creation
>
> 5. my instinct is that the simplest solution here from an engineering perspective is for people with specific needs to write their own wrapper script which calls snapper-rollback, then cleans up whatever is needed, since this may vary across distros. A good place for the wrapper script could be `/usr/local/bin`
>
> 6. simplicity from an engineering perspective is not always simplicity from an end-user's perspective
>
> I'd therefore like to suggest the following tradeoff:
>
> * we add a new section to the config file, as follows:
>
> [sanitize]
> enabled = false
> paths =
>     /var/lib/pacman/db.lck
>     /var/cache/foo/temp.lock
>
> The entire section is optional. Assuming it is present, the enabled field is optional as well.
>
> * we add a `--sanitize` flag to ensure that sanitized files do NOT get deleted by default, unless the `enabled` field is set to true
>
> * users can place whatever paths they want to auto-delete from their snapshots into the config file, and the script automatically deletes them post rollback - but only if sanitization is explicitly requested
>
> This means that to auto-delete pacman lockfiles post-rollback:
>
> * users would put an entry into the config file
>
> * either the user sets `enabled` to true in the config file, or they call the script with the `--sanitize` flag
>
> How does that sound?

These are all good suggestions. I'll try to implement them later today.

@rankaiyx
Author

rankaiyx commented Jun 8, 2025

The code update is complete and has been preliminarily tested. It works as expected.

Looking forward to further review and testing.

Owner

@jrabinow jrabinow left a comment

This is looking great, I especially want to call out my appreciation for how you're handling errors 🙇

If you could please address the comments and run the black formatter on the code, I'd be happy to merge this in :-)

target_path = target_root / rel_path

if dry_run:
    LOG.info(f"[DRY-RUN] Would check and clean: {target_path}")
Owner

let's print out the exact shell commands which we're running an equivalent for. I should be able to --dry-run the script and get the exact commands that need to be run from a shell if the script wasn't available


# If the following files exist in the file system after the rollback, clean them up.
# Use absolute paths and separate multiple files with commas.
[cleanup]
Owner

Let's name this [root.sanitize]

The idea here is that right now, the tool only supports rolling back the root partition. Out of scope here - someday this tool will be adapted to handle other partitions as well. However, when that day comes, we will want to ensure that cleaning files up applies only when rolling back the associated partition

Apologies, I should have mentioned this in my initial proposal

Author

[root.cleanup] may be better? It's in line with Simple English and good for internationalization.

Owner

@jrabinow jrabinow Jun 10, 2025

cleanup is ok, but I was hoping for something slightly more specific. How about purge and the associated similarly-named flag?
If you prefer cleanup to purge, let's go with your preference

# Use absolute paths and separate multiple files with commas.
[cleanup]
enabled = false
paths = /var/lib/pacman/db.lck, /var/cache/foo/temp.lock
Owner

@jrabinow jrabinow Jun 10, 2025

Each file should be on its own line (suppose I have 15 files I want to clean up -> we want the config to remain legible), and paths should be on a line of its own.
Valid file paths can include a comma (,), so we want to ensure we can support that as well (valid file paths can also include the newline character, but that's far less common).

Author

@rankaiyx rankaiyx Jun 10, 2025

It seems that multi-line key values are not supported. There are workarounds, but I'm not sure whether they are good or bad. They may introduce new weaknesses, such as strict requirements for sequence numbers.
Is there an elegant way to do this?

A workaround:
[Plugins]
plugin[0] = core
plugin[1] = auth
plugin[2] = storage

import configparser

# Read the indexed plugin[N] keys in order, stopping at the first gap.
config = configparser.ConfigParser()
config.read('config.ini')

plugins = []
i = 0
while True:
    key = f'plugin[{i}]'
    if key in config['Plugins']:
        plugins.append(config['Plugins'][key])
        i += 1
    else:
        break

print(plugins)

Owner

Huh, weird, I tested before making this comment:

diff --git a/snapper-rollback.conf b/snapper-rollback.conf
index e2b52f0..4a658cd 100644
--- a/snapper-rollback.conf
+++ b/snapper-rollback.conf
@@ -36,4 +36,6 @@ mountpoint = /btrfsroot
 # Use absolute paths and separate multiple files with commas.
 [cleanup]
 enabled = false
-paths = /var/lib/pacman/db.lck, /var/cache/foo/temp.lock
+paths =
+    /var/lib/pacman/db.lck
+    /var/cache/foo/temp.lock
$ echo "macOS $(sw_vers -productVersion) $(sw_vers -buildVersion) $(uname -m)"
macOS 15.5 24F74 arm64
$ python --version
Python 3.13.4
$ ipython
Python 3.13.4 (main, Jun  7 2025, 00:36:51) [Clang 17.0.0 (clang-1700.0.13.5)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.31.0 -- An enhanced Interactive Python. Type '?' for help.

[ins] In [1]: from configparser import ConfigParser

[ins] In [2]: cfg = ConfigParser()

[ins] In [3]: cfg.read("snapper-rollback.conf")
Out[3]: ['snapper-rollback.conf']

[ins] In [4]: paths = cfg.get("cleanup", "paths")

[ins] In [5]: paths
Out[5]: '\n/var/lib/pacman/db.lck\n/var/cache/foo/temp.lock'

As you can see, I'm on a Mac at the moment, so maybe configparser behaves differently across platforms? That seems doubtful, though.
I did the same with Python 3.3 (Python 3.2 won't build for me) and the behavior is identical, so it's not related to the Python version.
What does your system do? If it's really not working, I'm afraid we'll have to stick with commas like you initially did, and so be it.

Author

Python 3 on Arch Linux.
I found the key point: each continuation line of a multi-line key value must start with at least one space.
I will implement it.
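
To make the indentation requirement concrete, here is a minimal sketch of reading a multi-line paths value with configparser and splitting it into a list. The [cleanup] section name follows the config under review; the parse_paths helper and the inline sample configs are hypothetical, and interpolation=None is an assumption made here so that a % in a path is not treated as interpolation syntax.

```python
import configparser

# Continuation lines are indented, so configparser treats them as part
# of the `paths` value. One path contains a comma on purpose.
INDENTED = """\
[cleanup]
enabled = false
paths =
    /var/lib/pacman/db.lck
    /var/cache/foo, bar/temp.lock
"""

# Same idea without indentation: the path line is parsed as a new option
# that has no `=` or `:` delimiter, which configparser rejects.
UNINDENTED = """\
[cleanup]
paths =
/var/lib/pacman/db.lck
"""


def parse_paths(text):
    """Return the non-empty lines of [cleanup] paths as a list (hypothetical helper)."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(text)
    raw = cfg.get("cleanup", "paths", fallback="")
    # Split on newlines only, so commas inside a path survive.
    return [line.strip() for line in raw.splitlines() if line.strip()]


paths = parse_paths(INDENTED)
print(paths)

try:
    parse_paths(UNINDENTED)
    unindented_ok = True
except configparser.Error as exc:
    unindented_ok = False
    print(f"unindented config rejected: {type(exc).__name__}")
```

Splitting on newlines rather than commas is what lets one entry per line coexist with paths that contain a comma.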

    else:
        LOG.warning(f"Cleanup skipped: {target_path} is not a file")
except OSError as e:
    LOG.error(f"Error cleaning {target_path}: {str(e)}")
Owner

f"Error deleting '{target_path}': {e}"

Variable interpolation is already handled, no need for str(e), using quotes makes it very clear what is the filepath and what isn't, and deleting is more explicit than cleaning

Author

Okay, let’s do it.

Comment on lines +164 to +166
if target_path.is_file():
    target_path.unlink()
    LOG.info(f"Found and removed file: {target_path} ")
Owner

  • let's do the dry-run check inside here, to ensure we aren't reporting removing target_path when in reality it would be untouched.
  • nit: what if we combined the .exists() and the .is_file() check?
  • bonus: add support for deleting directories? Potentially dangerous (then again what isn't in this context?) -> I'm not sure it's such a good idea to do this one but it would be consistent with principle of least astonishment. Your call.

Author

In dry-run mode the program never reaches this point, because there is no real target subvolume (target_path).
Perhaps, in a dry run, we could instead check whether there are matching files in the source subvolume and, if so, list them.

Owner

@jrabinow jrabinow Jun 10, 2025

Great point. In that case, I think you had the right idea initially. We can print rm -f '${target_path}'. This command won't error out even if the file doesn't exist. We print the rm command regardless of whether the file exists/is a file or not, like you already coded. How's that sound?

Author

rm -f '${target_path}'
Well, it's very easy to understand, even though the script doesn't actually run the rm command.
Let's just display it like this.

Owner

Exactly. Easy to understand and to convert into a shell script to run in an environment where the python script doesn't work (python runtime or btrfsutil module not available)
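
A minimal sketch of what was agreed in this thread: in dry-run mode, print an `rm -f` line for every configured path regardless of whether the file exists, and otherwise delete only regular files. The sanitize_paths name, its signature, and the use of print instead of the script's LOG are assumptions for illustration; shlex.quote is used here in place of hand-written single quotes so that the printed command stays valid for paths containing quotes or spaces.

```python
import shlex
from pathlib import Path


def sanitize_paths(target_root, paths, dry_run=False):
    """Delete each configured file under target_root.

    Hypothetical helper. Returns the equivalent shell commands so a
    dry run can be copied into a shell script.
    """
    commands = []
    for configured in paths:
        # Joining an absolute path onto a Path discards the left side,
        # so strip the leading slash before joining.
        target_path = Path(target_root) / configured.lstrip("/")
        # `rm -f` does not fail on a missing file, so the command is
        # safe to print without checking that the file exists.
        command = f"rm -f {shlex.quote(str(target_path))}"
        commands.append(command)
        if dry_run:
            print(command)
            continue
        try:
            # is_file() is False for missing paths and for directories,
            # which combines the exists() and is_file() checks.
            if target_path.is_file():
                target_path.unlink()
                print(f"Deleted '{target_path}'")
        except OSError as e:
            print(f"Error deleting '{target_path}': {e}")
    return commands
```

Directories are deliberately left alone in this sketch, since the thread left directory support as an open question.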

    dev,
    dry_run=args.dry_run,
)
cleanup_files(config, subvol_main, dry_run=args.dry_run)
Owner

I'd suggest we call this function only if the user has made explicit their intention to clean up files by one of the following methods:

  • explicitly passed in the --sanitize flag when calling the script on the CLI
  • explicitly enabled the feature in the config file

I also think we should keep the logic for deciding whether cleanup runs in one place, e.g. we shouldn't check the flag value in one place and the enabled field value in another
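
One way to keep that decision in a single place is sketched below. The should_sanitize helper is hypothetical, and the [root.sanitize] section name is only one of the candidates discussed in this thread (cleanup and purge were also proposed), so treat both as assumptions. It relies on configparser's getboolean with a fallback, which covers a missing section as well as a missing enabled key.

```python
import argparse
import configparser


def should_sanitize(args, config, section="root.sanitize"):
    """Return True only if sanitization was explicitly requested (hypothetical helper)."""
    # Either the CLI flag or the config file can enable the feature.
    # A missing section or a missing `enabled` key counts as disabled.
    return bool(args.sanitize) or config.getboolean(section, "enabled", fallback=False)


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--sanitize",
        action="store_true",
        help="delete the configured paths after rollback",
    )
    return parser


config = configparser.ConfigParser()
config.read_string("[root.sanitize]\nenabled = false\n")
parser = build_parser()

print(should_sanitize(parser.parse_args([]), config))              # False
print(should_sanitize(parser.parse_args(["--sanitize"]), config))  # True
```

The caller then needs a single check before invoking the cleanup function, rather than testing the flag in one place and the enabled field in another.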

@rankaiyx
Author

For the original problem, I had a flash of inspiration and came up with a good solution in the upstream project.
wesbarnett/snap-pac#59
