Skip to content

Handle pidfiles more correctly#129

Open
bonktree wants to merge 1 commit intoapalrd:mainfrom
bonktree:pr-lock-pidfile
Open

Handle pidfiles more correctly#129
bonktree wants to merge 1 commit intoapalrd:mainfrom
bonktree:pr-lock-pidfile

Conversation

@bonktree
Copy link
Contributor

@bonktree bonktree commented Aug 8, 2025

The whole point of a pidfile, in the absence of a service supervisor, is for a starting process in a non-interactive job (service) to detect other instances of that job (service), although not very robustly due to toctou, pid rollovers. If, at tayga start, the pidfile exists and
contains a PID of an existing process, that is a sign we should bail out.

Take care to remove the pidfile on exit.
Use the pidfile to determine already-running and concurrently-launched
tayga processes that point to the same pidfile.
Lock the pidfile while writing our PID to it.

@bonktree
Copy link
Contributor Author

bonktree commented Aug 8, 2025

The first commit here is unrelated to the purpose of this PR, it's just so my editor (as well as other present and future contributors) does not try to fix trailing whitespace as part of other commits, cluttering the diffs.
@apalrd Andrew, if/when you are up to merge this, please apply that commit separately so it can be ignored in git blame output and added to .git-blame-ignore-revs. Alternatively, you can just run the two commands yourself :) and commit the result separately.

git ls-files -z -c '*.[ch]' | xargs -0 -n1 sed -Ei 's/[[:space:]]+$//'
git ls-files -z -c '*.[ch]' | xargs -0 -n1 sh -c 'export LC_ALL=C; tail -n1 "$1" | wc -lc | awk "{ exit(\$2 > 0 && \$1 == 0); }" || sed -i "\$s/\$/\\n/" "$1"' --

That way, the important diffs would be easier to read for everyone. In particular, distro packagers try to read the changes from release to release before uploading the updated package.

@Skyb0rg007
Copy link
Contributor

I'm not sure if tayga should exit if the pidfile exists with a valid pid. If tayga crashes (and thus doesn't delete the pidfile), then the service manager might restart tayga and get into a crash-loop. For reference, freeradius and nginx each write the pidfile unconditionally.

@bonktree
Copy link
Contributor Author

I'm not sure if tayga should exit if the pidfile exists with a valid pid. If tayga crashes (and thus doesn't delete the pidfile), then the service manager might restart tayga and get into a crash-loop.

That would be unrealistic as far as I understand. Service managers do auto-restart services, but don't need pidfiles. Non-managing launchers do use pidfiles, but can not auto-restart services.

So here's my understanding.
A service manager (aka supervisor) does not need to rely on pidfiles at all. It does not load PID numbers written by other programs from anywhere, it keeps an eye on the actual launched processes, likely its descendants, via system APIs like wait/waitpid/waitid or pidfd or cgroup, and gets race-free, lossless notifications about state changes.
Such a manager does not pass CLI arguments to instruct the child program to use a pidfile. Whether it might dump/reload its internal state in some form to/from a filesystem location is an implementation detail.

A classic init script is not a process supervisor, its job is to fork away & exit (and trust that the daemon does everything else correctly; many do not). It manages nothing.
It does pass a pre-determined pidfile path to the child process, so multiple instances of the init script launch processes that refer to the same one, and maybe different init scripts launch processes that refer to distinct ones, but that's about it. That's why, if we crash and don't delete the pidfile, we will not be restarted; there is no one to restart us or to be notified that we crashed.

Among web backend developers of yore, a great selling point of process supervisors (even before it became popular in Linux-land to run the entire system under one) was that they had the capability to auto-restart their website backends, unlike the init system, in hope that the custom backend program does not immediately crash again, so service downtime would be reduced or feel non-existent.

For reference, freeradius and nginx each write the pidfile unconditionally.

Interesting! They might be relying on something else to kill whatever PID is in the pidfile already, though. nginx, for example, ships with a nginx -s quit interface.

@Skyb0rg007
Copy link
Contributor

A classic init script is not a process supervisor, its job is to fork away & exit. It manages nothing.

I think this is an argument for Tayga to just overwrite the pidfile, since the only time it's started is when the system boots or the administrator explicitly calls the service tayga start equivalent. In both of those cases the system "knows what it's doing", and so the stale pidfile shouldn't be a roadblock. If clobbering is an issue the check can be implemented within the init script itself -- if [ -e /run/tayga.pid ]; then exit 1; fi is not a difficult check to add if desired, so I don't think there should be a change in behavior here. It would break the OPNsense plugin for example, since there is no way to delete the pidfile from the GUI.

@apalrd
Copy link
Owner

apalrd commented Aug 31, 2025

Fixed whitespace separately in #134

The whole point of a pidfile, in the absence of a service supervisor, is
for a starting process in a non-interactive job (service) to detect other
instances of that job (service), although not very robustly due to
toctou, pid rollovers. If, at tayga start, the pidfile exists and
contains a PID of an existing process, that is a sign we should bail
out.

Take care to remove the pidfile on exit.
Use the pidfile to determine already-running and concurrently-launched
tayga processes that point to the same pidfile.
Lock the pidfile while writing our PID to it.
@bonktree
Copy link
Contributor Author

bonktree commented Sep 1, 2025

Fixed whitespace separately in #134

Thank you! Rebasing on top of main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants