
Re^4: Mechanism for ensuring only one instance of a Perl script can only run?

by hv (Prior)
on Dec 06, 2022 at 05:10 UTC


in reply to Re^3: Mechanism for ensuring only one instance of a Perl script can only run?
in thread Mechanism for ensuring only one instance of a Perl script can only run?

When you do two things, there is a gap between them.

You discover that the file does not exist with -e. Then there is a gap. Then you create the file by writing to it. That is a race condition - if another process created the file in that gap, you just overwrote it. Now both of you think you own the file.

(You also treat the disappearance of the file between finding it exists and trying to open it as a fatal error, but that's actually another race condition.)

It's quite a while since I did something like this, but as far as I remember a way that works looks something like:

  1. If the file exists, you don't own it, give up.
  2. Open the file for append, write your pid followed by a newline, close.
  3. Open the file for read; if the open failed, you didn't own the file, give up. (Whoever did own it probably just unlinked it, so optionally you could go back to step one to try again - but probably better not to.)
  4. Read the first line; if it is not your pid, you don't own the file, give up.
  5. You now own the file. When you're done, unlink the file.
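The steps above can be sketched in Perl roughly like this. The lock-file name and the surrounding script are hypothetical, and error handling is kept minimal; the essential point is that ownership is decided only by reading back the first line, never by the earlier existence check:

```perl
use strict;
use warnings;

sub try_lock {
    my ($lockfile) = @_;
    # 1. If the file exists, someone else may own it: give up.
    return 0 if -e $lockfile;
    # 2. Append our pid. Several racing processes may all succeed here,
    #    but their writes land on separate lines.
    open(my $out, '>>', $lockfile) or return 0;
    print $out "$$\n";
    close $out;
    # 3. Reopen for read; failure means the owner probably just unlinked it.
    open(my $in, '<', $lockfile) or return 0;
    # 4. Only the process whose pid is on the FIRST line owns the file.
    my $first = <$in>;
    close $in;
    return 0 unless defined $first;
    chomp $first;
    return $first == $$;
}

my $lock = "/tmp/myscript.pid";    # hypothetical name
if (try_lock($lock)) {
    # ... do the real work here ...
    unlink $lock;                  # 5. Release ownership when done.
} else {
    print "another instance appears to be running\n";
}
```

Note that this relies on appends of a short line being atomic, which holds for small writes on local filesystems but is not guaranteed on NFS.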


Re^5: Mechanism for ensuring only one instance of a Perl script can only run?
by afoken (Chancellor) on Dec 06, 2022 at 19:47 UTC
    5. You now own the file. When you're done, unlink the file.

    Please also delete the file when you crash or are killed, e.g. by SIGKILL.

    Yes, I know that's not possible. That's part of why PID files suck, and that's what a monitoring process like supervise from djb's daemontools or even the stinking systemd fixes. And due to the way Unix systems work, all that the monitoring process needs to do is to wait for SIGCHLD or a new task, e.g. sending a signal to the monitored process. In other words: The monitoring process usually does nothing, it is not in the run queue and perhaps swapped out, so it can't do anything wrong. ;-) The O/S kernel will run the monitoring process when anything needs to be done. This reduces the monitoring process to a few lines of code. supervise.c is less than 300 lines, including full error checking.
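The idea that the monitor just sleeps until the kernel wakes it can be sketched in a few lines of Perl. This is only an illustration of the restart loop, not of supervise itself (which also listens on a control pipe); the bounded restart count is an artifact of the sketch, a real supervisor loops forever:

```perl
use strict;
use warnings;

# Fork the service, then block in waitpid() until it exits, then restart
# it. While the child runs, the parent is off the run queue entirely:
# the kernel only schedules it again when SIGCHLD arrives.
sub supervise {
    my ($max_runs, @cmd) = @_;
    my $runs = 0;
    for (1 .. $max_runs) {
        my $pid = fork() // die "fork: $!";
        if ($pid == 0) {
            exec @cmd or die "exec @cmd: $!";
        }
        waitpid($pid, 0);    # sleep here until the child dies
        $runs++;
    }
    return $runs;
}

# e.g. supervise(3, '/usr/sbin/foobar');   # hypothetical service
```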

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Yes, that can in principle cause a problem.

      You can try to mitigate it - if you read the pid and find that it is not your own, you can check if that pid is still a running process, and scream bloody murder if it is not. If it is a running process, in some circumstances you can go further and check if it is a process that could legitimately be locking the pid file, and scream bloody murder if not.
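A sketch of that liveness check in Perl, using kill with signal number 0, which performs the existence and permission check without actually delivering a signal. The EPERM case matters: it means the process exists but belongs to another user, which for a lock file written by ourselves is already suspicious:

```perl
use strict;
use warnings;
use Errno qw(EPERM);

sub pid_alive {
    my ($pid) = @_;
    return 1 if kill(0, $pid);   # process exists and we may signal it
    return 1 if $! == EPERM;     # exists, but owned by another user
    return 0;                    # ESRCH: no such process, lock is stale
}
```

Note this only tells you *a* process with that pid exists; pids are recycled, so a further check (e.g. comparing the command name) is needed before concluding it is really the legitimate lock holder.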

      What you want to do in that situation is always going to depend on your particular context.

      In particular it is going to depend on why you are locking. If you are locking access to a resource because you're going to modify it in a way that leaves it invalid/unstable until you are done, then a failure leaves you problems to resolve other than simply being locked out.

      But over many years, I have found that it is pretty rare that a pid file gets left locked due to a failure. And, just speaking of my own experience, it has been exceedingly rare that a locked pid file actually caused a problem - most commonly, you try to do something and see an error message "can't do that, the pid file is locked". And then you remember that the machine just crashed while you were running a similar command a few minutes ago, so you check for breakage, remove the lock file, and continue with your day.

      Maybe I (got real lucky|had excellent judgement) in using a pid file only in situations where failures tended not to cause major problems. Your particular context will determine what the effect will be - whether it will be immediately visible to a human, how easy it is to check for breakage and fix it.

      But if you can get the basics right, a pid file can be a pretty straightforward mechanism that works well enough to solve a particular problem.

        There is another point that I did not mention: You do not need PID files at all.


        I think the idea for PID files was probably to kill two birds with one stone.

First, the fact that a file with an agreed name exists at all was used to indicate that a service process is running, and no second instance should start. Having a crashed service and a left-over file was no problem, because there was a competent root user who would simply remove the file and restart the service.

        Second, writing the PID of the service process to that file makes the job a little bit easier for root. Something like kill -HUP `cat /var/run/foobar.pid` could be used to send the SIGHUP signal to the foobar service, no need to run ps first to find the process, no chance to mess up the PID.

        So yes, PID files aren't that stupid. But not using them is smarter.


In the old times, Unix machines had as much storage, memory and computing power as a modern stupid phone. So root tried to waste as little resources as possible. In this century, your smart phone is probably a Unix machine with way more resources than many Unix systems from the last century. So having a tiny monitoring process per service, probably swapped out, waiting for its child to die or for a new incoming command from a pipe is no longer a resource problem.

daemontools is the smarter way. It starts with svscanboot, which either runs as process 1 ("init") or is started by process 1. svscanboot starts and monitors svscan, which runs a supervise process for each service (plus an optional second supervise process if the service has a logging service). That gives you a tree of monitored processes, which can run a lot of services.

        pstree looks like this:

        init─┬─acpid
             ├─6*[agetty]
        ...
             ├─sshd───sshd───sshd───bash───pstree
             ├─svscanboot─┬─readproctitle
             │            └─svscan─┬─supervise───cupsd
             │                     ├─13*[supervise───multilog]
             │                     ├─supervise───vnc-fwd
             │                     ├─supervise───remserial
             │                     ├─supervise───ntpd
             │                     ├─supervise───usb-power-manag───{usb-power-manag}
             │                     ├─10*[supervise]
             │                     ├─supervise───smartd
             │                     ├─supervise───mdadm
             │                     ├─supervise───saned
             │                     ├─supervise───xfs
             │                     ├─supervise───nullmailer-send
             │                     └─supervise───exim
             ├─syslogd
        ...
             ├─ypbind───2*[{ypbind}]
             └─ypserv
        

        (This is from one of my servers.)

Now, how do you control a service, or send signals to it? svc can start and stop a service (-u / -d), start it in single-shot mode (-o), and send all kinds of signals to the service process, including SIGTERM and SIGKILL. You can also exit the monitoring process (supervise) once the service process has exited. There is a patch to add a few more signals available on Linux (SIGUSR1, SIGUSR2) to the svc/supervise pair. It boils down to svc -h /service/foobar to send a SIGHUP to the foobar service. No PID file needed, no race conditions.

        To know if a service is running, use svstat:

        >svstat /service/*
        /service/auerswald-remote: up (pid 1353) 19081721 seconds
        /service/cups: up (pid 8760) 16791531 seconds
        /service/exim: up (pid 32159) 2935275 seconds
        /service/mdadm-monitor: up (pid 1335) 19081721 seconds
        /service/ntpd: up (pid 1354) 19081721 seconds
        /service/nullmailer: up (pid 3449) 2932783 seconds
        /service/pg9: down 2790080 seconds
        /service/saned: up (pid 1344) 19081721 seconds
        /service/smartd: up (pid 1334) 19081721 seconds
        /service/urbackup: down 2879317 seconds
        /service/usb-power-manager: up (pid 1358) 19081721 seconds
        /service/vnc-fwd: up (pid 27518) 1121 seconds
        /service/xfs: up (pid 1345) 19081721 seconds
        

The documentation for the djb tools is very terse, like the code: you won't find a single character that is not absolutely needed, and you are expected to know a lot of Unix. There is a good, gentler guide to getting started with tools from djb, called the djb way. It takes a while to get the established ways of handling problems out of your head, but the djb way uses Unix in a much smarter way. It makes you wonder why it took so long to implement these solutions.


So, I wrote about systemd. This is what a fileserver in a Linux container running Debian 11 looks like:

        > pstree
        systemd─┬─3*[agetty]
                ├─cron
                ├─dbus-daemon
                ├─lighttpd
                ├─nmbd
                ├─nullmailer-send
                ├─rsyslogd───2*[{rsyslogd}]
                ├─smbd─┬─cleanupd
                │      ├─lpqd
                │      ├─smbd
                │      └─smbd-notifyd
                ├─sshd───sshd───bash───sudo───bash───pstree
                ├─systemd───(sd-pam)
                ├─systemd-journal
                ├─systemd-logind
                ├─systemd-network
                ├─systemd-resolve
                └─vsftpd
        

It looks pretty much the same as with daemontools, except that ALL monitoring happens in the single systemd running as process 1. Yes, PID files are written. After all, this is a Debian system that tries to avoid nasty surprises for users and admins. But systemd can work without PID files. It can write PID files for legacy systems. You could also do that with daemontools.

        There are tools like systemctl that control services started by systemd, much like svc, but with a lot more overhead.

Systemd uses dbus for its communication, so the dbus-daemon had better not die. And of course, you don't want process 1 to die. Guess what happens when systemd loses its ability to communicate. You are f...ed. Hope and pray that your filesystem is journalled and mostly clean before pressing the reset button or casting the almost equivalent magic spell: Re: How not to implement updaters.

Daemontools, combined with a minimal process 1, is a much cleaner, smaller, and more robust approach to solving the problem of managing services. Simply because the daemontools approach does not need complex communication protocols like dbus; instead, it relies on Unix primitives.

        And now, we are really close to getting off-topic.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Thank you both, hv and afoken,
      this was very enlightening and it seems there's still a lot to learn for me.
