
Re^4: Mechanism for ensuring only one instance of a Perl script can only run?

by hv (Prior)
on Dec 06, 2022 at 05:10 UTC


in reply to Re^3: Mechanism for ensuring only one instance of a Perl script can only run?
in thread Mechanism for ensuring only one instance of a Perl script can only run?

When you do two things, there is a gap between them.

You discover that the file does not exist with -e. Then there is a gap. Then you create the file by writing to it. That is a race condition - if another process created the file in that gap, you just overwrote it. Now both of you think you own the file.

(You also treat the disappearance of the file between finding it exists and trying to open it as a fatal error, but that's actually another race condition.)

It's quite a while since I did something like this, but as far as I remember a way that works looks something like:

  1. If the file exists, you don't own it, give up.
  2. Open the file for append, write your pid followed by a newline, close.
  3. Open the file for read; if the open failed, you didn't own the file, give up. (Whoever did own it probably just unlinked it, so optionally you could go back to step one to try again - but probably better not to.)
  4. Read the first line; if it is not your pid, you don't own the file, give up.
  5. You now own the file. When you're done, unlink the file.
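The steps above can be sketched in Perl roughly like this. The lock-file name and the surrounding script are hypothetical, and error handling is kept minimal; the essential point is that ownership is decided only by reading back the first line, never by the earlier existence check:

```perl
use strict;
use warnings;

sub try_lock {
    my ($lockfile) = @_;
    # 1. If the file exists, someone else may own it: give up.
    return 0 if -e $lockfile;
    # 2. Append our pid. Several racing processes may all succeed here,
    #    but their writes land on separate lines.
    open(my $out, '>>', $lockfile) or return 0;
    print $out "$$\n";
    close $out;
    # 3. Reopen for read; failure means the owner probably just unlinked it.
    open(my $in, '<', $lockfile) or return 0;
    # 4. Only the process whose pid is on the FIRST line owns the file.
    my $first = <$in>;
    close $in;
    return 0 unless defined $first;
    chomp $first;
    return $first == $$;
}

my $lock = "/tmp/myscript.pid";    # hypothetical name
if (try_lock($lock)) {
    # ... do the real work here ...
    unlink $lock;                  # 5. Release ownership when done.
} else {
    print "another instance appears to be running\n";
}
```

Note that this relies on appends of a short line being atomic, which holds for small writes on local filesystems but is not guaranteed on NFS.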


Re^5: Mechanism for ensuring only one instance of a Perl script can only run?
by afoken (Chancellor) on Dec 06, 2022 at 19:47 UTC
    5. You now own the file. When you're done, unlink the file.

    Please also delete the file when you crash or are killed, e.g. by SIGKILL.

    Yes, I know that's not possible. That's part of why PID files suck, and that's what a monitoring process like supervise from djb's daemontools or even the stinking systemd fixes. And due to the way Unix systems work, all that the monitoring process needs to do is to wait for SIGCHLD or a new task, e.g. sending a signal to the monitored process. In other words: The monitoring process usually does nothing, it is not in the run queue and perhaps swapped out, so it can't do anything wrong. ;-) The O/S kernel will run the monitoring process when anything needs to be done. This reduces the monitoring process to a few lines of code. supervise.c is less than 300 lines, including full error checking.
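The idea that the monitor just sleeps until the kernel wakes it can be sketched in a few lines of Perl. This is only an illustration of the restart loop, not of supervise itself (which also listens on a control pipe); the bounded restart count is an artifact of the sketch, a real supervisor loops forever:

```perl
use strict;
use warnings;

# Fork the service, then block in waitpid() until it exits, then restart
# it. While the child runs, the parent is off the run queue entirely:
# the kernel only schedules it again when SIGCHLD arrives.
sub supervise {
    my ($max_runs, @cmd) = @_;
    my $runs = 0;
    for (1 .. $max_runs) {
        my $pid = fork() // die "fork: $!";
        if ($pid == 0) {
            exec @cmd or die "exec @cmd: $!";
        }
        waitpid($pid, 0);    # sleep here until the child dies
        $runs++;
    }
    return $runs;
}

# e.g. supervise(3, '/usr/sbin/foobar');   # hypothetical service
```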

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Yes, that can in principle cause a problem.

      You can try to mitigate it - if you read the pid and find that it is not your own, you can check if that pid is still a running process, and scream bloody murder if it is not. If it is a running process, in some circumstances you can go further and check if it is a process that could legitimately be locking the pid file, and scream bloody murder if not.
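A sketch of that liveness check in Perl, using kill with signal number 0, which performs the existence and permission check without actually delivering a signal. The EPERM case matters: it means the process exists but belongs to another user, which for a lock file written by ourselves is already suspicious:

```perl
use strict;
use warnings;
use Errno qw(EPERM);

sub pid_alive {
    my ($pid) = @_;
    return 1 if kill(0, $pid);   # process exists and we may signal it
    return 1 if $! == EPERM;     # exists, but owned by another user
    return 0;                    # ESRCH: no such process, lock is stale
}
```

Note this only tells you *a* process with that pid exists; pids are recycled, so a further check (e.g. comparing the command name) is needed before concluding it is really the legitimate lock holder.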

      What you want to do in that situation is always going to depend on your particular context.

      In particular it is going to depend on why you are locking. If you are locking access to a resource because you're going to modify it in a way that leaves it invalid/unstable until you are done, then a failure leaves you problems to resolve other than simply being locked out.

      But over many years, I have found that it is pretty rare that a pid file gets left locked due to a failure. And, just speaking of my own experience, it has been exceedingly rare that a locked pid file actually caused a problem - most commonly, you try to do something and see an error message "can't do that, the pid file is locked". And then you remember that the machine just crashed while you were running a similar command a few minutes ago, so you check for breakage, remove the lock file, and continue with your day.

      Maybe I (got real lucky|had excellent judgement) in using a pid file only in situations where failures tended not to cause major problems. Your particular context will determine what the effect will be - whether it will be immediately visible to a human, how easy it is to check for breakage and fix it.

      But if you can get the basics right, a pid file can be a pretty straightforward mechanism that works well enough to solve a particular problem.

        There is another point that I did not mention: You do not need PID files at all.


        I think the idea for PID files was probably to kill two birds with one stone.

First, the fact that a file with an agreed name exists at all was used to indicate that a service process is running, and no second instance should start. Having a crashed service and a left-over file was no problem, because there was a competent root user who would simply remove the file and restart the service.

        Second, writing the PID of the service process to that file makes the job a little bit easier for root. Something like kill -HUP `cat /var/run/foobar.pid` could be used to send the SIGHUP signal to the foobar service, no need to run ps first to find the process, no chance to mess up the PID.

        So yes, PID files aren't that stupid. But not using them is smarter.


In the old times, Unix machines had as much storage, memory and computing power as a modern stupid phone. So root tried to waste as little resources as possible. In this century, your smart phone is probably a Unix machine with way more resources than many Unix systems from the last century. So having a tiny monitoring process per service, probably swapped out, waiting for its child to die or for a new incoming command from a pipe is no longer a resource problem.

daemontools is the smarter way. It starts with svscanboot, which either runs as process 1 ("init") or is started by process 1. svscanboot starts and monitors svscan, which runs a supervise process for each service (plus an optional second supervise process if the service has a logging service). That gives you a tree of monitored processes, which can run a lot of services.

        pstree looks like this:

        init─┬─acpid
             ├─6*[agetty]
        ...
             ├─sshd───sshd───sshd───bash───pstree
             ├─svscanboot─┬─readproctitle
             │            └─svscan─┬─supervise───cupsd
             │                     ├─13*[supervise───multilog]
             │                     ├─supervise───vnc-fwd
             │                     ├─supervise───remserial
             │                     ├─supervise───ntpd
             │                     ├─supervise───usb-power-manag───{usb-power-manag}
             │                     ├─10*[supervise]
             │                     ├─supervise───smartd
             │                     ├─supervise───mdadm
             │                     ├─supervise───saned
             │                     ├─supervise───xfs
             │                     ├─supervise───nullmailer-send
             │                     └─supervise───exim
             ├─syslogd
        ...
             ├─ypbind───2*[{ypbind}]
             └─ypserv
        

        (This is from one of my servers.)

Now, how do you control a service, or send signals to it? svc can start and stop a service (-u / -d), start it in single-shot mode (-o), and send all kinds of signals to the service process, including SIGTERM and SIGKILL. You can also exit the monitoring process (supervise) once the service process has exited. There is a patch to add a few more signals available on Linux (SIGUSR1, SIGUSR2) to the svc/supervise pair. It boils down to svc -h /service/foobar to send a SIGHUP to the foobar service. No PID file needed, no race conditions.

        To know if a service is running, use svstat:

        >svstat /service/*
        /service/auerswald-remote: up (pid 1353) 19081721 seconds
        /service/cups: up (pid 8760) 16791531 seconds
        /service/exim: up (pid 32159) 2935275 seconds
        /service/mdadm-monitor: up (pid 1335) 19081721 seconds
        /service/ntpd: up (pid 1354) 19081721 seconds
        /service/nullmailer: up (pid 3449) 2932783 seconds
        /service/pg9: down 2790080 seconds
        /service/saned: up (pid 1344) 19081721 seconds
        /service/smartd: up (pid 1334) 19081721 seconds
        /service/urbackup: down 2879317 seconds
        /service/usb-power-manager: up (pid 1358) 19081721 seconds
        /service/vnc-fwd: up (pid 27518) 1121 seconds
        /service/xfs: up (pid 1345) 19081721 seconds
        

The documentation for the djb tools is very terse, like the code: you won't find a single character that is not absolutely needed, and you are expected to know a lot of Unix. There is a good, gentler guide to getting started with tools from djb, called the djb way. It takes a while to get the established ways of handling problems out of your head, but the djb way uses Unix in a much smarter way. It makes you wonder why it took so long to implement these solutions.


So, I wrote about systemd. This is what a fileserver in a Linux container running Debian 11 looks like:

        > pstree
        systemd─┬─3*[agetty]
                ├─cron
                ├─dbus-daemon
                ├─lighttpd
                ├─nmbd
                ├─nullmailer-send
                ├─rsyslogd───2*[{rsyslogd}]
                ├─smbd─┬─cleanupd
                │      ├─lpqd
                │      ├─smbd
                │      └─smbd-notifyd
                ├─sshd───sshd───bash───sudo───bash───pstree
                ├─systemd───(sd-pam)
                ├─systemd-journal
                ├─systemd-logind
                ├─systemd-network
                ├─systemd-resolve
                └─vsftpd
        

It looks pretty much the same as with daemontools, except that ALL monitoring happens in the single systemd running as process 1. Yes, PID files are written. After all, this is a Debian system that tries to avoid nasty surprises for users and admins. But systemd can work without PID files. It can write PID files for legacy systems. You could also do that with daemontools.

        There are tools like systemctl that control services started by systemd, much like svc, but with a lot more overhead.

Systemd uses dbus for its communication, so the dbus-daemon had better not die. And of course, you don't want process 1 to die. Guess what happens when systemd loses its ability to communicate. You are f...ed. Hope and pray that your filesystem is journalled and mostly clean before pressing the reset button or casting the almost equivalent magic spell: Re: How not to implement updaters.

Daemontools, combined with a minimal process 1, is a much cleaner, smaller, and more robust approach to solving the problem of managing services. Simply because the daemontools approach does not need complex communication protocols like dbus; instead, it relies on Unix primitives.

        And now, we are really close to getting off-topic.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Thank you both, hv and afoken,
      this was very enlightening and it seems there's still a lot to learn for me.
