Re^4: 'flock' with multiple users

Funnily enough, I found this thread via Super Search while checking which 'use' statements I need for flock, precisely because I don't want to use the database for locking. I'll keep looking for what I came for after adding my tuppence:

The scenario is that I am building an HA postgres cluster with three nodes plus DR nodes. Postgres can be configured to replicate, but robust HA needs extra work (Perl seemed best for this) to check the status of all nodes in the cluster and, if the current node has the wrong role, to either promote it to master or demote it to standby.

But in this case you don't want postgres to start at machine bootup, because you want to check the correct role first. So flock, rather than a lock held in the database, is essential. And why lock at all? Because otherwise, if the master goes down, all the standbys will try to become master.

So the failover program (guess what, I called it failover.pl) has to lock a file on a shared volume (there is a backup volume that all the nodes can access) before running its cycle. Then, if it detects no master and promotes its node to master, failover.pl on every other node is locked out until the count of masters has gone from 0 to 1. That avoids two nodes both seeing no master and simultaneously promoting two masters. Using the database for the lock is not feasible in this scenario, because failover.pl only does anything when the postgres master node is down.
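On the 'use' question: flock is a Perl builtin, so the only import normally needed is the core Fcntl module for the LOCK_* constants (use Fcntl qw(:flock);). Below is a minimal sketch of the locking pattern described above, not the actual failover.pl. It assumes a temp directory standing in for the shared backup volume, and a one-line state file standing in for the real "count the masters" check (both hypothetical; real code would query each postgres node). Three forked processes play the nodes racing after the master dies.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);          # LOCK_SH, LOCK_EX, LOCK_NB, LOCK_UN
use File::Temp qw(tempdir);
use POSIX ();

# Assumption: this temp dir stands in for the shared volume all nodes mount.
my $shared     = tempdir(CLEANUP => 1);
my $lock_path  = "$shared/failover.lock";   # hypothetical lock file name
my $state_path = "$shared/masters";         # hypothetical stand-in for querying node roles

sub write_masters {
    my ($n) = @_;
    open(my $fh, '>', $state_path) or die "write $state_path: $!";
    print {$fh} "$n\n";
    close($fh) or die "close $state_path: $!";
}

sub count_masters {
    open(my $fh, '<', $state_path) or die "read $state_path: $!";
    my $n = <$fh>;
    close($fh);
    chomp $n;
    return $n;
}

# One failover cycle; returns 1 if this node promoted itself, else 0.
sub failover_cycle {
    open(my $lock, '>>', $lock_path) or die "open $lock_path: $!";
    flock($lock, LOCK_EX) or die "flock $lock_path: $!";  # waits while another node runs its cycle
    my $promoted = 0;
    if (count_masters() == 0) {
        write_masters(1);    # stand-in for the real promotion step
        $promoted = 1;
    }
    close($lock);            # closing the handle releases the lock
    return $promoted;
}

write_masters(0);            # the old master has just gone down
my @pids;
for (1 .. 3) {               # three "nodes" race to run their cycle
    my $pid = fork() // die "fork: $!";
    POSIX::_exit(failover_cycle()) if $pid == 0;   # child: exit without running END blocks
    push @pids, $pid;
}

my $promotions = 0;
for my $pid (@pids) {
    waitpid($pid, 0);
    $promotions += $? >> 8;  # each child's exit status is 1 if it promoted
}
print "promotions: $promotions\n";
```

Because the promotion happens before the lock is released, whichever node gets the lock next already sees one master, so the run should report exactly one promotion. This only models the mutual exclusion; whether flock behaves the same way on the real shared volume depends on how that volume is mounted.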

Re^5: 'flock' with multiple users by Corion (Patriarch) on Jul 30, 2020 at 13:28 UTC
Depending on the share mounting mechanism of your file system, this introduces another point where things can break. One of the more interesting problems back in the day was stale NFS mounts, where file access did not work but also did not fail, and timeouts were set far too long. Make sure to at least test that you're not introducing another failure mode or, if you are, that it is well understood.
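One way to keep a busy or slow shared lock from blocking failover.pl indefinitely is to poll with LOCK_NB against a deadline instead of calling a plain blocking LOCK_EX. A minimal sketch, assuming a local temp directory in place of the shared volume; the helper name lock_with_timeout, the path, the timeout and the poll interval are all illustrative. It only bounds the wait for a lock that someone else holds: if open() or flock() itself hangs inside the kernel on a stale NFS mount, as described above, this loop cannot help. Also note that on Linux since 2.6.12, flock() on NFS is emulated with fcntl() byte-range locks, so the locking behaviour on the shared volume is worth testing explicitly.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);
use Time::HiRes qw(sleep time);

# Hypothetical helper: try for an exclusive lock, but give up after $timeout
# seconds instead of blocking forever. Returns the locked handle, or undef.
sub lock_with_timeout {
    my ($path, $timeout, $interval) = @_;
    $interval //= 0.1;
    open(my $fh, '>>', $path) or die "open $path: $!";
    my $deadline = time() + $timeout;
    while (1) {
        return $fh if flock($fh, LOCK_EX | LOCK_NB);
        # EWOULDBLOCK/EAGAIN only mean someone else holds the lock;
        # any other error is a real failure and should not be retried.
        die "flock $path: $!" unless $!{EWOULDBLOCK} || $!{EAGAIN};
        return undef if time() >= $deadline;
        sleep($interval);
    }
}

# Demo: one handle holds the lock (standing in for another node), so the
# first attempt gives up after roughly 0.3 s instead of hanging.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/failover.lock";
open(my $holder, '>>', $path) or die "open $path: $!";
flock($holder, LOCK_EX) or die "flock $path: $!";

my $t0     = time();
my $got    = lock_with_timeout($path, 0.3);
my $waited = time() - $t0;
close($holder);                       # the "other node" releases the lock
my $got2   = lock_with_timeout($path, 0.3);

printf "timed out: %s, waited %.2fs, acquired after release: %s\n",
    (defined $got ? 'no' : 'yes'), $waited, (defined $got2 ? 'yes' : 'no');
```

When lock_with_timeout returns undef, the caller can log the situation and skip that cycle rather than stall, which at least turns a silent hang into a visible failure mode.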
