in reply to Re: RFC: Time Lock Module Idea?
in thread RFC: Time Lock Module Idea?

I think the key difference here is that an event loop is only useful inside a single process (possibly with threads), while something external, whether that be the filesystem, shared memory, a database, or whatever, can span processes.

For example, consider a cron job that runs every 30 seconds to check system availability. If something goes down, we want to page the sysadmin. But we don't want to page the sysadmin every 30 seconds; we want to wait at least, say, 60 minutes before sending the next page. Simply check for the lock - if it's locked, don't page. If it is unlocked or reset, lock it and send the page. Then your cron job can exit, knowing that the next invocation will remember this.
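To make the check-then-lock step concrete, here is a minimal sketch in Python (chosen purely for illustration; the same logic ports to any language). It assumes the lock is a plain file whose modification time records when the lock was taken. The names `try_lock`, `reset_lock`, and `LOCK_SECONDS` are hypothetical and not part of any proposed module.

```python
import os
import time

LOCK_SECONDS = 60 * 60  # hold-off between pages (60 minutes, as in the example)


def try_lock(path, hold=LOCK_SECONDS, now=None):
    """Return True if the caller may page now, taking the lock if so.

    Assumption: the lock file's mtime is the moment the lock was set.
    """
    now = time.time() if now is None else now
    try:
        locked_at = os.path.getmtime(path)
    except FileNotFoundError:
        locked_at = None  # never locked, or reset by hand
    if locked_at is not None and now - locked_at < hold:
        return False  # still locked: stay quiet
    # Unlocked or expired: create the file if needed and stamp the lock time.
    with open(path, "a"):
        pass
    os.utime(path, (now, now))
    return True


def reset_lock(path):
    """Manual reset, e.g. once the sysadmin has acknowledged the problem."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

A cron job would call `try_lock("/some/path/page.lock")` and page only when it returns True. Note that the check and the update are two separate steps, so this is only safe when a single job touches the file at a time.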

A database has an additional benefit: you can have multiple machines doing the monitoring with some shared semaphores. Say machine A is monitoring 1, 2, and 3, and machine B is monitoring 4, 5, and 6. When something goes wrong with 2, machine A will send an email describing the problem to the sysadmin, and page her. Two locks will be set: one for email and one for paging. Then something goes wrong with 6, only 10 minutes later. An email gets sent out, but machine B will "notice" that a page was sent within the last 60 minutes and not send out a new page (yet). And, of course, if the sysadmin gets the first email within those 10 minutes, she could hit a switch somewhere to reset the page lock so she would get a new page when something went wrong with 6. (I said it would come in handy ;->) A filesystem would work, too, but locking across NFS or SMB is much trickier than letting the DB handle it for you.

So I'd have to say that the database idea may not be more complicated than it needs to be - depending on the job. TMTOWTDI - but there's also more than one problem each way may be able to solve ;-)

Re^3: RFC: Time Lock Module Idea?
by Anonymous Monk on Mar 28, 2006 at 17:57 UTC
    But we don't want to page the sysadmin every 30 seconds; we want to wait at least, say, 60 minutes before sending the next page. Simply check for the lock - if it's locked, don't page. If it is unlocked or reset, lock it and send the page. Then your cron job can exit, knowing that the next invocation will remember this.

    Be careful under which circumstances you reset the pager.

    One of my friends once used a simple rule: "email me at the first occurrence of the problem, and again later once someone has fixed it". Then he hit an intermittent bug that toggled the "it works now/it's broken now" status every few minutes, and his mailbox filled up with worthless messages. I shudder to think how annoying that would have been if rigged to a pager...

    Just a consideration to keep in mind...
    --
    Ytrew