tirwhan has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks,

I need to put records into a storage container, and after a certain interval (configurable, some time between now and several days in the future) the record needs to be taken out and an action performed depending on the data it contains (e.g. send an email, update a database or log a message). The records will be relatively small (a few hundred bytes each) but there may be lots of them (up to hundreds of thousands). Records need to be stored persistently, to survive a server reboot or app failure. Records will be added by several independent processes, so the container needs to be accessible concurrently. Once the record has been acted upon it can be removed. The action does not have to take place at the exact time the interval is up, just after it has passed.

If it weren't for the persistency requirement I could run a network daemon which accepts incoming records, places them into a POE::Queue or a Heap and fires events when they are due. As it is, the best solution I can think of is to put records into an RDBMS, index on a field containing the due date/time, regularly query for select * from table where date >= now() and handle all records that are returned. This does not seem so great because it is possible the next query is issued while the last one isn't done dealing with the returned records (also, it'd be good if I could avoid being dependent on an RDBMS). It would seem ideal if I could just continually pop events off a heap, pausing if the next event in line hasn't reached its due time. Does anyone know of a persistent heap implementation? I found Cache::File::Heap, but that uses DB_File internally and so will read/write the entire file into/from memory on every tie. Or is there a better solution for what I'm trying to achieve?
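To illustrate, the in-memory version of what I'd like is something along these lines (a sorted array standing in for a real heap; only the persistence is missing):

```perl
use strict;
use warnings;

# Priority queue of [ due_epoch, payload ] pairs, earliest first.
my @queue;

sub add_event {
    my ($due, $payload) = @_;
    @queue = sort { $a->[0] <=> $b->[0] } @queue, [ $due, $payload ];
}

# Return the earliest event if its due time has passed, else nothing;
# the caller would sleep until $queue[0][0] when this returns empty.
sub next_due {
    my ($now) = @_;
    return unless @queue and $queue[0][0] <= $now;
    return shift @queue;
}
```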

Thanks in advance for your time and advice


Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan

Re: Persistent timed events
by terce (Friar) on Nov 03, 2005 at 14:27 UTC
    The replies to this recent node might be of interest to you.
Re: Persistent timed events
by Tanktalus (Canon) on Nov 03, 2005 at 17:29 UTC
    it'd be good if I could avoid being dependant on an RDBMS

    "Premature optimisation"! ;-) Under the principles of YAGNI, you could go with a filesystem approach. But I'm not really a big believer in YAGNI. I'd go with the RDBMS because I've found applications like this to constantly grow with every new PHB in my management chain (if your corporation is large enough to have a chain of management). And RDBMS's are much more flexible - a quick ALTER TABLE, and suddenly you have more metadata that you can store about each task. Auditing requirements? No big deal - just add appropriate metadata, and create a second table where you put "completed" tasks (rather than just deleting them).

    If you go with the filesystem approach, you'll either have a real headache on your hands, or you'll rewrite the whole thing to use a database backend. Either one means you have no flexibility.

    Oh, and I'd reverse that comparison in your select statement - I think you mean "where date <= now()" ;-)

    As for concurrency issues - I think you will have those no matter what solution you go with. For starters, one way to deal with that is to mark the row you're working on somehow. With a filesystem, you could rename the file or lock it. But that may not tell you what process is working on it. With a database, you could put in metadata to tell you what host and process ID is working on the problem. Then you can have a workload manager program run periodically on each host, querying for any workloads that are supposedly working on that host, checking the process ID, and then pinging the worker - if there is no such PID, or that worker is hung, you can kill the worker, and reset the item in the database so the next worker available will try it again.
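    For illustration, the claim-by-UPDATE idea might look like this (table and column names are invented for the example; DBI's do() returns the affected-row count, which tells you whether you won the race):

```perl
use strict;
use warnings;
use DBI;
use Sys::Hostname qw(hostname);

# Claim a task row by stamping it with our host and PID. The UPDATE
# only matches while the row is unclaimed, so exactly one worker's
# attempt can succeed.
sub claim_task {
    my ($dbh, $id) = @_;
    my $rows = $dbh->do(
        'UPDATE tasks SET worker_host = ?, worker_pid = ?
           WHERE id = ? AND worker_host IS NULL',
        undef, hostname(), $$, $id );
    return $rows > 0 ? 1 : 0;    # do() returns "0E0" when nothing matched
}
```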

    Yes, this is called feature creep. But those features do creep, and sometimes it's because you realise that your design didn't take into account certain requirements. The first time a worker hangs, you'll realise you need some watcher to notice this and handle the situation. When things start taking too long, someone will want to add a second machine to help the workload (NFS isn't exactly a good filesystem for handling locking and other atomic operations). What I agree with YAGNI on is that you don't build these all in until you do need it. But you do make your architecture flexible enough to allow you to do what you will eventually need to.

      Thanks a bunch, lots of really valuable advice here. Great stuff about the workload and concurrency management. I think I'm going to go with a filesystem-based approach to begin with, but your node has reminded me to make the system decoupled enough that I can later swap out the queuing stuff for a different system, if necessary (which is a good idea in any case of course). So I agree 100% with your last two sentences.


Re: Persistent timed events
by BrowserUk (Patriarch) on Nov 03, 2005 at 14:18 UTC

    I'd consider using the file system.

    Each record becomes a file, named for its due time + a number* to distinguish time clashes, stored in subdirectories named by date to avoid the "huge directory" problem.

    *Say, an MD5 of the date/time/processid/record contents.
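    A sketch of building such a name (the exact inputs to the MD5 are up to you):

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use POSIX qw(strftime);

# Date directory, to-the-minute time prefix, and an MD5 of
# date/time + pid + contents as the clash-breaker.
sub record_path {
    my ($due_epoch, $record) = @_;
    my $date   = strftime '%Y%m%d', localtime $due_epoch;
    my $minute = strftime '%H%M',   localtime $due_epoch;
    my $tag    = md5_hex( join '/', $due_epoch, $$, $record );
    return "$date/$minute.$tag";
}
```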

    The actioning server would use atomic rename to remove the next entry from the due queue before reading and actioning it, then deleting it.
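    A minimal sketch of that claim step (directory layout as above; rename() is atomic within a filesystem, so if two servers race for the same entry, exactly one rename succeeds and the loser moves on):

```perl
use strict;
use warnings;
use File::Spec;

# Claim the next due file by renaming it into a private work directory.
sub claim_next {
    my ($queue_dir, $work_dir) = @_;
    opendir my $dh, $queue_dir or die "opendir $queue_dir: $!";
    my @due = sort grep { !/^\./ } readdir $dh;    # glob and sort
    closedir $dh;
    for my $name (@due) {
        my $src = File::Spec->catfile( $queue_dir, $name );
        my $dst = File::Spec->catfile( $work_dir, "$$.$name" );
        return $dst if rename $src, $dst;    # success: the record is ours
        # rename failed: another server grabbed it first; try the next one
    }
    return;    # nothing due
}
```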

    If your action times need not be precise, say 1 minute granularity, then approx. 1440 files per directory should make for a reasonable glob-sort-rename-the-top-entry action, and minimise the need to retry with the next-to-top entry should another actioning server have grabbed the top one first (assuming that there might be a need for multiple servers).

    Another method of avoiding server collisions I've used before is to have each server only glob the file system on a given 5 or 10 second boundary of each minute. I.e. have up to 12 servers, with the first waiting until 5 seconds past each minute before scanning the appropriate directory, the second 10 seconds past the minute, and so on.
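    The offset calculation is trivial; a sketch, assuming slot numbers 0..11:

```perl
use strict;
use warnings;

# Server N (0..11) scans the queue only at its own 5-second offset
# past each minute.
sub seconds_until_slot {
    my ($slot, $now) = @_;           # $slot is 0..11, $now is epoch seconds
    my $target = ($slot + 1) * 5;    # 5, 10, ..., 60 seconds past the minute
    return ($target - $now % 60) % 60;    # 0 means "scan right now"
}

# In the server's main loop, something like:
#   sleep seconds_until_slot($my_slot, time());
#   scan_queue_directory();
```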


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Thanks. I think I'm being dense here: do you mean create 1440 directories per date and then have the actioning servers pick files out of the due directory until it's empty? Otherwise I'd have tens of thousands of files per date-directory. Or do you mean several records per file?

      Great idea about the time-sharing servers, I'll definitely use that.



        No. One directory per date, one file per message named for the time (to the minute; hence the 1440) of the action + disambiguator. This was based upon the assumption that your "several hundred thousand" figure would be spread across several days.

        20051103/
            0502.xxx1
            0502.xxx2
            0503.xxx1
            0503.xxx2
            0505.xxx1
            0506.xxx1
        20051104/
            ...

        If I was misreading you, and you mean several hundred thousand per day, and daily directories would be too big, then I'd sub-divide them into hours:

        20051103/
            00/
            01/
            ....
            23/
                5901.xxx1
                5901.xxx2
                5902.xxx1

        I guess you could go to minutes, if required to keep the final directory sizes small for searching. Having 1440 subdirs in a directory isn't a problem as you are never searching that space, just going through it with a hard-coded path.

        I'd stick to one record per file, to avoid the need for (cooperative) locking, as it only takes someone using a non-cooperating utility at an inappropriate moment to create a deadlock situation.

