in reply to Musings about a database ecology

Clearly the largest one is where the Perl scripts (both CGI and administration) prod the database, asking "Got anything for me to do?"

To me, this implies that the database contains events that need processing. Design considerations aside, I get very scared when a database is used to queue up events. IMHO, a much better approach would be a small long-running process that accepts XML-RPC/SOAP/HTTP/protocol-of-choice connections along the lines of:

Then, the database (which is very expensive to talk to) is only used when you need to talk to the data (i.e., the documents). Plus, your prioritization can be done by this very lightweight daemon.

The neat thing is that you can put this on the same server as your database / application and barely feel it. A daemon like this has a very small footprint.
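
As a rough illustration of the prioritizing core of such a daemon, here is a minimal sketch with the network layer left out entirely. The function names (submit, next_task), the numeric priorities and the task fields (doc_id, action) are all hypothetical; nothing in this thread specifies them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory task store: one array of pending tasks per priority.
# Lower number = more urgent. None of these names come from the thread.
my %pending;    # priority => [ task, task, ... ]

# A producer (e.g. a CGI script) hands the daemon something to do.
sub submit {
    my ($priority, $task) = @_;
    push @{ $pending{$priority} }, $task;
    return;
}

# A worker asks "Got anything for me to do?" and gets the oldest task at
# the most urgent priority, or undef when nothing is pending.
sub next_task {
    for my $priority (sort { $a <=> $b } keys %pending) {
        my $tasks = $pending{$priority};
        next unless @$tasks;
        return shift @$tasks;
    }
    return;
}

submit(5, { doc_id => 17, action => 'reindex' });
submit(1, { doc_id => 42, action => 'convert' });
submit(5, { doc_id => 18, action => 'reindex' });

# Drain the queue: the priority-1 task comes out first, then the two
# priority-5 tasks in the order they were submitted.
while (defined(my $task = next_task())) {
    print "$task->{action} document $task->{doc_id}\n";
}
```

Because a single process owns the structure, each shift hands a task to exactly one worker without any locking. In a real daemon, submit and next_task would sit behind whatever protocol handler you pick.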

Being right, does not endow the right to be rude; politeness costs nothing.
Being unknowing, is not the same as being stupid.
Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re^2: Musings about a database ecology
by perrin (Chancellor) on Dec 20, 2004 at 20:23 UTC
    The trouble is, if you want these events reliably queued, you need to store them in a database of some kind. It doesn't have to be the same one as your other data, and it doesn't even have to be relational, but it needs to be a system that can guarantee atomic operations, and survive a restart. You could write a whole separate daemon and store things yourself, but you'll probably end up writing your own version of a database daemon. I'd suggest just setting up a separate lightweight database, if the main one is already overloaded.
      but it needs to be a system that can guarantee atomic operations, and survive a restart

      I would want such a system, but talexb hasn't indicated whether either of those is needed. Plus, you don't need atomicity if you only have one consumer. If the daemon is single-threaded, atomicity is unwanted overhead.

      A database of some kind, even if it's just Cache, is definitely needed at some layer. I would definitely avoid the relational overhead if it's just a queue of discrete atoms.

      But I think that saying "It needs a database" and "It must use a database as the primary queue implementation" are two separate statements. Flushing to a database is definitely important, to make sure that restarts are survivable [1]. However, I would use a Perl AoH (array of hashes) as the queue.

      [1] Of course, survivable is a spectrum. I would initially propose flushing to the datastore every minute or so. So, you would have up to 60 seconds of data that could be lost. Depending on the system, this may or may not be OK. (In some systems, it might even be desirable, depending on your other components.)
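
To make the AoH-plus-periodic-flush idea concrete, here is a minimal sketch using the core Storable module as a stand-in datastore. The snapshot file name, the event fields and the helper names (flush_queue, maybe_flush) are assumptions made up for illustration; the 60-second interval is just the "every minute or so" figure from the footnote.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

# Assumptions for illustration: the file name and the event fields are
# invented here, not taken from the thread.
my $SNAPSHOT       = 'queue.snapshot';
my $FLUSH_INTERVAL = 60;    # seconds of queued events we can afford to lose

# The queue itself: a plain array of hashes (AoH), reloaded after a restart.
my @queue = -e $SNAPSHOT ? @{ retrieve($SNAPSHOT) } : ();
my $last_flush = time();

# Write the whole queue to a temp file, then rename it into place, so that
# a daemon crash mid-write should not leave a half-written snapshot.
sub flush_queue {
    nstore(\@queue, "$SNAPSHOT.tmp");
    rename "$SNAPSHOT.tmp", $SNAPSHOT
        or die "Cannot replace $SNAPSHOT: $!";
    $last_flush = time();
    return;
}

# Call this between requests; it only touches the disk once per interval.
sub maybe_flush {
    flush_queue() if time() - $last_flush >= $FLUSH_INTERVAL;
    return;
}

push @queue, { id => 1, action => 'convert' };
push @queue, { id => 2, action => 'reindex' };
maybe_flush();              # normally a no-op: the interval has not elapsed
my $event = shift @queue;   # the single consumer takes the oldest event
flush_queue();              # e.g. on a clean shutdown
```

Anything queued since the last flush is lost if the process dies, which is exactly the trade-off described above. Swapping Storable for a small database only changes flush_queue and the reload line.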


        For me, the bottom line is that storing a queue in a networked database works and scales well, and requires very little development. I wouldn't look at harder, more customized solutions unless I was doing something that really fit poorly into the existing networked databases.
Re^2: Musings about a database ecology
by talexb (Chancellor) on Dec 20, 2004 at 16:44 UTC
      To me, this implies that the database contains events that need processing. Design considerations aside, I get very scared when a database is used to queue up events.

    Quite. One idea to speed things up was to put 'events' into a much smaller table and use triggers on the bigger tables to add things to the small table. That way I no longer scan the big table.

    It's an idea that I hope to explore in the near future.

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re^2: Musings about a database ecology
by mpeppler (Vicar) on Dec 20, 2004 at 19:22 UTC
    Shouldn't this event handler use some form of persistent storage?

    Michael

      The key is to reduce the overhead for processing a query. Whether the final persistent storage is an RDBMS or not is irrelevant - the primary storage should be a RAM-store of some sort. It could flush any changes to an RDBMS on a regular basis, say between requests. Doesn't matter much.

      The point is that talexb's problem arises out of two issues:

      1. The overhead of RDBMS requests
      2. The need to have a traffic cop

      I feel that my design would provide a lightweight solution that addresses both issues.
