
precalculating event dates vs. recalculating them.

by schweini (Friar)
on Jun 22, 2008 at 20:21 UTC ( [id://693407] : perlquestion )

schweini has asked for the wisdom of the Perl Monks concerning the following question:

Hi there, long time, no ask!

I've been tasked with the implementation of an event-based website, a bit like, only that we have to support recurring events, which, as many might know, are a royal PITA to handle.
My plan was to precalculate events and insert one entry into the DB for each occurrence of the event, in order to make retrieval of the data (which will hopefully be very frequent) quick and simple. My partners insist that this is not elegant, and that events should be stored as they were created, with the recurrence information stored à la iCalendar, so that each time we show a calendar or a list of events, we re-calculate whether a given event applies to the current time-span or not. This seems to be the approach that most web-calendar software takes, but it strikes me as a daunting waste of computational resources.
We can't be the first ones to ponder this problem, so does anyone have any hints or caveats about either approach (precalculating recurring events vs. storing only the recurrence information) on hand?
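For reference, the recalculate-on-view approach is roughly what CPAN's DateTime::Event::ICal gives you out of the box; a minimal sketch (the rule and dates are made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DateTime;
use DateTime::Span;
use DateTime::Event::ICal;

# An iCalendar-style rule: every Monday, starting 2008-06-02.
my $rule = DateTime::Event::ICal->recur(
    dtstart => DateTime->new( year => 2008, month => 6, day => 2 ),
    freq    => 'weekly',
    byday   => ['mo'],
);

# The display window -- e.g. the month view being rendered.
my $june = DateTime::Span->from_datetimes(
    start => DateTime->new( year => 2008, month => 6, day => 1 ),
    end   => DateTime->new( year => 2008, month => 6, day => 30 ),
);

# Only the occurrences inside the window get computed.
print $_->ymd, "\n" for $rule->as_list( span => $june );
```

Nothing is stored per occurrence here; the cost is paid at view time, which is exactly the trade-off in question.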


P.S.: does anyone know of any accepted feed format for event information? I could only find iCalendar, and maybe RSS with embedded xCal information, which struck me as odd.

Replies are listed 'Best First'.
Re: precalculating event dates vs. recalculating them.
by samtregar (Abbot) on Jun 22, 2008 at 23:36 UTC
    I'd start with a simple implementation with no pre-calculation and test to see how inefficient it is. Get some ballpark numbers for the number of potential users and events-per-user, load up some test data, and give it a try. If it's fast enough, you're done. Don't look back.

    If it's not, then yes, pre-calculate. But be sure to treat it as cache data - ready to be invalidated if anything about an event changes. Finding all the invalidation points may be challenging if you don't have good data-encapsulation. All the more reason to invest now in a clean database abstraction that will give you a good way to add caching like this. If you do your job right none of the client code should change when your events switch from calculated on the fly to pre-calculated.
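    A rough sketch of the "treat it as cache" idea, with hypothetical table and column names (SQLite via DBI, purely for illustration):

    ```perl
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
                            { RaiseError => 1 } );

    # Pre-calculated occurrences live in their own table, keyed
    # by the master event they were expanded from.
    $dbh->do(q{
        CREATE TABLE occurrence (
            event_id  INTEGER NOT NULL,
            starts_at TEXT    NOT NULL
        )
    });

    # The single invalidation point: whenever anything about an
    # event changes, its expanded occurrences are thrown away,
    # to be rebuilt lazily on the next view.
    sub invalidate_occurrences {
        my ( $dbh, $event_id ) = @_;
        $dbh->do( 'DELETE FROM occurrence WHERE event_id = ?',
                  undef, $event_id );
    }
    ```

    With a clean database abstraction, that delete is the only place client code has to know the cache exists.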


Re: precalculating event dates vs. recalculating them.
by pc88mxer (Vicar) on Jun 22, 2008 at 20:54 UTC
    I've actually had an opportunity to think about this problem recently. In my case I decided to store the individual events because we would never have that many (< 1000 total events/year), and we wanted the ability to customize the description for each of the recurring events.

    Are your users going to be benign? Do you have to handle events that recur indefinitely or can you put a limit on the number of recurrences? If you can prevent abuse, I would initially go with storing it. Save a "recurrence group id" with each event so you can get at all of the other events in the same sequence.

    It will certainly be the easier route. You can always make it more sophisticated later.
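    One possible shape for the "recurrence group id" idea (table and column names are only illustrative):

    ```perl
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
                            { RaiseError => 1 } );

    # Every stored occurrence carries its recurrence group id, so
    # the whole series can be listed, edited, or deleted together,
    # while each row keeps its own customisable description.
    $dbh->do(q{
        CREATE TABLE event (
            id                  INTEGER PRIMARY KEY,
            recurrence_group_id INTEGER,        -- NULL for one-offs
            starts_at           TEXT NOT NULL,
            description         TEXT
        )
    });

    sub events_in_group {
        my ( $dbh, $group_id ) = @_;
        return $dbh->selectall_arrayref(
            'SELECT id, starts_at FROM event
              WHERE recurrence_group_id = ? ORDER BY starts_at',
            undef, $group_id );
    }
    ```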

Re: precalculating event dates vs. recalculating them.
by Joost (Canon) on Jun 22, 2008 at 23:28 UTC
    As far as I can tell from this relatively sparse info, you're both going about this in either a complicated or an inefficient way. How about this:

    1. Whenever a pageview for a given timespan and event-selection is requested, generate that page and write it to disc. If the same info is requested again, serve the static page as cheaply as possible.

    2. Whenever a new event is entered or modified/deleted, remove the written pages that are related to that timespan and selection.

    3. Send out caching headers for an hour or so for each page.

    4. Put a caching proxy in front of your site (definitely worth it, if you've got a fairly serious amount of visitors).

    This should ensure that you can easily serve quite a lot of views on very cheap hardware and hosting (somewhere between one hundred and a couple of hundred euros a month in hosting and a few thousand euros of hardware), assuming you're only adding/editing/deleting a handful of events per day on average.

    edit: 3 and 4 alone will give you a significant edge even while just generating each page on request (because only a small number of popular requests will actually end up at your webserver)
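    Point 3 is a one-liner in most setups; with CGI.pm (assuming that's the stack in use) it might look like:

    ```perl
    use strict;
    use warnings;
    use CGI;

    my $q = CGI->new;

    # Let browsers and any intermediate proxy cache the page for
    # an hour; combined with a caching proxy in front (point 4),
    # most repeat requests never reach the application at all.
    print $q->header(
        -type          => 'text/html',
        -expires       => '+1h',
        -Cache_Control => 'public, max-age=3600',
    );
    ```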

Re: precalculating event dates vs. recalculating them.
by Pic (Scribe) on Jun 22, 2008 at 20:59 UTC

    I don't have any brilliant insights into your problem, but one problem that strikes me is events that are open-ended in their timespan. Those are (more or less) impossible to pre-insert into your database.

    And one option to simplify the selection of events would be to allow only a limited number of recurrence variants for events, and then write a really clever SQL query to select the relevant events. You might also be able to do something nice with an unusual database schema (some form of denormalised schema, perhaps).
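    As an illustration of what such a query could look like if recurrence were limited to, say, none/daily/weekly/monthly (the schema here is entirely made up):

    ```perl
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
                            { RaiseError => 1 } );

    $dbh->do(q{
        CREATE TABLE event (
            title     TEXT,
            recur     TEXT NOT NULL,  -- 'none','daily','weekly','monthly'
            starts_at TEXT NOT NULL   -- date of the first occurrence
        )
    });

    # One query decides which events fall on a given day, matching
    # weekly events by weekday and monthly events by day-of-month.
    sub events_on {
        my ( $dbh, $day ) = @_;
        return $dbh->selectall_arrayref( q{
            SELECT title FROM event
            WHERE (recur = 'none'    AND date(starts_at) = date(?))
               OR (recur = 'daily'   AND date(starts_at) <= date(?))
               OR (recur = 'weekly'  AND date(starts_at) <= date(?)
                   AND strftime('%w', starts_at) = strftime('%w', ?))
               OR (recur = 'monthly' AND date(starts_at) <= date(?)
                   AND strftime('%d', starts_at) = strftime('%d', ?))
        }, undef, ($day) x 6 );
    }
    ```

    Open-ended events fall out naturally here, since only a start date is stored.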

Re: precalculating event dates vs. recalculating them.
by apl (Monsignor) on Jun 22, 2008 at 22:31 UTC
    If not storing each recurrence of an event is a requirement, I'd have a crontab job (or equivalent) run each day at local midnight to determine what recurring events were going to take place in the coming day, and to then add them to the DB.

    That is, essentially the same thing you'd have to do each time the calendar is viewed, but only once per day as opposed to at each viewing.
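    A sketch of that nightly job; rule storage is simplified to a weekday-based "weekly" rule, and the in-memory database is only there to keep the sketch self-contained:

    ```perl
    #!/usr/bin/perl
    # Meant to be run from cron just after local midnight, e.g.:
    #   5 0 * * * /path/to/materialize_events.pl
    use strict;
    use warnings;
    use DBI;
    use POSIX qw(strftime);

    my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
                            { RaiseError => 1 } );
    $dbh->do(q{ CREATE TABLE rule  ( title TEXT, weekday INTEGER ) });
    $dbh->do(q{ CREATE TABLE event ( title TEXT, happens_on TEXT ) });

    # Expand every recurring rule that applies to $date into a
    # concrete event row, so viewing code only ever reads events.
    sub materialize_day {
        my ( $dbh, $date, $weekday ) = @_;    # $weekday: 0 = Sunday
        my $due = $dbh->selectall_arrayref(
            'SELECT title FROM rule WHERE weekday = ?', undef, $weekday );
        $dbh->do( 'INSERT INTO event (title, happens_on) VALUES (?, ?)',
                  undef, $_->[0], $date ) for @$due;
    }

    my @now = localtime;
    materialize_day( $dbh, strftime( '%Y-%m-%d', @now ), $now[6] );
    ```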

Re: precalculating event dates vs. recalculating them.
by roboticus (Chancellor) on Jun 23, 2008 at 16:03 UTC

    Regarding the waste of computational resources: since you probably care about recurrence only while displaying, why not send the rule to the client and let some JavaScript do the recurrence handling? Your server then totally avoids that particular headache, and only the people wanting to see the calendar pay for it (computationally)....

      Yow. Imagine if Google took this approach!

      Seriously, this is not a good idea. You'll end up with a site that's basically useless on a low-end machine, and your clients will go elsewhere. Yes, we're just the coders here but let's at least spare a moment to think of the clients!



        Actually, I always keep the clients foremost in my mind when coding. But we are probably imagining two different things. I wasn't intending that JavaScript do something computationally intensive. If the browser is showing a week or month calendar, it seems to me that having a bit of javascript interpret a dozen recurrence items ought not be demanding. But having the server do a dozen recurrence item calculations for tens of millions of customers might be prohibitive. I was imagining that few clients would have very many recurring tasks, and that computing the rules wasn't terribly serious. (I've not written it, but I think I could imagine what the code would basically look like.)

        Having said that, I'm not intending to backpedal. I'm just saying that I don't think it would be very demanding at all (but I've been wrong before). I was simply intending to promote "compute it as you need it" rather than "bulk compute in case you need it".

        • I *only rarely* do web stuff, and haven't written a line of JavaScript in over 10 years. So I really don't know if it's that much of a pig or not.
        • I'm not imagining terribly complicated recurrence rules.
        • I'm anticipating very few customers with very many recurrence rules.
        • And I'm *certainly* not advocating that Google treat our machines as a compute-farm! ;^)