in reply to Predictive HTTP caching in Perl

I think you need to work on your specification more. You said something about averaging the last two weeks of downloads and putting the prefetch time 30 minutes before 80% of them. Why? You haven't said. Why not just prefetch before all of those times? For that matter, why not prefetch at midnight the night before? There's presumably some constraint, such as needing the latest content possible. If so, then you need to specify the maximum tolerable age of the content, or the maximum average age. Then you need to specify whether you care if the user sometimes fetches content that was not prefetched, and what percentage of the time that is allowed to happen. The way you've presented it here, it seems to me that simply caching the first download would be sufficient, or, as others suggested, using a normal caching proxy like Squid.
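
To make those questions concrete, here is a minimal sketch of the rule as I understand it, written in Python purely for illustration (it ports directly to Perl). Everything in it is my assumption rather than your code: the function names, the 80% and 30-minute defaults taken from your description, and the simplification that all visits fall on one daily clock with no wrap-around at midnight. The second function reports the numbers I'm asking you to put limits on: how often a visit finds prefetched content, and how old that content is.

```python
from datetime import datetime


def minute_of_day(ts):
    """Minutes after midnight for a datetime (date part is ignored)."""
    return ts.hour * 60 + ts.minute


def choose_prefetch_minute(visits, coverage_pct=80, lead_minutes=30):
    """Pick a prefetch time (minutes after midnight) that falls at least
    `lead_minutes` before the latest `coverage_pct` percent of visits.

    Assumption: visits do not wrap around midnight.
    """
    if not visits:
        raise ValueError("need at least one observed visit")
    times = sorted(minute_of_day(v) for v in visits)
    # Ceiling of len * pct / 100 in integer math, to avoid float rounding.
    covered = (len(times) * coverage_pct + 99) // 100
    covered = max(1, min(covered, len(times)))
    earliest_covered = times[len(times) - covered]
    return max(0, earliest_covered - lead_minutes)


def staleness_report(visits, prefetch_minute):
    """Summarize how a given prefetch time would have served past visits."""
    times = [minute_of_day(v) for v in visits]
    # Age of the cached copy at each visit that came after the prefetch.
    ages = [t - prefetch_minute for t in times if t >= prefetch_minute]
    if not ages:
        return {"hit_pct": 0.0, "mean_age": None, "max_age": None}
    return {
        "hit_pct": 100.0 * len(ages) / len(times),
        "mean_age": sum(ages) / len(ages),
        "max_age": max(ages),
    }


# Hypothetical visit log: five visits on different mornings.
sample_visits = [
    datetime(2006, 5, 1, 7, 0),
    datetime(2006, 5, 2, 8, 0),
    datetime(2006, 5, 3, 8, 10),
    datetime(2006, 5, 4, 8, 20),
    datetime(2006, 5, 5, 9, 0),
]
prefetch = choose_prefetch_minute(sample_visits)
report = staleness_report(sample_visits, prefetch)
```

With that made-up log, the rule prefetches at 07:30, serves four of the five visits from cache, and the cached copy is up to 90 minutes old when it is read. Whether 90 minutes is acceptable is exactly the kind of thing the specification has to say.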

I think that once you've really specified the problem, the solution will probably fall out naturally. It seems to me, however, that you're less concerned with solving a problem than with finding one. If so, then maybe studying up on math and AI really is what you want to do.

Re^2: Predictive HTTP caching in Perl
by ryantate (Friar) on May 03, 2006 at 14:33 UTC
    Good questions.

    Why not do it at midnight? Freshness. Note the part where I say "late enough that the results are less than 45 minutes old 15 times, so the cache is really fresh."

    Many of the sources I read update during the night. Think of a page of online newspaper links, typically updated around 3 am in whatever time zone the newspaper is located. But I'm also mixing in blog feeds, updated on a less predictable schedule. So the goal is to cache as shortly as possible before a likely visit.

    Simply caching -- with a conventional TTL scheme like you describe -- is, as I explained in my post, not going to cut it. Note I am not dealing with images or other static content that could live happily in such a cache -- or links to other pages, some of which may be static -- only the text of the Web pages, most of which, again, changes every single day.

    I appreciate your reply.