quickest way to access cached data?

by Anonymous Monk
on May 13, 2004 at 14:26 UTC ( [id://353071] )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I have a quick question about the best way to cache a bunch of data for a program I'm making. The data to be cached is roughly 15 KB per entry (a web page). I am not sure yet how long it will need to be cached. Maybe one day, maybe one month. There will be at least 1000 entries per day.

What I am wondering is whether it will be faster in the long run to store the data in a MySQL table and query it with DBI each time I need it, or to keep it in regular files on the server and slurp them in each time. Which takes longer for Perl? And what about checking for the existence of a cached entry? Is it quicker to check for a file with -e or to query the database only to find it empty? What if there are x number of users searching at once?

I have no idea why I think this, but for some reason it seems that MySQL would be faster unless I find I have to cache the data longer than expected and the table just gets huge... in that case, would it be faster to store it all in files, or would proper indexing on the table be fine?
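
For concreteness, here is a minimal sketch of the two approaches being weighed; the cache directory, table name, column names and connection details are all invented for illustration:

    use strict;
    use DBI;

    my $key = 'search-results-p1';                 # hypothetical cache key

    # --- file-based cache: existence check with -e, then slurp ---
    my $cache_dir = '/var/cache/myapp';            # assumed location
    my $file      = "$cache_dir/$key.html";
    my $page;
    if ( -e $file ) {
        open my $fh, '<', $file or die "open $file: $!";
        local $/;                                  # slurp mode
        $page = <$fh>;
        close $fh;
    }

    # --- MySQL-based cache: one indexed SELECT through DBI ---
    my $dbh = DBI->connect( 'dbi:mysql:mydb', 'user', 'pass',
                            { RaiseError => 1 } );
    my ($content) = $dbh->selectrow_array(
        'SELECT content FROM page_cache WHERE cache_key = ?',
        undef, $key );
    $page = $content if defined $content;          # undef means "not cached yet"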

Replies are listed 'Best First'.
Re: quickest way to access cached data?
by Corion (Patriarch) on May 13, 2004 at 14:37 UTC

    In short, there is nothing faster than RAM, so you're best off writing a small C http server that loads all HTML pages into RAM and serves them from there.

    If you don't want to write a small HTTP server yourself, you can use Apache and a ramdisk to serve the files from.

    If that still is not possible, maybe because not enough RAM is available on your machine (which is unlikely, as even the x86 architecture can easily address 2 GB of RAM for storage), you can leave the caching at the file level to the OS and simply serve plain files.

    Only at this point does MySQL possibly get a foot in the door, and even then MySQL has to do exactly the same things the OS has to do to serve pages. Dynamically creating a page will almost always be slower than piping the data from RAM to the network card, and slower than piping it from disk as well.

    If you think that you need to recreate data more dynamically than nightly in a cron job, you can consider Apache and an ErrorDocument directive to create "missing" (that is, uncached) pages, and weed out "old" pages with find or File::Find every hour.
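
    A rough sketch of that hourly weeding with File::Find; the cache directory and the one-hour cutoff are assumptions:

    use strict;
    use File::Find;

    my $cache_dir = '/var/www/cache';   # assumed location of the cached pages
    my $max_age   = 1 / 24;             # -M is measured in days, so 1/24 is one hour

    # delete cached files older than the cutoff; run this from cron every hour
    find( sub { unlink $_ if -f $_ && -M $_ > $max_age }, $cache_dir );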

    A fully dynamic database driven solution will most likely be the slowest solution possible, as it has the drawback of needing to go through the DB and the filesystem on every page served.

    Of course, until we know the exact usage patterns and possibly the page sequences, all of this has no meaning. You need to benchmark all solutions to see whether your actual access patterns favour one of the solutions over another.

    Personally, I like serving static HTML, as it has the fewest security risks and backups, failover and bringing online a new version of the site are all easily done with the standard shell toolset. Site updates can be made atomic by accessing the document root via a symlink, so a site update means simply moving the symlink.
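
    For example, the atomic document-root switch could be as small as this; the paths are invented, and rename() over an existing symlink is atomic on POSIX filesystems:

    use strict;

    # build the new site in a versioned directory, then swap the symlink
    my $new_release = '/var/www/site-2004-05-14';   # hypothetical release directory
    my $docroot     = '/var/www/htdocs';            # Apache's DocumentRoot points here

    symlink $new_release, "$docroot.tmp" or die "symlink: $!";
    rename "$docroot.tmp", $docroot      or die "rename: $!";   # atomic switch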

      Thanks, I never even thought about looking to cache it in RAM. It would be interesting to look into (and a first for me).

      Basically, I am caching the results of search queries. Once the person sees page 1, it can be cached because it won't change that quickly. The first visit to page 2 will of course have to be dynamic, but then it can be cached, etc... so if they navigate backwards it should come from the cache.

      People do navigate back quite often as this requires a fair amount of browsing and comparing between pages.

        Have a look at memcached:

        memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.
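
        A minimal sketch with the Cache::Memcached client module; the server address, key and build_page() helper are made up for illustration:

        use strict;
        use Cache::Memcached;

        my $memd = Cache::Memcached->new( { servers => ['127.0.0.1:11211'] } );

        my $key  = 'search:widgets:page1';      # hypothetical cache key
        my $page = $memd->get($key);
        unless ( defined $page ) {
            $page = build_page();               # stand-in for your page generation
            $memd->set( $key, $page, 3600 );    # expire after one hour
        }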

        Ciao, Valerio

Re: quickest way to access cached data?
by eXile (Priest) on May 13, 2004 at 15:22 UTC
    Hi,

    The previously mentioned Cache::Cache seems a good solution to me, especially if you don't yet know precisely what and how you want to cache. Cache::Cache has several backends (FileCache, MemoryCache, SharedMemoryCache), and expiry of cached objects can be set globally or on a per-object basis. With these features you can experiment until you find the right solution for your caching. MemoryCache is normally the fastest of these backends, but at the expense of a lot of memory (duh).
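
    For example, swapping backends is just a matter of changing the constructor. A minimal sketch with Cache::FileCache, where the namespace, expiry values, key and page content are all arbitrary placeholders:

    use strict;
    use Cache::FileCache;    # or Cache::MemoryCache / Cache::SharedMemoryCache

    my $key  = 'search:widgets:page1';   # hypothetical cache key
    my $html = '<html>...</html>';       # the rendered page you want to cache

    my $cache = Cache::FileCache->new( {
        namespace          => 'search_results',   # hypothetical namespace
        default_expires_in => '1 day',            # global default expiry
    } );

    $cache->set( $key, $html, '10 minutes' );     # per-object expiry overrides the default
    my $cached = $cache->get($key);               # undef if missing or expired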

Re: quickest way to access cached data?
by perrin (Chancellor) on May 13, 2004 at 15:24 UTC
Re: quickest way to access cached data?
by valdez (Monsignor) on May 13, 2004 at 14:42 UTC

    MySQL uses files in its backend, so accessing files directly will always be faster; please note that access performance to files may degrade on some file systems in the presence of a large number of files stored in the same directory (see the approach used by Cache::Cache); MySQL will help you centralize your cache and make it available to many servers. So now, what is your Perl question? :)
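
    A sketch of that approach: two levels of subdirectories derived from a digest of the key, similar in spirit to what Cache::Cache does. The paths and key are invented:

    use strict;
    use Digest::MD5 qw(md5_hex);
    use File::Path  qw(mkpath);

    my $cache_root = '/var/cache/pages';       # assumed cache root
    my $key        = 'search:widgets:page1';   # hypothetical cache key

    my $digest = md5_hex($key);
    my $dir    = join '/', $cache_root, substr( $digest, 0, 2 ), substr( $digest, 2, 2 );
    mkpath($dir) unless -d $dir;               # e.g. /var/cache/pages/3a/7f
    my $file   = "$dir/$digest.html";          # at most 256 subdirectories per level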

    Ciao, Valerio

Re: quickest way to access cached data?
by ambrus (Abbot) on May 13, 2004 at 17:53 UTC

    A file system-based solution is good. You just have to take care that there aren't too many files in the same directory, in which case lookups would get slow. (You don't have to worry about that if you use a newer filesystem like ReiserFS, but that has disadvantages too.) The operating system will also cache some of the disk data in memory, but only for a short time.

    A database has more advantages if the records (in this case the web pages) are smaller (it costs less disk space), or if you have to perform more complicated operations on them than just finding one by name, which you cannot do with a filesystem.

Re: quickest way to access cached data?
by Ryszard (Priest) on May 14, 2004 at 14:33 UTC
    At the risk of coming in a little late, I'm pretty much doing the exact same thing you want, with the addition of also using a relational backend.

    It goes something like this:

    1. Check the cache for the information
    2. If it doesn't exist in the cache, get it from the database and put it in the cache
    3. If it does, get it from the cache and serve it

    The only thing you have to worry about is tuning the expiry time of your cache for optimal performance. I like the previously mentioned idea of serving everything up from a RAM disk; this will also increase your performance.

    If you've built your site to generate the pages dynamically, it really is only about 6 extra lines of code to cache it all:

    use Cache::FileCache;

    my $cache    = Cache::FileCache->new();
    my $retcache = $cache->get('tvgid');
    if ( !defined $retcache ) {
        # Build your page
        # cache the content, with an expiry time
        $cache->set( 'tvgid', $page, "5 minutes" );
    }
    else {
        # Return your page (the cached copy in $retcache)
    }

    Too easy.. :-)

    I infer from the OP that performance is a concern for you; keep in mind there are many, many ways to optimise your code, from regex fiddling to algorithm design to OS tuning to building your own webserver to application design.

    Make sure you benchmark your code as well as real-world response time, so you can quantify your "optimisations".
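
    One rough way to do that with the core Benchmark module; file_cache() and mysql_cache() are empty placeholders you would fill in with your real slurp and DBI code paths:

    use strict;
    use Benchmark qw(cmpthese);

    sub file_cache  { }   # fill in: -e check plus slurp of the cached file
    sub mysql_cache { }   # fill in: DBI SELECT of the same entry

    # run each variant 1000 times and print a comparison table
    cmpthese( 1000, {
        file_cache => \&file_cache,
        mysql      => \&mysql_cache,
    } );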
