in reply to Repetitive File I/O vs. Memory Caching

I'm no expert on this, but I'll speculate. (If you're feeling generous, call it a "thought experiment"... :)

The relative benefit may depend on the pattern of activity on the site. If a lot of clients hit the same page (the same needed file) in a short span, the operating system is probably caching that file in memory anyway -- 10 hits in quick succession (with few or no intervening hits on other pages) won't look much different from a single hit in terms of HDD activity, so caching the little files in mod_perl doesn't buy you that much.

On the other hand, if you have a broad dispersion of pages being selected somewhat randomly, keeping a cache in mod_perl will tend to cause its memory footprint to grow (rapidly at first, then more gradually), with the upper bound being the total size of all the little files; meanwhile, the recall rate for a given cached page in this pattern is relatively low, and again, caching in mod_perl doesn't buy you that much.

In fact, if there's no "expiration" period for the cached entries, a lot of the cached page data will likely end up paged out to swap -- so when someone hits one of those pages, the server still has to do disk i/o to serve it, only now it's swap activity rather than reading a small data file.
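For what it's worth, if you did cache in-process, stamping each entry and re-reading the file once it goes stale keeps things from accumulating forever. A minimal sketch of the idea (the cache layout and the 60-second TTL are just illustrative assumptions, not a recommendation):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %cache;        # path => { body => ..., stamp => ... }
my $ttl = 60;     # illustrative: treat entries older than 60s as stale

# Return the file's contents, re-reading from disk when the
# cached copy is older than $ttl seconds (or absent).
sub fetch_page {
    my ($path) = @_;
    my $entry = $cache{$path};
    if ( !$entry || time() - $entry->{stamp} > $ttl ) {
        open my $fh, '<', $path or die "can't read $path: $!";
        local $/;    # slurp mode
        $entry = { body => <$fh>, stamp => time() };
        close $fh;
        $cache{$path} = $entry;
    }
    return $entry->{body};
}
```

Note this only bounds staleness, not memory -- to bound the footprint you'd also want to evict old entries, which is where the bookkeeping starts to outweigh just reading the small file.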

There may be a scenario where the sort of caching you're suggesting could really be a boost for you, and it may be a realistic one for you, but personally, I'd opt for the "extra" overhead of reading small files, just to keep the whole thing simpler overall.

To really speed things up, considering that the amount of data being fetched per page is fairly small, it would make more sense to store it all in a single MySQL or Postgres database; these things are built for speed (it's hard to improve on their approach to optimizing disk i/o), and mod_perl is built to take maximum advantage of the benefits they provide.
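Serving a page body then becomes a single indexed lookup. A sketch of what that might look like with DBI (the "pages" table and its columns are invented for illustration; I've used DBD::SQLite with an in-memory database just so the sketch stands alone -- on a real server you'd point the DSN at MySQL or Postgres):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Illustrative schema: one row per page, keyed by request path.
# Swap the DSN for DBI:mysql:... or DBI:Pg:... in production.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
                        { RaiseError => 1 } );

$dbh->do('CREATE TABLE pages (path TEXT PRIMARY KEY, body TEXT)');
$dbh->do( 'INSERT INTO pages (path, body) VALUES (?, ?)',
          undef, '/about.html', 'About this site' );

# Under mod_perl the handle (and the prepared statement) can
# persist across requests, so each hit is one indexed SELECT.
my $sth = $dbh->prepare('SELECT body FROM pages WHERE path = ?');

sub page_body {
    my ($path) = @_;
    $sth->execute($path);
    my ($body) = $sth->fetchrow_array;
    return $body;    # undef if the path isn't in the table
}
```

The database then worries about caching and disk scheduling for you, which is the point.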


Re^2: Repetitive File I/O vs. Memory Caching
by Anonymous Monk on Mar 28, 2004 at 05:59 UTC

    I think I'm agreeing with most of what you've said... it makes sense to me, anyhow. The only part I don't like is storing these files in a database. Yes, I could do it, but I always hear people who do such things grumbling later on about editing quirks. These files will be edited once in a while, and it's so much easier to open a file in your favorite editor and make changes than to update a database table. On the other hand, just for fun, I think I'm going to create a simple script that will be a database editor... sounds like fun :)

      You are quite right about the "grumbling" -- putting any sort of free-form text content into a database will tend to create a barrier for people who need to maintain and update that content. If there isn't a simple procedure in place to do that, it's a killer.

      Even when there is a "simple" procedure in place, the problem can be that it's the only procedure available. Editing text files and storing/updating them on disk really has become analogous to writing on paper: any number of utensils can be used, from the pencil stub invariably found on the floor to the $250 Cartier Fountain Pen. But the typical approach to maintaining text fields in a database is more like the old days of Ma Bell: this is the telephone that you get, it's black, you don't actually own it, and there's nothing you can do to change how it works.

      Maybe a better approach would be to perfect a system for maintaining the database by "importing" from all these little files -- let the files be updated by whatever means are considered suitable, then just fold the new version into the database by some simple process, about which the content authors are blissfully ignorant.
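      Along those lines, the import step can stay dead simple: walk the content directory, slurp each file, and hand path/body pairs to whatever routine writes the database row. A sketch (the directory layout is whatever you already have, and the $store callback is a hypothetical stand-in for the actual DBI insert/update):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Walk $dir and call $store->($relative_path, $contents) for
# each regular file found under it.
sub import_pages {
    my ( $dir, $store ) = @_;
    find( sub {
        return unless -f $_;
        open my $fh, '<', $_
            or die "can't read $File::Find::name: $!";
        local $/;    # slurp mode
        my $body = <$fh>;
        close $fh;
        ( my $rel = $File::Find::name ) =~ s{^\Q$dir\E/?}{};
        $store->( $rel, $body );
    }, $dir );
}
```

      The content authors keep editing plain files with whatever utensil they like; a cron job or post-edit hook re-runs the import, and nobody downstream needs to know the database exists.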

      There's nothing preventing you from creating a small script that replaces a current page stored in a database with a newly edited version. That seems so trivial that it shouldn't be a factor discouraging you from using a database. All the "big" sites can't be completely off track.


      Dave