I am working on a project which is running under a threaded mod_perl. All hits to the site go through the same script, much like PerlMonks with its index.pl. Each hit to this script needs to extract a certain amount of data from a file specified by the 'node' (to use the PerlMonks analogy) in order to display the completed page to the browser.

The way I had coded it at first was as follows (I just faked a regex for locating the needed extraction; the real one is quite a bit more complex):

# <SNIP> - handle some pre-processing
# $FILE contains a safe path to an existing file on
# the filesystem, based on browser input.
open( my $fh, '<', $FILE ) or die "open failed: $!";
my $file = do { local $/; <$fh> };
close( $fh );
my ($needed) = $file =~ /\A<!--(.*?)-->/s;
# now output the document, using the $needed info.

Then it struck me that this will be running under mod_perl and I thought about caching the needed extractions from the files into memory. So I rewrote it something like this:

BEGIN { use vars qw(%NEEDED); }
# <SNIP> as in first example.
unless ( exists $NEEDED{$FILE} ) {
    open( my $fh, '<', $FILE ) or die "open failed: $!";
    my $file = do { local $/; <$fh> };
    close( $fh );
    ( $NEEDED{$FILE} ) = $file =~ /\A<!--(.*?)-->/s;
}
# now output the document, using the $NEEDED{$FILE} info.
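(One wrinkle I'm aware of: the hash never notices if a node file changes on disk while the server is running. If that ever matters, a cheap stat()-based check could invalidate stale entries; the %MTIME hash below is purely illustrative, not something in my current code:

use vars qw(%NEEDED %MTIME);

my $mtime = ( stat $FILE )[9];
if ( !exists $NEEDED{$FILE} or $MTIME{$FILE} != $mtime ) {
    # file is new to the cache, or has changed since we cached it
    open( my $fh, '<', $FILE ) or die "open failed: $!";
    my $file = do { local $/; <$fh> };
    close( $fh );
    ( $NEEDED{$FILE} ) = $file =~ /\A<!--(.*?)-->/s;
    $MTIME{$FILE} = $mtime;
}

For now, though, the files are effectively static.)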

The files from which the required information is being extracted aren't all that large, so it seems to me that slurping the entire file contents on each hit wouldn't be too much of a burden. My basic question is whether such simple file I/O on not-too-large files would add up to many CPU cycles under heavy load. By using the caching, I am also hitting the HDD far less often. Again, is the saving enough to justify a caching mechanism? Am I worrying too much by caching the extracted pieces, or am I being smart?
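I realize I could also just measure it. A rough sketch using the core Benchmark module (the path and the regex are stand-ins for my real node files):

use strict;
use warnings;
use Benchmark qw(cmpthese);

my $FILE = '/path/to/a/typical/node/file';   # placeholder path
my %NEEDED;

sub slurp_every_time {
    open( my $fh, '<', $FILE ) or die "open failed: $!";
    my $file = do { local $/; <$fh> };
    close( $fh );
    my ($needed) = $file =~ /\A<!--(.*?)-->/s;
    return $needed;
}

sub cached {
    $NEEDED{$FILE} = slurp_every_time() unless exists $NEEDED{$FILE};
    return $NEEDED{$FILE};
}

# run each variant for at least 5 CPU seconds and compare rates
cmpthese( -5, {
    slurp  => \&slurp_every_time,
    cached => \&cached,
} );

But a benchmark on an idle box won't tell me much about contention under real load, which is what I'm actually worried about.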

NB: I figure that the hash method for caching is good enough in this case because it is a threaded mod_perl, so the cache won't be duplicated in each Apache process (as there is only one). As such, I wonder whether I should use Cache::SharedMemoryCache instead, so that there won't be an additional burden should the site ever be moved to a non-threaded mod_perl.
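For reference, this is roughly how I picture the lookup with Cache::SharedMemoryCache from the Cache::Cache distribution; the namespace and expiry values here are made up:

use Cache::SharedMemoryCache;

# one cache object, backed by IPC shared memory, so it is visible
# to every Apache child under prefork as well as to worker threads
my $cache = Cache::SharedMemoryCache->new( {
    namespace          => 'NodeExtracts',   # made-up namespace
    default_expires_in => 600,              # made-up expiry (seconds)
} ) or die "Couldn't instantiate SharedMemoryCache";

my $needed = $cache->get( $FILE );
unless ( defined $needed ) {
    open( my $fh, '<', $FILE ) or die "open failed: $!";
    my $file = do { local $/; <$fh> };
    close( $fh );
    ( $needed ) = $file =~ /\A<!--(.*?)-->/s;
    $cache->set( $FILE, $needed );
}
# now output the document, using the $needed info.

Whether the extra dependency and the shared-memory overhead are worth it for data this small is exactly what I'm unsure about.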

