in reply to Re: Caching files question
in thread Caching files question

It's reasonable to assume that vit would figure this out, but it may be worth mentioning that for "myOpen" to work, the existing code has to switch to lexically scoped file handles (if it isn't using them already), and the existing open() calls have to change from whatever syntax has been used so far to exactly this syntax (the file handle moves out of the arg list, and the mode and pathname become separate args):
my $fh = myOpen( "<", $pathname )
(with an added "or die ...", as appropriate)
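
For instance, something along these lines (just a sketch; the message text is whatever suits the surrounding code):

my $fh = myOpen( "<", $pathname )
    or die "myOpen failed for '$pathname'\n";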

Also, I was curious why you bother to localize $/, given that you are calling sysread, which doesn't use $/ at all. And when doing sysread, it would be good to check the return value more carefully -- zero means total failure, but any value other than the size of the file would mean a partial failure, which would probably be just as bad:

my %cache;

sub myOpen {
    my( $mode, $path ) = @_;
    my $fh;
    if ( not exists $cache{ $path } ) {
        -s $path or return;   # don't do a 0-length or non-existent file
        open $fh, $mode, $path or return;
        ( sysread( $fh, $cache{ $path }, -s _ ) == -s _ ) or return;
        close $fh;
    }
    open $fh, $mode, \$cache{ $path } or return;
    return $fh;
}
(updated to include the whole subroutine, with a simplified conditional block, a check for a non-zero return from "-s", and the removal of an unnecessary "$size" variable)

Re^3: Caching files question
by vit (Friar) on Aug 17, 2008 at 15:37 UTC
    BrowserUk and graff,
    Thank you very much, I think this is exactly what I need.
    graff,
    please let me know what you mean by "using lexically scoped file handles". Do you mean out of the scope of sub myOpen()?
    What I am doing is calling Script3.pl from Script2.pl, which in turn is called from Script1.pl, and the files are opened inside Script3.pl. So does that mean I have to open the files in the external Script1.pl?
    Am I right that the files will no longer be cached once I stop the external script?
      please let me know what you mean by "using lexically scoped file handles".

      I'm just saying that if your code is currently written like this:

      open( FH, "<some/path.name" );
      while (<FH>) {
          ...
      }
      close FH;
      It will have to be changed to something like this:
      my $fh = myOpen( "<", "some/path.name" );
      while (<$fh>) {
          ...
      }
      close $fh;
      (In other words, you need to use a scalar variable instead of a GLOB as the file handle, and as a rule, scalar variables should be lexically scoped by declaring them with "my".)

      No, you don't have to open the files directly in your Script1.pl; it will be best for Script3.pl to have the %cache hash all to itself, as well as all the file opening and reading. Be sure to use strict; and declare with "my".
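
      A rough sketch of how Script3.pl might then be laid out, with the cache and the opening/reading all kept to itself (the path below is only a placeholder):

      use strict;
      use warnings;

      my %cache;    # lexical to this script; nothing outside can see or clobber it

      sub myOpen {
          my( $mode, $path ) = @_;
          if ( not exists $cache{ $path } ) {
              -s $path or return;                 # skip empty / missing files
              open my $real, $mode, $path or return;
              ( sysread( $real, $cache{ $path }, -s _ ) == -s _ ) or return;
              close $real;
          }
          open my $fh, $mode, \$cache{ $path } or return;
          return $fh;
      }

      my $fh = myOpen( "<", "some/path.name" ) or die "myOpen failed\n";
      while (<$fh>) {
          # work on each line of the cached file
      }
      close $fh;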

      Naturally, when your "main" script finishes and exits, all the RAM it used is freed and returned to the OS, no matter how many script files and data files were loaded while it was running. (If you were referring to something else in your last question, I'm sorry if I misunderstood.)

Re^3: Caching files question
by BrowserUk (Patriarch) on Aug 18, 2008 at 00:02 UTC
    Also, I was curious why you bother to localize $/ ...

    Untested code, and I was in two minds about how I would implement it. It also requires that the OP change his existing code to assign the returned filehandle, rather than passing it as a parameter to each open call, which I considered a good thing.

    Like you said earlier, I'm not really sure in what circumstances this would be useful, so I raised the possibility without putting too much effort into trying to make it bullet-proof. I wanted the OP either to be sufficiently aware to fix it up himself, or to ask.

    I like your re-write++. One additional change I would make is to use a hard-coded '<:raw' mode on the real open and the user-supplied mode on the ramfile open. As you have it, if he passes a non-read mode, things will go wrong. Though that might be a good thing also...ponder...undecided.
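
    Something along these lines inside myOpen, say (a sketch of just that one change):

    if ( not exists $cache{ $path } ) {
        -s $path or return;                   # skip empty / missing files
        open $fh, '<:raw', $path or return;   # real file: always read-only, raw
        ( sysread( $fh, $cache{ $path }, -s _ ) == -s _ ) or return;
        close $fh;
    }
    open $fh, $mode, \$cache{ $path } or return;   # ramfile: user-supplied mode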

    It might also be worth doing some rudimentary "is this a huge file?" check. I don't like arbitrary limits, but issuing a warning if the file is bigger than, say, 100MB might be the clue stick that avoids mysterious failures.
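
    Perhaps something like this near the top of the caching branch (100MB is an arbitrary threshold, and the wording of the warning is only a sketch):

    my $size = -s $path or return;
    warn "myOpen: caching large file '$path' ($size bytes)\n"
        if $size > 100 * 1024 * 1024;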

    I also thought some about using one of those modules I never use to canonal...canonica....to ensure the paths are absolute and unique--save loading the same file twice--but with all the convolutions possible on *nix, it would take some serious thought.
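
    If it were worth pursuing, Cwd::abs_path could be used to build the cache key (a sketch only; abs_path may return undef, or even die, for paths that don't exist, and symlink tricks can still defeat it):

    use Cwd qw( abs_path );

    my $key = abs_path( $path ) or return;   # couldn't resolve the path
    # ... then key the cache on $cache{ $key } instead of $cache{ $path } ...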

    Oh. And I'd definitely use unless( exists $cache{ $path } ) { :)

    All of that said, from the OP's latest description of the application (nested CGI calls), none of this is likely to help, as the cache will get rebuilt every time the scripts are run.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.