tcf03 has asked for the wisdom of the Perl Monks concerning the following question:

I am searching through about 20 log files, approx 200 meg each. The following code is the sub with which I actually do the searching and output the text.
    sub search_file {
        my ($filename) = @_;
        open( TMPFILE, $filename ) or die "unable to open $filename: $!\n";
        print h3("$filename\n");    # heading for this file's results
        while (<TMPFILE>) {
            if (m/^....($MM1|$MM2|$MM3|$MM4|$MM5|$MM6)/) {
                s/($SEARCH|$SEARCH1|$SEARCH2)/<font color="#0000ff"><b>$1<\/b><\/font>/;
                print "$_\n", br;
            }
        }
        close(TMPFILE);
    }
Basically, $MM1-$MM6 are dates I'm searching for, and $SEARCH-$SEARCH2 are items I'm searching for within those dates...

This is working fine. What I'm looking to do, since this app takes somewhere between 5 and 10 minutes to finish searching all the files, is to save the results and re-search the data that has already been returned. I thought about initially dumping the search output into a text file and doing the sub-search ($SEARCH-$SEARCH2) within that. The trouble is there may be several people running this at once, and I don't want to create a lot of text files on the server that I need to delete later.

How would I go about saving a session's worth of data for possibly more than one user, and forgetting the saved session data once the browser session is closed? I just hate having to resubmit new search criteria and wait several minutes for new output when I already have the data sitting in front of me.

Ted
--
"Men have become the tools of their tools."
  --Henry David Thoreau

Replies are listed 'Best First'.
Re: reusing temporary search output
by perrin (Chancellor) on May 02, 2005 at 22:46 UTC
    Unless there is something specific to a user's session about the search results, you should not tie them to sessions at all. Use the search criteria to generate a unique file name (usually one would do this by joining the criteria in sorted order into one long string and then taking an MD5 of it) and store the results there. Write a cron job to delete anything over a day old in your search-cache directory.
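    A minimal sketch of that naming scheme. The cache_file_for() sub and the /tmp/search_cache directory are hypothetical names for illustration, not part of the original post:

    ```perl
    use strict;
    use warnings;
    use Digest::MD5 qw(md5_hex);

    # Sorting the criteria before hashing means the same set of terms
    # maps to the same cache file regardless of the order submitted.
    sub cache_file_for {
        my ( $cache_dir, @criteria ) = @_;
        my $key = md5_hex( join "\0", sort @criteria );
        return "$cache_dir/$key.txt";
    }

    my $first  = cache_file_for( '/tmp/search_cache', 'ERROR', 'May 02' );
    my $second = cache_file_for( '/tmp/search_cache', 'May 02', 'ERROR' );
    print $first eq $second ? "same cache file\n" : "different files\n";
    # prints "same cache file"
    ```

    On a cache hit you read that file back instead of re-scanning 4 gig of logs; on a miss you run the full search and write the results there for the next request.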
      Thanks

      This is probably the course I will take.
      Ted
      --
      "Men have become the tools of their tools."
        --Henry David Thoreau
Re: reusing temporary search output
by Fletch (Bishop) on May 02, 2005 at 15:27 UTC

    Unless you've got an explicit "logout" action, you're never really going to know when the session's over. The simplest thing might be to write a log_for_date( ) sub which returns a filehandle for a pre-trimmed logfile for the date in question, which you'd then search for the specific terms. That sub is responsible for locating a pre-existing version if possible. You'd then set up a cron job (or fork a process after you've returned your results to the user) which cleans up any logfiles that haven't been accessed in more than n minutes (or hours).
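    A sketch of what log_for_date( ) might look like. The $cache_dir argument is an addition for illustration, and it assumes each log line begins with the date string, which the original post doesn't pin down:

    ```perl
    use strict;
    use warnings;

    # Return a read handle on a pre-trimmed per-date copy of the log,
    # building the copy only on the first request for that date.
    sub log_for_date {
        my ( $date, $source_log, $cache_dir ) = @_;
        my $trimmed = "$cache_dir/$date.log";

        unless ( -e $trimmed ) {    # reuse a pre-existing copy if present
            open my $in,  '<', $source_log or die "can't read $source_log: $!";
            open my $out, '>', $trimmed    or die "can't write $trimmed: $!";
            while (<$in>) {
                print $out $_ if /^\Q$date\E/;    # keep this date's lines only
            }
            close $in;
            close $out or die "can't finish writing $trimmed: $!";
        }

        open my $fh, '<', $trimmed or die "can't reopen $trimmed: $!";
        return $fh;
    }
    ```

    The second and later searches for the same date then read a file a fraction of the size of the original 200-meg log.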

    Another approach requiring a bit more complexity would be to build an index of (date, tell offset) for each line in the file and use that to jump to points of interest rather than slogging through the entire file every time (in fact this might be a good place to use an iterator, for the HoP-inclined).
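    The index idea can be sketched like so. The sub names build_date_index( ) and lines_for_date( ) are made up for this example, and it assumes each line starts with its date and that a date's lines are contiguous in the file:

    ```perl
    use strict;
    use warnings;

    # One pass over the file records the tell() offset where each
    # date's run of lines starts.
    sub build_date_index {
        my ($filename) = @_;
        my %offset;
        open my $fh, '<', $filename or die "can't open $filename: $!";
        my $pos = tell $fh;
        while ( my $line = <$fh> ) {
            my ($date) = $line =~ /^(\S+)/;
            $offset{$date} = $pos if defined $date and not exists $offset{$date};
            $pos = tell $fh;
        }
        close $fh;
        return \%offset;
    }

    # Later searches seek() straight to the date of interest instead of
    # rereading everything before it.
    sub lines_for_date {
        my ( $filename, $index, $date ) = @_;
        return () unless exists $index->{$date};
        open my $fh, '<', $filename or die "can't open $filename: $!";
        seek $fh, $index->{$date}, 0;    # jump past all earlier dates
        my @lines;
        while ( my $line = <$fh> ) {
            last unless $line =~ /^\Q$date\E/;    # this date's block is done
            push @lines, $line;
        }
        close $fh;
        return @lines;
    }
    ```

    The index itself is small (one offset per date) and could be stored alongside the log, so it only costs one full scan per logfile rather than one per search.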

    Or look at the Cache hierarchy on CPAN.