tcf03 has asked for the wisdom of the Perl Monks concerning the following question:

I am searching through about 20 log files, approx 200 meg each. The following code is the sub with which I actually do the searching and output the text.
    sub search_file {
        my ($filename) = @_;
        open( TMPFILE, $filename ) or die "unable to open $filename: $!\n";
        print h3("$filename\n");    # heading for this file's results
        while (<TMPFILE>) {
            if (m/^....($MM1|$MM2|$MM3|$MM4|$MM5|$MM6)/) {
                s/($SEARCH|$SEARCH1|$SEARCH2)/<font color="#0000ff"><b>$1<\/b><\/font>/;
                print "$_\n", br;
            }
        }
        close(TMPFILE);
    }
Basically, $MM1-$MM6 are dates I'm searching for, and $SEARCH-$SEARCH2 are items I'm searching for within those dates...

This is working fine. What I'm looking to do, since this app takes somewhere between 5 and 10 minutes to finish searching all the files, is to save the results and re-search the data that has already been returned. I thought about initially dumping the search output into a text file and doing the sub-search ($SEARCH-$SEARCH2) within that. The trouble is there may be several people running this at once, and I don't want to create a lot of text files on the server that I need to delete later.

How would I go about saving a session's worth of data for possibly more than one user, and forgetting the saved session data once the browser session is closed? I just hate having to resubmit new search criteria and wait several minutes for new output when I already have the data sitting in front of me.

Ted
--
"Men have become the tools of their tools."
  --Henry David Thoreau

Replies are listed 'Best First'.
Re: reusing temporary search output
by perrin (Chancellor) on May 02, 2005 at 22:46 UTC
    Unless there is something specific to a user's session about the search results, you should not tie them to sessions at all. Use the search criteria to generate a unique file name (usually one would do this by joining the criteria in sorted order into one long string and then taking an MD5 of it) and store the results there. Write a cron job to delete anything over a day old in your search-cache directory.
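    A minimal sketch of that naming scheme. The cache_file_for() sub and the /tmp/search_cache directory are hypothetical names for illustration, not part of the original post:

    ```perl
    use strict;
    use warnings;
    use Digest::MD5 qw(md5_hex);

    # Sorting the criteria before hashing means the same set of terms
    # maps to the same cache file regardless of the order submitted.
    sub cache_file_for {
        my ( $cache_dir, @criteria ) = @_;
        my $key = md5_hex( join "\0", sort @criteria );
        return "$cache_dir/$key.txt";
    }

    my $first  = cache_file_for( '/tmp/search_cache', 'ERROR', 'May 02' );
    my $second = cache_file_for( '/tmp/search_cache', 'May 02', 'ERROR' );
    print $first eq $second ? "same cache file\n" : "different files\n";
    # prints "same cache file"
    ```

    On a cache hit you read that file back instead of re-scanning 4 gig of logs; on a miss you run the full search and write the results there for the next request.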
      Thanks

      This is probably the course I will take.
      Ted
      --
      "Men have become the tools of their tools."
        --Henry David Thoreau
Re: reusing temporary search output
by Fletch (Bishop) on May 02, 2005 at 15:27 UTC

    Unless you've got an explicit "logout" action, you're never really going to know when the session's over. The simplest thing might be to write a log_for_date( ) sub which returns a filehandle for a pre-trimmed logfile for the date in question, which you'd then search for the specific terms. That sub is responsible for locating a pre-existing version if possible. You'd then set up a cron job (or fork a process after you've returned your results to the user) which cleans up any logfiles that haven't been accessed in more than n minutes (or hours).
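    A sketch of what log_for_date( ) might look like. The $cache_dir argument is an addition for illustration, and it assumes each log line begins with the date string, which the original post doesn't pin down:

    ```perl
    use strict;
    use warnings;

    # Return a read handle on a pre-trimmed per-date copy of the log,
    # building the copy only on the first request for that date.
    sub log_for_date {
        my ( $date, $source_log, $cache_dir ) = @_;
        my $trimmed = "$cache_dir/$date.log";

        unless ( -e $trimmed ) {    # reuse a pre-existing copy if present
            open my $in,  '<', $source_log or die "can't read $source_log: $!";
            open my $out, '>', $trimmed    or die "can't write $trimmed: $!";
            while (<$in>) {
                print $out $_ if /^\Q$date\E/;    # keep this date's lines only
            }
            close $in;
            close $out or die "can't finish writing $trimmed: $!";
        }

        open my $fh, '<', $trimmed or die "can't reopen $trimmed: $!";
        return $fh;
    }
    ```

    The second and later searches for the same date then read a file a fraction of the size of the original 200-meg log.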

    Another approach requiring a bit more complexity would be to build an index of (date, tell offset) for each line in the file and use that to jump to points of interest rather than slogging through the entire file every time (in fact this might be a good place to use an iterator, for the HoP-inclined).
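    The index idea can be sketched like so. The sub names build_date_index( ) and lines_for_date( ) are made up for this example, and it assumes each line starts with its date and that a date's lines are contiguous in the file:

    ```perl
    use strict;
    use warnings;

    # One pass over the file records the tell() offset where each
    # date's run of lines starts.
    sub build_date_index {
        my ($filename) = @_;
        my %offset;
        open my $fh, '<', $filename or die "can't open $filename: $!";
        my $pos = tell $fh;
        while ( my $line = <$fh> ) {
            my ($date) = $line =~ /^(\S+)/;
            $offset{$date} = $pos if defined $date and not exists $offset{$date};
            $pos = tell $fh;
        }
        close $fh;
        return \%offset;
    }

    # Later searches seek() straight to the date of interest instead of
    # rereading everything before it.
    sub lines_for_date {
        my ( $filename, $index, $date ) = @_;
        return () unless exists $index->{$date};
        open my $fh, '<', $filename or die "can't open $filename: $!";
        seek $fh, $index->{$date}, 0;    # jump past all earlier dates
        my @lines;
        while ( my $line = <$fh> ) {
            last unless $line =~ /^\Q$date\E/;    # this date's block is done
            push @lines, $line;
        }
        close $fh;
        return @lines;
    }
    ```

    The index itself is small (one offset per date) and could be stored alongside the log, so it only costs one full scan per logfile rather than one per search.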

    Or look at the Cache hierarchy on CPAN.