
The following should come close to what you want. It runs a bit more slowly now, taking a couple of tenths of a second, but given the small size of your user base, that should not be a problem.

  1. Having slurped the file into a scalar, you can quite easily extract the bit between your begin/end comments (provided your users will never be allowed to embed html in their posts?), using a greedy match and capture brackets:
    ## And extract the comments
    next unless $contents =~ m[ <!--begin--> (.+) <!--end\s--> ]six;

    Note the need to explicitly include the \s in the end delimiter: I'm using /x to make things a little easier to read, and under /x a literal space in the pattern is ignored. I'm also using /s to allow . to match newlines, and /i for case-insensitive matching (though that may be unnecessary for this part).

  2. Okay. I misunderstood your original examples. See the comments in the code below for some explanation of what is going on, and ask about anything you don't understand.

    The best thing you can do is play with this script in conjunction with your data and test out the effect of adjusting the regex. Comment bits out; add bits; change the options used to see the effect it has upon the results you get.

    If you have questions, try to supply a script of ten or so lines (not just my code posted back!) that demonstrates the problem you are having.

  3. You'll see that, having extracted the block of comments, I then separate them into individual comment blocks in the inner while loop. Once you have that, you can inspect each one for the presence of the user name and, if it is found, push that comment onto an array along with the filename. (You could add HTML markup as appropriate here!)

    Having built the array of matches, you are then in a position to combine them with the rest of the html or return a "Nothing found" page.
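The marker extraction from step 1 can be tried on its own before running it against real files. What follows is only a minimal sketch: the helper name `extract_comments` and the sample page are made up for illustration, and it assumes the end marker in your files really is `<!--end -->` with a single space.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between the begin/end markers,
# or undef when the markers are not both present.
sub extract_comments {
    my( $html ) = @_;
    return $html =~ m[ <!--begin--> (.+) <!--end\s--> ]six ? $1 : undef;
}

# Made-up sample page, for illustration only.
my $page = "<html>\n<!--begin-->\n<hr>\nfirst\n<hr>\nsecond\n<!--end -->\n</html>\n";

my $block = extract_comments( $page );
print defined $block ? $block : "markers not found\n";

# Under /x a literal space in the pattern is ignored, so this pattern is
# really '<!--end-->' and does not match the marker with a space in it.
print "literal space under /x: ",
    ( '<!--end -->' =~ m[ <!--end --> ]x ? "match" : "no match" ), "\n";
```

Dropping the /s or the \s and re-running is a quick way to see what each part contributes.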

    #! perl -slw
    use strict;
    use Time::HiRes qw[ time ];

    sub slurp {
        local( $/, @ARGV ) = ( -s( $_[0] ), $_[0] );
        <>;
    }

    my $start = time;
    my $user = $ARGV[0] or die 'No user supplied';

    my @matches;
    my $files = 0;

    ## For each matching file in the Hist directory
    while( my $file = glob 'hist/*.htm' ) {
        $files++;

        ## Slurp the contents
        my $contents = slurp $file;

        ## And extract the comments
        next unless $contents =~ m[ <!--begin--> (.+) <!--end\s--> ]six;
        my $comments = $1;

        ## Break out each individual comment
        while( $comments =~ m[
            (                   ## Capture
                \n \s* <hr>     ## from the <hr>
                .+?             ## Everything (non-greedy)
            )
            (?= \n \s* <hr> )   ## Up to the next <hr>
        ]gsix ) {
            my $comment = $1;

            ## And save it if it contains the specified user name
            push @matches, "$file\n$comment" if $comment =~ m[
                \n \s*          ## On a line, possible leading whitespace
                \Q$user\E       ## The user name
                [^\n]* <br>     ## maybe other (non-newline) stuff <br>
                \s* \n          ## maybe whitespace and newline
            ]mxi;
        }
    }

    printf "Searched $files files in %g seconds\n", time() - $start;

    die "No match found for user $user" unless @matches;

    print "User $user found in files:\n-------\n",
        join "\n---------\n", @matches;

    __END__
    P:\test>522029 Buk
    Searched 133 files in 0.144615 seconds
    No match found for user Buk at P:\test\522029.pl line 49, <> line 133.

    P:\test>522029 Doug
    Searched 133 files in 0.147709 seconds
    User Doug found in files:
    -------
    hist/h0511.htm

    <HR>
    <b>Comment 1</b><br>
    Doug &lt;<a href="mailto:hun@tele.com">hun@tele.com</a>&gt;<br>
    USA - Thu 11/29/2005 - 22:05:51
    ---------
    hist/h0601.htm

    <HR>
    <b>Comment 1</b><br>
    Doug &lt;<a href="mailto:hun@tele.com">hun@tele.com</a>&gt;<br>
    USA - Thu 01/05/2006 - 22:05:51

    P:\test>522029 "J H"
    Searched 133 files in 0.149665 seconds
    User J H found in files:
    -------
    hist/h0511.htm

    <hr>
    <b>Comment 2</b><br>
    J H<br>
    Clearwater, FL USA - Wed 01/04/2006 - 02:05:12
    ---------
    hist/h0601.htm

    <hr>
    <b>Comment 2</b><br>
    J H<br>
    Clearwater, FL USA - Wed 01/04/2006 - 02:05:12
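For experimenting with the two inner regexes without touching the real history files, something along these lines may help. It is a rough sketch only: `find_user_comments` is a hypothetical name, and the sample data is invented to resemble the output shown above rather than copied from any real file.

```perl
use strict;
use warnings;

# Hypothetical helper: split a comments block on <hr> lines and return
# the comments that carry $user on a line ending in <br>.
sub find_user_comments {
    my( $comments, $user ) = @_;
    my @found;
    while( $comments =~ m[ ( \n \s* <hr> .+? ) (?= \n \s* <hr> ) ]gsix ) {
        my $comment = $1;
        push @found, $comment
            if $comment =~ m[ \n \s* \Q$user\E [^\n]* <br> \s* \n ]mxi;
    }
    return @found;
}

# Made-up sample data. Note the trailing <hr>: the lookahead only ends a
# comment at a following <hr>, so a last comment without one is not seen.
my $sample = join "\n", '',
    '<hr>', '<b>Comment 1</b><br>', 'Doug<br>', 'USA',
    '<hr>', '<b>Comment 2</b><br>', 'J H<br>', 'Clearwater, FL USA',
    '<hr>', '';

print "$_\n-----\n" for find_user_comments( $sample, 'J H' );
```

Commenting out parts of either regex, or dropping one of the modifiers, and re-running against the sample shows fairly quickly what each piece does.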

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^4: Search function for guestbook history?
by JCHallgren (Sexton) on Jan 10, 2006 at 22:09 UTC
    Ok, BrowserUK...I've looked...done some reading... and the part I still don't follow is the simple "slurp" subroutine. Could you/others translate it to English a bit more, please? The rest of the code I believe I can understand, at least for now.

      It is basically a reconstruction (from memory, and incorrectly) of the slurp idiom described and discussed in Cheap idioms. I suggest you read that thread, as it goes into great detail and discusses all the pros and cons.

      I would also recommend that, if you are using the one I (mis)typed in my posts (as a substitute for the one I call from my personal Utils library), you either replace it with the final consensus form from that thread, or just install File::Slurp and have done with it.

      Update: Indeed, I strongly suggest that you get File::Slurp and substitute its slurp() routine. It is far faster than the slurp routine in the Cheap Idioms thread.
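As for what the posted sub does in plain English: `local( $/, @ARGV ) = ( ... )` temporarily replaces the input record separator `$/` and the argument list `@ARGV` for the duration of the sub, and `<>` then reads from the file named in `@ARGV`. As typed, `$/` receives the file size as a plain number, which Perl treats as a separator string rather than a record length, which is presumably part of what makes it incorrect. The following is a minimal sketch of a more conventional whole-file read, not the consensus form from the Cheap idioms thread; the name `slurp_file` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical replacement for the slurp sub in the script above.
sub slurp_file {
    my( $path ) = @_;

    open my $fh, '<', $path
        or die "Cannot open '$path': $!";

    # With $/ undefined there is no record separator, so a single read
    # returns the whole file. 'local' restores $/ when the sub returns.
    local $/;
    my $contents = <$fh>;

    close $fh;
    return $contents;
}

# Usage, assuming $file holds a path as in the script above:
#   my $contents = slurp_file( $file );
```

If installing a module is an option, File::Slurp's read_file() does the same job in a single call.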

