
Okay, I guess the KWIC ("keyword in context") thing, the way I originally tried to explain it, is a little off-track for you. Still, if your goal is output like the sample below, you should consider storing all the words of the file, in order of occurrence, in a single array and adding highlights to the array elements that match the search terms. When that's done, go through the array and print out the regions that contain one or more highlighted terms, so the result looks like:
... this is a sequence that has target1 as well as target2, where target2 occurs twice in a short span ...

Try the sub this way (not tested):

use strict;
use warnings;

sub findtext {
    my ( $files, $terms ) = @_;

    # each argument may hold several file names
    my @filenames;
    for my $arg ( @$files ) {
        push @filenames, grep /\w/, split( /\W+/, $arg );
    }

    # search terms as hash keys, lowercased for case-insensitive lookup
    my %target;
    $target{ lc $_ } = undef for @$terms;

    local $/ = undef;    # slurp mode; this only applies within the sub

    for my $file ( @filenames ) {
        my $fh;
        unless ( open( $fh, '<', "/home/jroberts/$file.txt" ) ) {
            warn "open failed on $file: $!";
            next;
        }
        my $text = <$fh>;    # read full text
        close $fh;

        my @words = split ' ', $text;    # @words has all words in $file
        for ( @words ) {
            # ignore case and attached punctuation, so "Target1," still matches
            ( my $key = lc $_ ) =~ s/\W+//g;
            $_ = "<B>$_</B>" if exists $target{$key};
        }

        # all target words in $file are now marked, so
        # print the sequences that contain marked words
        my $printing = 0;
        for my $i ( 0 .. $#words ) {
            if ( $words[$i] =~ /<B>/ ) {
                if ( $i and $printing == 0 ) {
                    # backtrack for up to 6 words of prior context
                    my $j = ( $i >= 6 ) ? $i - 6 : 0;
                    print join( " ", @words[ $j .. $i - 1 ] ), " ";
                }
                print "$words[$i] ";    # print the marked word every time
                $printing = 6;          # number of following words to print
            }
            elsif ( $printing ) {
                print "$words[$i] ";
                $printing--;
                print "\n<br/>\n" if ( $printing == 0 );
            }
        }
        print "\n<br/>\n" if $printing;    # close a run cut off at end of file
    }
}
(updated to always do the right thing when printing out the target strings)
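To try the same idea without the files under /home/jroberts, here is a minimal sketch that works on an in-memory string and returns the snippets instead of printing them. The name kwic and its context-width argument are hypothetical, and it assumes a word should match a search term regardless of case and of punctuation attached to it (e.g. "target2,"). It records hits in a separate @hit array rather than testing for "<B>" afterwards, so text that already contains "<B>" can't confuse it.

```perl
use strict;
use warnings;

# Hypothetical helper: return KWIC-style snippets from $text, marking every
# word whose normalized form (lowercase, punctuation stripped) is in @$terms.
# $context is how many words of context to keep before and after a hit.
sub kwic {
    my ( $text, $terms, $context ) = @_;
    my %target;
    $target{ lc $_ } = 1 for @$terms;

    my @words = split ' ', $text;    # split on runs of whitespace
    my @hit;
    for my $i ( 0 .. $#words ) {
        ( my $key = lc $words[$i] ) =~ s/\W+//g;
        $hit[$i] = exists $target{$key};
    }

    my ( @snippets, @current );
    my $after = 0;                   # trailing context words still wanted
    for my $i ( 0 .. $#words ) {
        if ( $hit[$i] ) {
            if ( !@current ) {       # start a new snippet with leading context
                my $j = $i >= $context ? $i - $context : 0;
                @current = @words[ $j .. $i - 1 ];
            }
            push @current, "<B>$words[$i]</B>";
            $after = $context;       # reset trailing context after each hit
        }
        elsif ($after) {
            push @current, $words[$i];
            if ( --$after == 0 ) {   # trailing context used up: close snippet
                push @snippets, join ' ', @current;
                @current = ();
            }
        }
    }
    push @snippets, join ' ', @current if @current;    # cut off by end of text
    return @snippets;
}
```

For example, calling kwic on the sample sentence (without the dots) with [qw(target1 target2)] and a context of 2 returns two snippets, 'that has <B>target1</B> as well' and 'well as <B>target2,</B> where <B>target2</B> occurs twice'. Note that attached punctuation stays inside the bold tags.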

Re: Re: Pattern Matching With Regular Expressions
by Anonymous Monk on Apr 28, 2004 at 03:00 UTC
    This doesn't work. It doesn't appear to substitute the bold tags in, and I don't know why. I've tried tr with the same result. Anyone know what's wrong?
      Have you ever heard of running perl in "debug" mode? It's really cool, and if you haven't tried it, you're missing a lot. Check out "perldoc perldebug", then run your script with
      perl -d name_of_script [command line args for @ARGV]
      While the debugger is running your script, you can type "h" to get a brief summary of debugging commands; use "b findtext" to set a breakpoint in the sub, so whenever the execution enters that function, it stops to let you decide what to do next. At that point, you can step through a line at a time, set additional breakpoints, inspect current values of variables, and so on.
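      As a rough sketch of such a session, assuming the findtext sub as posted (the script name and arguments are placeholders), you would start the script and then type these commands one at a time at the debugger prompt:

```
perl -d name_of_script [command line args for @ARGV]
b findtext
c
n
x \@words
x \%target
q
```

      Here "b findtext" sets the breakpoint, "c" continues until execution reaches it, "n" executes the next statement ("s" would step into a sub call instead), "x" dumps a data structure, and "q" quits. Repeating "n" until the marking loop has finished and then comparing the two dumps should show whether the words from the file actually match the hash keys.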

      Since you have not posted the surrounding code that calls the findtext sub, you're on your own. (I did say the code was not tested...) If you still can't figure it out after trying a couple of diagnoses and fixes on your own, post another reply with firm, clear evidence of a specific problem.