... this is a sequence that has target1 as well as target2, where target2 occurs twice in a short span ...
Try the sub this way (not tested):
(Updated to always do the right thing when printing out the target strings.)

    sub findtext {
        my ($files, $terms) = @_;

        # each argument may hold several names separated by non-word characters
        my @filenames;
        for my $arg ( @$files ) {
            push @filenames, grep /\w/, split( /\W+/, $arg );
        }

        my %target;
        $target{$_} = undef for @$terms;

        local $/ = undef;  # this only applies within the sub

        for my $file ( @filenames ) {
            unless ( open( FILE, "/home/jroberts/$file.txt" )) {
                warn "open failed on $file: $!";
                next;
            }
            $_ = <FILE>;  # read full text
            close FILE;

            my @words = split;  # @words has all words in $file
            for ( @words ) {
                s{(.*)}{<B>$1</B>} if ( exists( $target{$_} ));
            }

            # all target words in $file are now marked, so
            # print the sequences that contain marked words
            my $printing = 0;
            for my $i ( 0 .. $#words ) {
                if ( $words[$i] =~ /<B>/ ) {
                    if ( $i and $printing == 0 ) {
                        # backtrack for prior context
                        my $j = ( $i >= 6 ) ? $i - 6 : 0;
                        print join( " ", @words[$j..$i-1] ), " ";
                    }
                    print "$words[$i] ";  # (update: have to print this every time)
                    $printing = 6;  # number of following words to print
                }
                elsif ( $printing ) {
                    print "$words[$i] ";
                    $printing--;
                    print "\n<br/>\n" if ( $printing == 0 );
                }
            }
        }
    }
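To check the context-window logic without touching any files, here is a minimal sketch of the same idea that works on a string and returns the snippets instead of printing them. The name `context_snippets` and its `$width` parameter are my own inventions for illustration, not part of the sub above, and exact-word matching is assumed just as in the original.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): collects each run of
# marked target words plus up to $width words of context on either side.
sub context_snippets {
    my ($text, $terms, $width) = @_;
    my %target = map { $_ => 1 } @$terms;
    my @words  = split ' ', $text;

    my (@snippets, @current);
    my $remaining = 0;    # words of trailing context still to collect

    for my $i ( 0 .. $#words ) {
        if ( $target{ $words[$i] } ) {
            if ( $remaining == 0 ) {
                # start a new snippet with up to $width preceding words
                my $j = $i >= $width ? $i - $width : 0;
                @current = @words[ $j .. $i - 1 ];
            }
            push @current, "<B>$words[$i]</B>";
            $remaining = $width;    # restart the trailing-context countdown
        }
        elsif ( $remaining ) {
            push @current, $words[$i];
            if ( --$remaining == 0 ) {
                push @snippets, join ' ', @current;
                @current = ();
            }
        }
    }

    # keep a snippet that was cut short by the end of the text
    push @snippets, join ' ', @current if @current;
    return @snippets;
}

print "$_\n<br/>\n" for context_snippets(
    "x target1 y target2 z w v", [ 'target1', 'target2' ], 2 );
```

Because a second target word inside the trailing window only resets the countdown, nearby hits end up in one snippet rather than in overlapping ones.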
In reply to Re: Re: Pattern Matching With Regular Expressions
by graff
in thread Pattern Matching With Regular Expressions
by Anonymous Monk