in reply to Re: Re: Pattern Matching With Regular Expressions
in thread Pattern Matching With Regular Expressions

I don't reset the hash; I put $existing into it. Using your concatenated code doesn't work either: I get only one result. I don't understand your KWIC code, and I don't think it will work for what I'm doing. How do I use seek in this context?
Thanks for the help so far,
jroberts

Re: Re: Pattern Matching With Regular Expressions
by graff (Chancellor) on Apr 13, 2004 at 05:55 UTC
    Okay, I guess the KWIC thing, the way I originally tried to explain it, is a little off-track for you. Still, if your goal is something like:
    • Highlight all words that match the set of target terms.
    • Print all occurrences of matching words along with some preceding and following words
    then you should consider storing all the words of the file (in order of occurrence) into a single array, adding highlights to the array elements that happen to match the search terms, and when that's done, go through the array to print out the regions that contain one or more highlighted terms, so the result looks like:
    ... this is a sequence that has target1 as well as target2, where target2 occurs twice in a short span ...

    Try the sub this way (not tested):

    sub findtext {
        my ($files, $terms) = @_;
        my @filenames;
        for my $arg ( @$files ) {
            push @filenames, grep /\w/, split( /\W+/, $arg );
        }
        my %target;
        $target{$_} = undef for @$terms;

        local $/ = undef;  # this only applies within the sub
        for my $file ( @filenames ) {
            unless ( open( FILE, "/home/jroberts/$file.txt" )) {
                warn "open failed on $file: $!";
                next;
            }
            $_ = <FILE>;  # read full text
            close FILE;
            my @words = split;  # @words has all words in $file
            for ( @words ) {
                s{(.*)}{<B>$1</B>} if ( exists( $target{$_} ));
            }
            # all target words in $file are now marked, so
            # print the sequences that contain marked words
            my $printing = 0;
            for my $i ( 0 .. $#words ) {
                if ( $words[$i] =~ /<B>/ ) {
                    if ( $i and $printing == 0 ) {  # backtrack for prior context
                        my $j = ( $i >= 6 ) ? $i - 6 : 0;
                        print join " ", @words[$j..$i-1];
                    }
                    print " $words[$i]";  # (update: have to print this every time)
                    $printing = 6;  # number of following words to print
                }
                elsif ( $printing ) {
                    print " $words[$i]";
                    $printing--;
                    print "\n<br/>\n" if ( $printing == 0 );
                }
            }
        }
    }
    (updated to always do the right thing when printing out the target strings)
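If it helps to check the marking step away from the files and the calling code, here is a small sketch that works on an in-memory string. Everything in it is an assumption for illustration rather than part of the sub above: the helper name mark_and_excerpt, the $span parameter, and the choice to compare on a lowercased, punctuation-stripped copy of each word. That last part matters because a plain split leaves punctuation attached, so "target1," would never be found by an exact hash lookup on "target1".

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): mark target words in a string
# and return the context excerpts instead of printing them.
sub mark_and_excerpt {
    my ($text, $terms, $span) = @_;
    my %target = map { lc($_) => 1 } @$terms;

    my @words = split ' ', $text;
    my @hit;    # indexes of the words that matched a target term
    for my $i (0 .. $#words) {
        # Compare on a lowercased copy with edge punctuation removed,
        # so "Target1," still matches the term "target1".
        (my $bare = lc $words[$i]) =~ s/^\W+|\W+$//g;
        if (exists $target{$bare}) {
            $words[$i] = "<B>$words[$i]</B>";
            push @hit, $i;
        }
    }

    # Take $span words on either side of each hit, merging windows
    # that overlap or touch so nearby hits share one excerpt.
    my @excerpts;
    my ($from, $to);
    for my $i (@hit) {
        my $lo = $i - $span < 0        ? 0        : $i - $span;
        my $hi = $i + $span > $#words ? $#words : $i + $span;
        if (defined $to and $lo <= $to + 1) {
            $to = $hi;    # extend the current window
        }
        else {
            push @excerpts, join ' ', @words[$from .. $to] if defined $to;
            ($from, $to) = ($lo, $hi);
        }
    }
    push @excerpts, join ' ', @words[$from .. $to] if defined $to;
    return @excerpts;
}

my $text = 'this is a sequence that has target1, as well as target2 in it';
print "$_\n<br/>\n" for mark_and_excerpt($text, [qw(target1 target2)], 2);
```

Returning the excerpts rather than printing them makes the sub easy to inspect from the debugger or a quick test; whether stripping punctuation and ignoring case is right depends on what your search terms look like.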
      This doesn't work. It doesn't appear to substitute the bold in and I don't know why. I've tried tr with the same result. Anyone know what's wrong?
        Have you ever heard of running perl in "debug" mode? It's really cool, and if you haven't tried it, you're missing a lot. Check out "perldoc perldebug", then run your script with
        perl -d name_of_script [command line args for @ARGV]
        While the debugger is running your script, you can type "h" to get a brief summary of debugging commands; use "b findtext" to set a breakpoint in the sub, so whenever the execution enters that function, it stops to let you decide what to do next. At that point, you can step through a line at a time, set additional breakpoints, inspect current values of variables, and so on.

        Since you have not posted the surrounding code that is calling the findtext sub, you're on your own. (I did say the code was not tested...) Post another reply if/when you get some firm, clear evidence about a specific problem, after trying a couple diagnoses and solutions on your own, in case you still can't figure it out at that point.