#! perl -slw
use strict;
my $word = $ARGV[0] or die "No search term";
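# Join the lines with spaces; deleting the newlines outright would fuse
# the last word of one line to the first word of the next.
( my $text = do{ local $/; <DATA> } ) =~ tr[\n][ ];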
$text =~ s[
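( (?: \S+ \s+ ){1,3} )              # 1: one to three words before the hit
( $word ) [[:punct:]]*              # 2: the hit itself, trailing punctuation dropped
(?= (                               # 3: trailing context, captured without being consumed
(?:
(?: \s+ \S+ ){0,6}                  #    up to six intervening words,
\s+ ( $word ) [[:punct:]]*          # 4: then a second hit,
(?: \s+ \S+ ){1,3}                  #    then one to three words after it
)
|
(?: \s+ \S+ ){1,3}                  # or simply one to three words after the first hit
) )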
][
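# Rebuild the extract with the first hit tagged ...
my $extract = "$1<$2>$3";
# ... then tag any further hit that falls inside the trailing context.
$extract =~ s[\s($word)][ <$1>]g;
print $extract;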
]gex;
__END__
Regular expressions have always been a weak spot for me, and I've got a
question that's got me stumped. Here's the problem I'm trying to solve.
I have somwhat large articles of text (returned from a search), what I'd
like to do is capture the word and X number of words before and after it
while tagging the matching word in the captured text. My inital thought
was to try something like this. The problem I have is that if there is
more than one term and they overlap, the nth term will not be annotated.
So my next thought is lookahead/lookbehind, but they don't capture.
Is there a way to do this with a single regex? Is a regex even the best
way to do this? Thanks, -Lee
A test:

P:\test>517393 and
spot for me, <and> I've got a
capture the word <and> X number of words before <and> after it while
of words before <and> after it while
than one term <and> they overlap, the
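The second line of output can carry two tagged hits because capture groups inside a lookahead are filled without the lookahead consuming any text, so the next match attempt can start inside the context that was just captured. A minimal standalone sketch of that behaviour (the sample string and the `@pairs` array are made up for illustration):

```perl
use strict;
use warnings;

my $text = 'a b c d';
my @pairs;

# Consume one word, then peek at the following word inside a lookahead.
# Group 2 is set even though the lookahead consumes nothing, so the next
# iteration starts right after the word matched by group 1.
while ( $text =~ / (\S+) (?= \s+ (\S+) ) /gx ) {
    push @pairs, "$1>$2";
}

print "@pairs\n";
```

Every word except the last shows up once as `$1` and every word except the first once as `$2`, which is the overlap an ordinary consuming match would skip.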
As is, the script won't try for a third or fourth occurrence (and is currently repeating itself!), but it should be possible to do that using a postponed subregex, (??{ $regex }). I just haven't got it right yet. I'll have another go tomorrow when my eyes are open :)
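If a single regex is not a hard requirement, one way around both the repeated extracts and the two-hit limit is to work on word indices instead. This is a rough sketch of that alternative, not a fix for the regex above: split the text into words, note which ones are hits, merge the context windows that overlap or touch, and tag every hit inside each merged window. The helper name `extract_windows` and its `$before`/`$after` parameters are my own invention for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one extract per cluster of hits, with
# every occurrence of $word inside the extract tagged as <word>.
sub extract_windows {
    my ( $text, $word, $before, $after ) = @_;
    my @words = split ' ', $text;

    # Indices of words equal to $word once surrounding punctuation is ignored.
    my @hits = grep { $words[$_] =~ /\A[[:punct:]]*\Q$word\E[[:punct:]]*\z/ }
               0 .. $#words;
    my %is_hit = map { $_ => 1 } @hits;

    # Build a window around each hit, merging it into the previous
    # window when the two overlap or touch.
    my @windows;
    for my $i (@hits) {
        my $start = $i - $before < 0       ? 0       : $i - $before;
        my $end   = $i + $after  > $#words ? $#words : $i + $after;
        if ( @windows and $start <= $windows[-1][1] + 1 ) {
            $windows[-1][1] = $end;
        }
        else {
            push @windows, [ $start, $end ];
        }
    }

    # Render each window, tagging the hits but keeping their punctuation.
    my @extracts;
    for my $win (@windows) {
        my @out;
        for my $i ( $win->[0] .. $win->[1] ) {
            my $w = $words[$i];
            $w =~ s/(\Q$word\E)/<$1>/ if $is_hit{$i};
            push @out, $w;
        }
        push @extracts, join ' ', @out;
    }
    return @extracts;
}

my $sample = 'one and two and three four five six seven eight and nine';
print "$_\n" for extract_windows( $sample, 'and', 1, 1 );
```

With one word of context on each side, the first two hits merge into a single extract and the third stands alone, so nothing is printed twice. The trade-off is that the original spacing is lost, since the words are rejoined with single spaces.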