Re: Regex: Matching around a word(s)

Update: I missed the bit about "overlapping matches"...

Something like this?

$s = 'I have somwhat large articles of text (returned from a search), what I\'d like to do is capture the word and X number of words before and after it while tagging the matching word in the captured text.';;

print $s =~ m[ ( (?: \S+ \s+ ){3} X (?: \s+ \S+ ){3} ) ]x;;

the word and X number of words

Update2: Here's one that does overlapping and tagging:

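#! perl -slw
use strict;

# The search term is the first command-line argument.
my $word = $ARGV[0] or die "No search term";

# Slurp the article that follows __END__ and delete its newlines.
( my $text = do{ local $/; <DATA> } ) =~ tr[\n][]d;

# s///e is used only for its side effect: print each hit with its context.
# The trailing words sit in a lookahead, so they are not consumed and can
# still serve as the leading words of the next match.
$text =~ s[
    ( (?: \S+ \s+ ){1,3} )
    ( $word ) [[:punct:]]*
    (?= ( (?: \s+ \S+ ){1,3} ) )
][
    print "$1<$2>$3"
]gex;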

__END__
Regular expressions have always been a weak spot for me, and I've got a
question that's got me stumped. Here's the problem I'm trying to solve.
I have somwhat large articles of text (returned from a search), what I'd
like to do is capture the word and X number of words before and after it
while tagging the matching word in the captured text. My inital thought
was to try something like this. The problem I have is that if there is
more than one term and they overlap, the nth term will not be annotated.
So my next thought is lookahead/lookbehind, but they don't capture.
Is there a way to do this with a single regex? Is a regex even the best
way to do this? Thanks, -Lee
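
To see why the trailing words sit inside a lookahead, here is a minimal sketch (not part of the original post) that runs two variants of the pattern over the short test sentence used further down this thread, with the term hard-coded as "and". The names @consuming and @lookahead are made up for the illustration.

```perl
use strict;
use warnings;

my $text = 'this finds and matches and highlights matches.';

# Variant 1: the trailing words are part of the match, so they are consumed
# and the second "and" is swallowed as context of the first.
my @consuming = $text =~ m[ (?: \S+ \s+ ){1,3} and (?: \s+ \S+ ){1,3} ]gx;

# Variant 2: the trailing words are only looked at, not consumed, so the
# next search can start right after the first "and".
my @lookahead = $text =~ m[ (?: \S+ \s+ ){1,3} and (?= (?: \s+ \S+ ){1,3} ) ]gx;

printf "consuming: %d match(es), lookahead: %d match(es)\n",
    scalar @consuming, scalar @lookahead;
```

With this input the consuming variant finds one match and the lookahead variant finds two.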

Some results

P:\test>junk me
weak spot for <me> and I've got
aquestion that's got <me> stumped. Here's the

P:\test>junk is
I'dlike to do <is> capture the word
problem I have <is> that if there
my next thought <is> lookahead/lookbehind, but they

P:\test>junk got
me, and I've <got> aquestion that's got
aquestion that's <got> me stumped. Here's

P:\test>junk to
problem I'm trying <to> solve.I have somwhat
search), what I'dlike <to> do is capture
My inital thoughtwas <to> try something like
there a way <to> do this with
even the bestway <to> do this? Thanks,
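
The fused words in these results ("aquestion", "I'dlike", "thoughtwas", "bestway") appear to come from tr[\n][]d, which deletes each newline outright, so the last word of one line runs into the first word of the next. Below is a minimal sketch of one way around that, assuming it is acceptable to turn newlines into spaces instead. The helpers flatten and context are hypothetical names, not from the post, and \Q...\E is added on the assumption that the search term should be matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each newline with a space rather than deleting it.
sub flatten {
    my ($text) = @_;
    $text =~ tr/\n/ /;
    return $text;
}

# Hypothetical helper: return "before<word>after" strings with up to $n words
# of context on each side, using the same pattern shape as the post above.
sub context {
    my ( $text, $word, $n ) = @_;
    my @hits;
    while ( $text =~ m[
            ( (?: \S+ \s+ ){1,$n} )
            ( \Q$word\E ) [[:punct:]]*
            (?= ( (?: \s+ \S+ ){1,$n} ) )
        ]gx )
    {
        push @hits, "$1<$2>$3";
    }
    return @hits;
}

my $sample = "a weak spot for me, and I've got a\nquestion that's got me stumped. Here's the problem";
print "$_\n" for context( flatten($sample), 'me', 3 );
```

With the newline kept as a space, "a" and "question" count as two separate words, so the second window starts at "question" and no longer contains "aquestion".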

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Regex: Matching around a word(s) by shotgunefx (Parson) on Dec 17, 2005 at 01:15 UTC
Appreciate the help, though a couple issues.

1. I need to find multiple matches.
2. This fails on some input. Take "and" for example.

#! perl -slw
use strict;

my $word = $ARGV[0] or die "No search term";

( my $text = do{ local $/; <DATA> } ) =~ tr[\n][]d;

$text =~ s[
    ( (?: \S+ \s+ ){1,3} )
    ( $word ) [[:punct:]]*
    (?= ( (?: \s+ \S+ ){1,3} ) )
][
    print "$1<$2>$3"
]gex;

__END__
this finds and matches and highlights matches.

This outputs

this finds <and> matches and highlights
matches <and> highlights matches.

-Lee

perl digital dash (in progress)
Re^3: Regex: Matching around a word(s) by BrowserUk (Patriarch) on Dec 17, 2005 at 01:37 UTC
Strange. I copied the above code and ran it and got this output:

P:\test>junk and
this finds <and> matches and highlights
matches <and> highlights matches.

Which is correct as far as I can tell? It found and highlighted both "and"s; that is what you mean by multiple matches, isn't it?
Re^4: Regex: Matching around a word(s) by shotgunefx (Parson) on Dec 17, 2005 at 01:53 UTC
Not quite. My fault for not being clear. Don't know where my head is at. That fixes the initial problem of it finding both matches, which brings up the problem of results looking odd when two terms are very close together, as you'll have overlapping fragments. I misread the output as it capturing incorrectly with that oddly repetitive input. If I had changed this

print "$1<$2>$3"

to this

print "...$1<$2>$3..."

I would have seen my error. I think I'm going to have to find the spans of interesting text and condense them.

-Lee
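
One way the "find the spans and condense them" idea could look is sketched below. It is only an illustration under stated assumptions, not the approach the posters settled on: the text is split on whitespace, a hit is a word equal to the term plus optional trailing punctuation, and windows that overlap or touch are merged into one span. The helper name snippets is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: one snippet per merged span, with every hit tagged.
sub snippets {
    my ( $text, $word, $n ) = @_;
    my @words = split ' ', $text;

    # Indices of words that equal the term, ignoring trailing punctuation.
    my @hits = grep { $words[$_] =~ /\A\Q$word\E[[:punct:]]*\z/ } 0 .. $#words;

    # Build a window of $n words either side of each hit; merge windows
    # that overlap or touch the previous one.
    my @spans;
    for my $i (@hits) {
        my $lo = $i - $n < 0        ? 0        : $i - $n;
        my $hi = $i + $n > $#words ? $#words : $i + $n;
        if ( @spans and $lo <= $spans[-1][1] + 1 ) {
            $spans[-1][1] = $hi;    # extend the previous span
        }
        else {
            push @spans, [ $lo, $hi ];
        }
    }

    # Rebuild each span as text, wrapping the term in <...> at hit positions.
    my %is_hit = map { $_ => 1 } @hits;
    return map {
        my ( $lo, $hi ) = @$_;
        join ' ', map {
            my $w = $words[$_];
            $w =~ s/\A(\Q$word\E)/<$1>/ if $is_hit{$_};
            $w;
        } $lo .. $hi;
    } @spans;
}

print "...$_...\n"
    for snippets( 'this finds and matches and highlights matches.', 'and', 3 );
```

For the repetitive test sentence the two windows merge into a single snippet with both hits tagged, instead of two overlapping fragments.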
Re^5: Regex: Matching around a word(s) by BrowserUk (Patriarch) on Dec 17, 2005 at 02:32 UTC
Re^6: Regex: Matching around a word(s) by shotgunefx (Parson) on Dec 17, 2005 at 03:10 UTC
Re^5: Regex: Matching around a word(s) by BrowserUk (Patriarch) on Dec 17, 2005 at 02:00 UTC