in reply to Regex: Matching around a word(s)

Update: I missed the bit about "overlapping matches"....

Something like this?

$s = 'I have somwhat large articles of text (returned from a search), what I\'d like to do is capture the word and X number of words before and after it while tagging the matching word in the captured text.';;

print $s =~ m[ ( (?: \S+ \s+ ){3} X (?: \s+ \S+ ){3} ) ]x;;

the word and X number of words
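The one-liner above hard-codes both the term (X) and the window size (3 words). Below is a minimal sketch that turns both into parameters, assuming words are runs of non-whitespace. window_around is a hypothetical name. The \Q...\E quoting is an addition that makes a term such as C++ match literally. Like the original, it returns only the first hit and needs exactly $n words on each side.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first fragment in $text that has exactly
# $n whitespace-delimited words on each side of $word, or undef if none.
sub window_around {
    my ( $text, $word, $n ) = @_;
    # \Q...\E escapes any regex metacharacters in the search term.
    my $re = qr/ ( (?: \S+ \s+ ){$n} \Q$word\E (?: \s+ \S+ ){$n} ) /x;
    return $text =~ $re ? $1 : undef;
}

my $s = 'I have somwhat large articles of text (returned from a search), '
      . "what I'd like to do is capture the word and X number of words "
      . 'before and after it while tagging the matching word in the captured text.';

print window_around( $s, 'X', 3 ), "\n";    # prints: the word and X number of words
```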

Update2: Here's one that does overlapping and tagging:

#! perl -slw
use strict;

my $word = $ARGV[0] or die "No search term";

# Slurp the text after __END__ and delete its newlines.
( my $text = do{ local $/; <DATA> } ) =~ tr[\n][]d;

# Print "before <word> after" for each hit. print's return value (1)
# replaces each match, but $text is not used again.
$text =~ s[
    ( (?: \S+ \s+ ){1,3} )          # 1 to 3 words before the term
    ( $word )                       # the term itself
    [[:punct:]]*                    # trailing punctuation, dropped
    (?= ( (?: \s+ \S+ ){1,3} ) )    # 1 to 3 words after, not consumed
][
    print "$1<$2>$3"
]gex;

__END__
Regular expressions have always been a weak spot for me, and I've got a
question that's got me stumped. Here's the problem I'm trying to solve.
I have somwhat large articles of text (returned from a search), what I'd
like to do is capture the word and X number of words before and after
it while tagging the matching word in the captured text. My inital thought
was to try something like this. The problem I have is that if there is
more than one term and they overlap, the nth term will not be annotated.
So my next thought is lookahead/lookbehind, but they don't capture.
Is there a way to do this with a single regex? Is a regex even the best
way to do this? Thanks, -Lee

Some results

P:\test>junk me
weak spot for <me> and I've got
aquestion that's got <me> stumped. Here's the
P:\test>junk is
I'dlike to do <is> capture the word
problem I have <is> that if there
my next thought <is> lookahead/lookbehind, but they
P:\test>junk got
me, and I've <got> aquestion that's got
aquestion that's <got> me stumped. Here's
P:\test>junk to
problem I'm trying <to> solve.I have somwhat
search), what I'dlike <to> do is capture
My inital thoughtwas <to> try something like
there a way <to> do this with
even the bestway <to> do this? Thanks,
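The s///gex in the script is used only for its side effect: print runs once per hit, and its return value replaces each matched piece of $text. Below is a minimal sketch of the same pattern driven by a while loop over m//g instead. It returns the fragments so a caller can post-process them. tagged_fragments and the window-size parameter $n are hypothetical, and the \Q...\E quoting of the term is an addition not present in the original.

```perl
use strict;
use warnings;

# Hypothetical helper: collect "before <term> after" fragments instead of
# printing them from inside s///e. The trailing context sits in a
# lookahead, so it is not consumed and the next hit can still be found.
sub tagged_fragments {
    my ( $text, $word, $n ) = @_;
    my @frags;
    while ( $text =~ m/
        ( (?: \S+ \s+ ){1,$n} )            # 1 to n words before
        ( \Q$word\E ) [[:punct:]]*         # the term; punctuation dropped
        (?= ( (?: \s+ \S+ ){1,$n} ) )      # 1 to n words after
    /gx )
    {
        push @frags, "$1<$2>$3";
    }
    return @frags;
}

my $sample = 'this finds and matches and highlights matches.';
print "$_\n" for tagged_fragments( $sample, 'and', 3 );
```

As in the original, a hit needs at least one word on each side to be reported. Because the leading context is consumed, a hit close to the previous one gets a shorter "before" window (compare "aquestion that's <got>" in the results). Deleting newlines with tr[\n][]d is also what fuses words such as "aquestion". Translating them to spaces with tr[\n][ ] would avoid that.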

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Regex: Matching around a word(s)
by shotgunefx (Parson) on Dec 17, 2005 at 01:15 UTC
    I appreciate the help, though I have a couple of issues:
    1. I need to find multiple matches.
    2. This fails on some input.

    Take "and" for example.
    #! perl -slw
    use strict;

    my $word = $ARGV[0] or die "No search term";

    ( my $text = do{ local $/; <DATA> } ) =~ tr[\n][]d;

    $text =~ s[
        ( (?: \S+ \s+ ){1,3} )
        ( $word )
        [[:punct:]]*
        (?= ( (?: \s+ \S+ ){1,3} ) )
    ][
        print "$1<$2>$3"
    ]gex;

    __END__
    this finds and matches and highlights matches.
    This outputs:
    this finds <and> matches and highlights
    matches <and> highlights matches.

    -Lee

    perl digital dash (in progress)

      Strange. I copied the above code and ran it and got this output:

      P:\test>junk and
      this finds <and> matches and highlights
      matches <and> highlights matches.

      Which is correct as far as I can tell?

      It found and highlighted both "and"s; that is what you mean by multiple matches, isn't it?


        Not quite. My fault for not being clear. Don't know where my head is at.

        That fixes the initial problem of finding both matches, but it brings up another: the results look odd when two terms are very close together, because you get overlapping fragments.

        I misread the output as capturing incorrectly, given that oddly repetitive input. Had I changed

            print "$1<$2>$3"

        to

            print "...$1<$2>$3..."

        I would have seen my error. I think I'm going to have to find the spans of interesting text and condense them.


        -Lee

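Lee's plan of finding the spans of interesting text and condensing them can also be done without a single regex. Below is a minimal sketch of one way to do it, which swaps the regex approach for word splitting plus interval merging. It assumes whitespace-delimited words, a term that must be a whole word (optionally followed by punctuation), and case-sensitive matching like the regexes in this thread. Each hit becomes a window of up to $n words on either side. Windows that overlap or touch are merged, and each merged span is reported once with every hit tagged. condensed_snippets is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: report each region of text once, merging the
# context windows of nearby hits and tagging every hit in the region.
sub condensed_snippets {
    my ( $text, $word, $n ) = @_;
    my @w = split ' ', $text;    # whitespace-delimited words

    # Indices of words equal to the term, allowing trailing punctuation.
    my @hits = grep { $w[$_] =~ /^\Q$word\E[[:punct:]]*\z/ } 0 .. $#w;
    my %is_hit = map { $_ => 1 } @hits;

    # Turn each hit into a [first, last] window of word indices, clamped
    # to the text, and merge it into the previous window if they overlap
    # or touch. Hits are in ascending order, so window ends never shrink.
    my @spans;
    for my $i (@hits) {
        my $lo = $i - $n < 0   ? 0   : $i - $n;
        my $hi = $i + $n > $#w ? $#w : $i + $n;
        if ( @spans && $lo <= $spans[-1][1] + 1 ) {
            $spans[-1][1] = $hi;
        }
        else {
            push @spans, [ $lo, $hi ];
        }
    }

    # Rebuild each span, wrapping the term (not its punctuation) in <>.
    my @out;
    for my $span (@spans) {
        my @parts;
        for my $j ( $span->[0] .. $span->[1] ) {
            my $t = $w[$j];
            $t =~ s/^(\Q$word\E)/<$1>/ if $is_hit{$j};
            push @parts, $t;
        }
        push @out, '...' . join( ' ', @parts ) . '...';
    }
    return @out;
}

print "$_\n" for condensed_snippets( 'this finds and matches and highlights matches.', 'and', 3 );
```

With $n = 3, Lee's "and" sample comes back as one snippet, "...this finds <and> matches <and> highlights matches....", rather than two overlapping fragments. Because split ' ' treats newlines as whitespace, multi-line text needs no tr step, and words are not fused together.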