Re: Regex: Matching around a word(s)

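Here is a slightly different approach; at least as far as I can tell, it is unique. It builds a hash of the word positions to print (each match plus five words on either side), then rescans the source and prints the words at those positions. Because overlapping windows simply mark the same hash keys, the overlaps are condensed automatically.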

#!/usr/bin/perl
use strict;
use warnings;


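die "No search terms supplied!" unless @ARGV;
my @words = @ARGV;

# Anchored alternation: a word must equal one of the search terms exactly.
my $regex = join("|", @words);
my $expr  = qr/^($regex)$/;

$/ = ' ';                # read DATA one space-delimited word at a time
my $i     = 0;
my $words = {};          # word positions to print (each match +/- 5)
my $pos   = tell(DATA);  # remember where the data starts for the second pass

# Pass 1: mark every position within 5 words of a match.
for my $word (<DATA>)
{
    chomp $word;
    $i++;
    if ($word =~ /$expr/) {
       for my $j (-5 .. 5) {
           $words->{$i + $j}++;
       }
    }
}

# Pass 2: rewind and print the marked words, tagging each match.
seek(DATA, $pos, 0);
$i = 0;
for my $word (<DATA>)
{
    $i++;
    chomp $word;
    $word = "<$word>" if ($word =~ /$expr/);
    print "$word " if exists $words->{$i};
}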



__DATA__
Regular expressions have always been a weak spot for me, and I've got a
 question that's got me stumped. Here's the problem I'm trying to solve.
 I have somewhat large articles of text (returned from a search), what I'd
 like to do is capture the word and X number of words before and after it
 while tagging the matching word in the captured text. My inital thought
 was to try something like this. The problem I have is that if there is
 more than one term and they overlap, the nth term will not be annotated.
 So my next thought is lookahead/lookbehind, but they don't capture.
 Is there a way to do this with a single regex? Is a regex even the best
 way to do this? Thanks, -Lee
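For anyone who wants to try the same two-pass idea on a string instead of the DATA handle, here is a minimal sketch. The sub name `context_words` and its `$span` parameter are made up for illustration and are not part of the script above. It also makes two assumptions that differ from the script: the search terms are treated as literal strings (hence `quotemeta`), and the text is split on any whitespace rather than on single spaces.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the post): returns the words of $text that
# lie within $span words of a match, with each match wrapped in <...>.
sub context_words {
    my ($text, $span, @terms) = @_;

    # Assumption: terms are literal strings, so escape regex metacharacters.
    my $alt    = join '|', map { quotemeta($_) } @terms;
    my $expr   = qr/^(?:$alt)$/;
    my @tokens = split ' ', $text;    # assumption: split on any whitespace

    # Pass 1: mark every index within $span words of a match.
    my %keep;
    for my $i (0 .. $#tokens) {
        next unless $tokens[$i] =~ $expr;
        $keep{$_} = 1 for $i - $span .. $i + $span;
    }

    # Pass 2: emit the marked tokens in order, tagging the matches.
    # Out-of-range indices marked in pass 1 are never visited here.
    return join ' ',
        map  { $tokens[$_] =~ $expr ? "<$tokens[$_]>" : $tokens[$_] }
        grep { $keep{$_} } 0 .. $#tokens;
}

# Two matches whose windows overlap on "d"; the overlap is printed once.
print context_words('a b c d e f g h i j', 1, 'c', 'e'), "\n";
# prints: b <c> d <e> f
```

Since the windows are stored as hash keys, a position shared by two windows is marked twice but printed once, which is the condensing behaviour described at the top of this post.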

___________
Eric Hodges $_='y==QAe=e?y==QG@>@?iy==QVq?f?=a@iG?=QQ=Q?9'; s/(.)/ord($1)-50/eigs;tr/6123457/- \/|\\\_\n/;print;


Replies are listed 'Best First'.
Re^2: Regex: Matching around a word(s) by shotgunefx (Parson) on Dec 20, 2005 at 19:19 UTC
Thanks. I'll whip up a benchmark with some of the different approaches (actually I already have, but yours isn't in it yet), though I'll probably switch the filehandle use. Also, I'd probably use a hash instead of a hashref for $words, as it is slightly faster. I'm not sure whether it's faster or not (though it "feels" it), but I'd probably rewrite the following to use a hash slice:

# before
    if ($word =~ /$expr/) {
       for my $j (-5 .. 5) {
           $words->{$i + $j}++;
       }
    };
# after
    my $mwords = 5;
    @words{$i-$mwords..$i+$mwords} = 1
       if $word =~ /$expr/;
    # Note: Keys created, but all but first have undef values

-Lee

perl digital dash (in progress)
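To make the note about undef values concrete, here is a small standalone sketch of what the slice assignment does. The variable names `%marked`, `@defined`, and `@all_true` are made up for illustration, and the window is shortened to 2 words so the counts are easy to check. Only the first key gets the value 1, which is harmless as long as the lookup uses `exists`, as the original script does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %words;
my ($i, $mwords) = (10, 2);

# List assignment to a hash slice: the single 1 goes to the first key,
# the remaining keys are created with undef values.
@words{ $i - $mwords .. $i + $mwords } = 1;

my @defined = grep { defined $words{$_} } keys %words;
printf "%d keys, %d defined\n", scalar(keys %words), scalar(@defined);
# prints: 5 keys, 1 defined

# Variation (an assumption, not from the reply): repeat the 1 so that every
# key is true and a plain truth test works as well as exists.
my %marked;
@marked{ $i - $mwords .. $i + $mwords } = (1) x (2 * $mwords + 1);
my @all_true = grep { $marked{$_} } keys %marked;
```

With the first form, `exists $words{$n}` is the right test; `if ($words{$n})` would miss every position except the first one in each window.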
