in reply to Re: Regular Expression rematch
in thread Regular Expression rematch

Or even:
my @results = $str =~ m{ (?= (\S+ \s+ \Q$word\E \s+ \S+) ) }xmsg;

Re^3: Regular Expression rematch
by ikegami (Patriarch) on Aug 16, 2009 at 02:50 UTC
    That won't work if the term is longer than one character, so you might as well change
    my @results = $str =~ m{ (?= (\S+ \s+ \Q$word\E \s+ \S+) ) }xmsg;
    to
    my @results = $str =~ m{ (?= (\S \s+ \Q$word\E \s+ \S+) ) }xmsg;
    Or if you want longer terms:
    my @results = $str =~ m{ (?<! \S ) (?= (\S+ \s+ \Q$word\E \s+ \S+) ) }xmsg;
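A minimal sketch of the difference between the unanchored and the anchored pattern. The sample string and the variable names @unanchored and @anchored are made up for illustration; the two patterns are the ones quoted above.

```perl
use strict;
use warnings;

# Sample text and search word are illustrative assumptions.
my $str  = "AX plus BY equals CZ";
my $word = "plus";

# Unanchored: the lookahead is tried at every position in the string,
# so a match may begin in the middle of a token as well as at its start.
my @unanchored = $str =~ m{ (?= (\S+ \s+ \Q$word\E \s+ \S+) ) }xmsg;

# Anchored: (?<! \S ) only allows a match to begin where the preceding
# character is not a non-space character, i.e. at the start of a token.
my @anchored = $str =~ m{ (?<! \S ) (?= (\S+ \s+ \Q$word\E \s+ \S+) ) }xmsg;

print "unanchored: $_\n" for @unanchored;
print "anchored:   $_\n" for @anchored;
```

The anchored list should contain only results that begin at the start of a token, whereas the unanchored list may also contain results that begin mid-token, such as X plus BY.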

      For reasons I don't know, the results can be quite different depending on the version of perl.

      use strict;
      use warnings;

      print "Perl version $]\n";

      my $str = "1 plus 2 equals 3 but in this example, AX plus BY equals CZ, DA plus EBCDEF plus FGH equals G and H plus I plus J plus K equals L and I wonder what one gets from 7 plus Z";
      my $word = "plus";

      my @results = $str =~ m{ (?= (\S+ \s+ \Q$word\E \s+ \S+) ) }xmsg;
      print join("\n", @results), "\n";

      produces

      Perl version 5.008008
      1 plus 2
      AX plus BY
      X plus BY
      DA plus EBCDEF
      A plus EBCDEF
      EBCDEF plus FGH
      BCDEF plus FGH
      CDEF plus FGH
      DEF plus FGH
      EF plus FGH
      F plus FGH
      H plus I
      I plus J
      J plus K
      7 plus Z

      or

      Perl version 5.010000
      1 plus 2
      AX plus BY
      DA plus EBCDEF
      EBCDEF plus FGH
      H plus I
      I plus J
      J plus K
      7 plus Z
        I can replicate that with ActivePerl 5.10.0 build 1004. The 5.10.0 behaviour is buggy. Hopefully, it's been fixed for 5.10.1. RC1 of 5.10.1 was just released, so it's probably too late to fix it for 5.10.1 if it hasn't already been fixed. Does someone have 5.10.1-RC1 handy?
      That won't work if the term is longer than one character...
      True, true.

      Actually, my preference is for an approach like that of Re: Building a boolean search engine. It's more wordy, but also provides more flexibility and control.
      (Update: But I guess one could ask: If you're going to do all that, why not just write a recursive descent parser?)

      So, something like:

      # use feature ':5.10';
      use strict;
      use warnings;

      my $str = "In this example, AA plus B equals C, D minus EE times FFF equals G and HH plus I times JJ minus K equals L and M plusplus N plus plus O is invalid and P equals Q and RRR plus T";

      use constant OPS => qw(plus minus times);

      my $op      = qr{ \b (?: @{[ join '|', map quotemeta, OPS ]} ) \b }xms;
      my $not_op  = qr{ (?! $op) }xms;
      my $operand = qr{ \b (?: $not_op \w)+ \b }xms;
      my $term    = qr{ $operand \s+ $op \s+ $operand }xms;

      my @terms = $str =~ m{ (?= ($term)) }xmsg;
      print "$_ \n" for @terms;
      Output (same for ActiveState 5.8.2 and Strawberry 5.10.0.5):
      AA plus B
      D minus EE
      EE times FFF
      HH plus I
      I times JJ
      JJ minus K
      RRR plus T