in reply to Re^2: Regular Expression rematch
in thread Regular Expression rematch

That won't work if the term is longer than one character, so you might as well change
my @results = $str =~ m{ (?= (\S+ \s+ \Q$word\E \s+ \S+) ) }xmsg;
to
my @results = $str =~ m{ (?= (\S \s+ \Q$word\E \s+ \S+) ) }xmsg;
Or if you want longer terms:
my @results = $str =~ m{ (?<! \S ) (?= (\S+ \s+ \Q$word\E \s+ \S+) ) }xmsg;
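
To see what the lookbehind buys you, here is a minimal sketch on a short made-up string (the sample string is mine, not from the thread). The `(?<! \S )` only lets a match start at the beginning of the string or right after whitespace, so a term can never begin in the middle of a token:

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen so that two terms overlap on "BY".
my $str  = "AX plus BY plus CZ";
my $word = "plus";

# (?<! \S ) rejects any start position that is preceded by a
# non-whitespace character; the lookahead then captures the term
# without consuming it, so overlapping terms are still found.
my @results = $str =~ m{ (?<! \S ) (?= (\S+ \s+ \Q$word\E \s+ \S+) ) }xmsg;

print "$_\n" for @results;
```

This should print "AX plus BY" and "BY plus CZ", with no "X plus BY" or "Y plus CZ" in between.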

Re^4: Regular Expression rematch
by ig (Vicar) on Aug 16, 2009 at 05:43 UTC

    For reasons I don't know, the results can be quite different depending on the version of perl.

    use strict;
    use warnings;

    print "Perl version $]\n";

    my $str = "1 plus 2 equals 3 but in this example, AX plus BY equals CZ, DA plus EBCDEF plus FGH equals G and H plus I plus J plus K equals L and I wonder what one gets from 7 plus Z";
    my $word = "plus";

    my @results = $str =~ m{ (?= (\S+ \s+ \Q$word\E \s+ \S+) ) }xmsg;
    print join("\n", @results), "\n";

    produces

    Perl version 5.008008
    1 plus 2
    AX plus BY
    X plus BY
    DA plus EBCDEF
    A plus EBCDEF
    EBCDEF plus FGH
    BCDEF plus FGH
    CDEF plus FGH
    DEF plus FGH
    EF plus FGH
    F plus FGH
    H plus I
    I plus J
    J plus K
    7 plus Z

    or

    Perl version 5.010000
    1 plus 2
    AX plus BY
    DA plus EBCDEF
    EBCDEF plus FGH
    H plus I
    I plus J
    J plus K
    7 plus Z
      I can replicate that with ActivePerl 5.10.0 build 1004. The 5.10.0 behaviour is buggy. Hopefully, it's been fixed for 5.10.1. RC1 of 5.10.1 was just released, so it's probably too late to fix it for 5.10.1 if it hasn't already been fixed. Does someone have 5.10.1-RC1 handy?

        It seems it's not fixed in 5.10.1-RC1...

        Perl version 5.010001
        1 plus 2
        AX plus BY
        DA plus EBCDEF
        EBCDEF plus FGH
        H plus I
        I plus J
        J plus K
        7 plus Z
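
If you actually want the every-offset result and don't want to depend on how a given perl advances `m//g` after a zero-length match, one option is to drive the start position yourself. This is only a sketch on a made-up string, and the variable names are mine: set `pos()` explicitly and pin the pattern there with `\G`, so each attempt is tried at exactly one offset.

```perl
use strict;
use warnings;

# Hypothetical sample input; $word plays the same role as in the thread.
my $str  = "AX plus BY";
my $word = "plus";

my @at_every_offset;
for my $offset (0 .. length($str) - 1) {
    pos($str) = $offset;    # start this attempt at $offset

    # \G anchors the match at pos($str), so no other start position
    # is tried; /c keeps pos() from being reset when the match fails.
    if ($str =~ m{ \G (\S+ \s+ \Q$word\E \s+ \S+) }xmsgc) {
        push @at_every_offset, $1;
    }
}

print "$_\n" for @at_every_offset;
```

On this input it collects "AX plus BY" and "X plus BY", which is the shape of the 5.8.8 output above. It is slower than a single `m//g`, since it restarts the engine once per offset.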
Re^4: Regular Expression rematch
by AnomalousMonk (Archbishop) on Aug 16, 2009 at 06:05 UTC
    That won't work if the term is longer than one character...
    True, true.

    Actually, my preference is for an approach like that of Re: Building a boolean search engine. It's more wordy, but also provides more flexibility and control.
    (Update: But I guess one could ask: If you're going to do all that, why not just write a recursive descent parser?)

    So, something like:

    # use feature ':5.10';
    use strict;
    use warnings;

    my $str = "In this example, AA plus B equals C, D minus EE times FFF equals G and HH plus I times JJ minus K equals L and M plusplus N plus plus O is invalid and P equals Q and RRR plus T";

    use constant OPS => qw(plus minus times);

    my $op      = qr{ \b (?: @{[ join '|', map quotemeta, OPS ]} ) \b }xms;
    my $not_op  = qr{ (?! $op) }xms;
    my $operand = qr{ \b (?: $not_op \w)+ \b }xms;
    my $term    = qr{ $operand \s+ $op \s+ $operand }xms;

    my @terms = $str =~ m{ (?= ($term)) }xmsg;
    print "$_ \n" for @terms;
    Output (same for ActiveState 5.8.2 and Strawberry 5.10.0.5):
    AA plus B
    D minus EE
    EE times FFF
    HH plus I
    I times JJ
    JJ minus K
    RRR plus T
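
The building blocks can also be checked on their own. A small sketch, reusing the `$op`, `$not_op` and `$operand` definitions from the code above; the candidate words and the `%is_operand` hash are only for illustration:

```perl
use strict;
use warnings;

use constant OPS => qw(plus minus times);

# Same definitions as in the post above.
my $op      = qr{ \b (?: @{[ join '|', map quotemeta, OPS ]} ) \b }xms;
my $not_op  = qr{ (?! $op) }xms;
my $operand = qr{ \b (?: $not_op \w)+ \b }xms;

# Test each candidate as a whole string against $operand.
my %is_operand;
for my $candidate (qw(AA plus plusplus times FFF)) {
    $is_operand{$candidate} = $candidate =~ m{ \A $operand \z }xms ? 1 : 0;
    printf "%-8s %s\n", $candidate,
        $is_operand{$candidate} ? "operand" : "not an operand";
}
```

"plus" and "times" are rejected because the negative lookahead sees a whole operator word. "plusplus" is accepted as an operand, because the `\b` inside `$op` never matches in the middle of it. That is why "M plusplus N" yields no term: there is no operator between the two operands.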