casiano has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

After playing a bit with Perl 5.10 regexps I have been able to build a primitive infix-to-postfix translator with them:

pl@nereida:~/Lperltesting$ cat ./calc510withactions2.pl
#!/usr/local/lib/perl/5.10.1/bin//perl5.10.1
use v5.10;

# Infix to postfix translator using 5.10 regexp
# Original grammar:
#  exp ->  exp [-+] term
#        | term
#  term -> term [*/] digits
#        | digits
# Applying left-recursion elimination we have:
#  exp ->  term re
#  re  ->  [+-] term re
#        | # empty
#  term -> digits rt
#  rt  ->  [*/] digits rt
#        | # empty

my @stack;
my $regexp = qr{
    (?&exp)

    (?(DEFINE)
        (?<exp>    (?&term) (?&re) )

        (?<re>     \s* ([+-]) (?&term) \s* (?{ push @stack, $^N }) (?&re)
                 | # empty
        )

        (?<term>   (?&digits) (?&rt) )

        (?<rt>     \s*([*/]) (?&digits) \s* (?{ push @stack, $^N }) (?&rt)
                 | # empty
        )

        (?<digits> \s* (\d+) (?{ push @stack, $^N }) )
    )
}xms;

my $input = <>;
chomp($input);
if ($input =~ $regexp) {
    say "matches: $&\nStack=(@stack)";
}
else {
    say "does not match";
}
Minus and divide operators seem to behave according to the expected left associativity:
pl@nereida:~/Lperltesting$ ./calc510withactions2.pl
2-8*4/2/4-1
matches: 2-8*4/2/4-1
Stack=(2 8 4 * 2 / 4 / - 1 -)
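One way to check that the postfix output above really encodes left associativity is to evaluate it with an operand stack. The sketch below is only an illustration: eval_postfix is a hypothetical helper that is not part of the original script, and it assumes the tokens are non-negative integers and the four binary operators.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): evaluate a list of
# postfix tokens using an operand stack.
sub eval_postfix {
    my @tokens = @_;
    my @operands;
    for my $tok (@tokens) {
        if ($tok =~ /\A\d+\z/) {
            push @operands, $tok;
            next;
        }
        # Binary operator: the right operand is on top of the stack.
        my $right = pop @operands;
        my $left  = pop @operands;
        push @operands,
              $tok eq '+' ? $left + $right
            : $tok eq '-' ? $left - $right
            : $tok eq '*' ? $left * $right
            : $tok eq '/' ? $left / $right
            :               die "unknown token '$tok'";
    }
    return $operands[0];
}

# The stack printed for the input 2-8*4/2/4-1
my $value = eval_postfix(qw(2 8 4 * 2 / 4 / - 1 -));
print "$value\n";    # -3
```

The result equals ((2 - ((8*4)/2)/4) - 1), which is what left-associative minus and divide should give.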
I feel, however, that I can't extend this methodology to more complex tasks, because I haven't found any way other than $^N to refer to the values captured by earlier parentheses. This means that I can only refer to the attribute of the most recently closed parenthesis.

Is there something in Perl 5.10 like relative variable backreferences, so that I could use something like $-1, $-2, etc. inside the embedded code to refer to the last parenthesis, the one before the last, and so on? That is, I am looking for something similar to relative backreferences like \g{-1}, but usable inside embedded code.
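A minimal sketch of the two features being compared, assuming a plain literal pattern with no recursion: $^N inside an embedded code block holds the text of the most recently closed group, whereas \g{-1} is a relative backreference that works only as part of the pattern itself. The variable @seen is just an illustrative name.

```perl
use strict;
use warnings;

our @seen;    # package variable, as used for the stacks in this thread

# Each code block sees only the group that closed most recently.
'ab' =~ /
    (a) (?{ push @seen, $^N })
    (b) (?{ push @seen, $^N })
/x;
print "@seen\n";    # a b

# \g{-1} refers to the previous group, but only inside the pattern.
my $doubled = 'xx' =~ / (.) \g{-1} /x ? 1 : 0;
print "doubled: $doubled\n";    # doubled: 1
```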

Apologies for not being able to state the question more clearly. I feel I do not yet have a clear understanding of Perl 5.10 regexps.

Replies are listed 'Best First'.
Re: Backreference variables in code embedded inside Perl 5.10 regexps
by ikegami (Patriarch) on Sep 09, 2009 at 18:37 UTC
    I'm not familiar with 5.10's re improvements, so there could be a better way.
    local our @stack;

    'abc' =~ /
        (?{ [] })
        (.) (?{ [ @{ $^R }, $^N ] })
        (.) (?{ [ @{ $^R }, $^N ] })
        (.) (?{ [ @{ $^R }, $^N ] })
        (?{ push @stack, $^R })
    /x;

    print "[$_]" for @{ $stack[0] };  # [a][b][c]
    print "\n";
      Thanks Ikegami!

      I look forward to something like that being integrated into Perl 5.10 regexps. I am slowly working through your solutions. The general idea is to save the attributes of the previous parentheses, and for that I can use some local auxiliary variables. The following rewrite of the grammar above is a more general solution. It does not push numbers unconditionally. It uses the variable $op and an intermediate action to save the required attribute:

      pl@nereida:~/Lperltesting$ cat calc510withactions3.pl
      #!/usr/local/lib/perl/5.10.1/bin//perl5.10.1
      use v5.10;

      # Infix to postfix translator using 5.10 regexp
      # Original grammar:
      #  exp ->  exp [-+] term
      #        | term
      #  term -> term [*/] digits
      #        | digits
      # Applying left-recursion elimination we have:
      #  exp ->  term re
      #  re  ->  [+-] term re
      #        | # empty
      #  term -> digits rt
      #  rt  ->  [*/] digits rt
      #        | # empty

      my $input;

      sub echo {
          my $p = substr($input, 0, pos($input));
          say $p . (" " x (length($input) - length($p))) . "\t" . $_[0];
      }

      my @stack;
      local our $op = '';
      my $regexp = qr{
          (?&exp)

          (?(DEFINE)
              (?<exp>    (?&term) (?&re)
                            (?{ echo "exp -> term re" })
              )

              (?<re>     \s* ([+-]) (?&term) \s* (?{ push @stack, $^N }) (?&re)
                            (?{ echo "re -> [+-] term re" })
                       | # empty
                            (?{ echo "re -> empty" })
              )

              (?<term>   ((?&digits))
                            (?{ # intermediate action
                                push @stack, $^N
                            })
                         (?&rt)
                            (?{
                                echo "term-> digits($^N) rt";
                            })
              )

              (?<rt>     \s*([*/])
                            (?{ # intermediate action
                                local $op = $^N;
                            })
                         ((?&digits)) \s*
                            (?{ # intermediate action
                                push @stack, $^N, $op
                            })
                         (?&rt) # end of <rt> definition
                            (?{
                                echo "rt -> [*/] digits($^N) rt"
                            })
                       | # empty
                            (?{ echo "rt -> empty" })
              )

              (?<digits> \s* \d+ )
          )
      }xms;

      $input = <>;
      chomp($input);
      if ($input =~ $regexp) {
          say "matches: $&\nStack=(@stack)";
      }
      else {
          say "does not match";
      }
      The execution shows how a rightmost anti-derivation is built by the perl 5.10 engine:
      pl@nereida:~/Lperltesting$ ./calc510withactions3.pl
      2-8*4/2/4-1
      2               rt -> empty
      2               term-> digits(2) rt
      2-8*4/2/4       rt -> empty
      2-8*4/2/4       rt -> [*/] digits(4) rt
      2-8*4/2/4       rt -> [*/] digits(2) rt
      2-8*4/2/4       rt -> [*/] digits(4) rt
      2-8*4/2/4       term-> digits(8) rt
      2-8*4/2/4-1     rt -> empty
      2-8*4/2/4-1     term-> digits(1) rt
      2-8*4/2/4-1     re -> empty
      2-8*4/2/4-1     re -> [+-] term re
      2-8*4/2/4-1     re -> [+-] term re
      2-8*4/2/4-1     exp -> term re
      matches: 2-8*4/2/4-1
      Stack=(2 8 4 * 2 / 4 / - 1 -)
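The intermediate-action idea used in the rt production can be isolated in a much smaller pattern. The following is a sketch only, assuming a single "digits operator digits" input and no recursion. The names $op and @out are illustrative, and $op is a package variable so that the code block can localize it, as the script above does.

```perl
use strict;
use warnings;

our ($op, @out);    # package variables, so a code block can "local" $op

'3*4' =~ m{
    (\d+)   (?{ push @out, $^N })         # left operand
    ([*/])  (?{ local $op = $^N })        # intermediate action, remember the operator
    (\d+)   (?{ push @out, $^N, $op })    # right operand, then the saved operator
}x;

print "@out\n";    # 3 4 *
```

Because the operator is saved before the second group closes, the last action can emit it after the right operand even though $^N by then refers to the digits.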
        No good. Fails if any called production uses (?{}).
        use strict;
        use warnings;

        sub parser {
            local our @stack;
            local our @rv;
            my $parser = qr{
                ^ (?&expr) (?&expr) \z
                (?{ @rv = @stack; })

                (?(DEFINE)
                    (?<expr>
                        (?{ [] })
                        (.) (?{ [ @{ $^R }, $^N ] })
                        #(?&foo)
                        #(?&bar)
                        (.) (?{ [ @{ $^R }, $^N ] })
                        (?{ local @stack = ( @stack, join '|', @{ $^R } ); })
                    )
                    (?<foo> )
                    (?<bar> (?{ [] }) )
                )
            }x;

            return $_[0] =~ /$parser/ && \@rv;
        }

        my $rv = parser('abcd');
        print("$_\n") for @$rv;

        Works with (?&foo) uncommented but not with (?&bar) uncommented.

Re: Backreference variables in code embedded inside Perl 5.10 regexps (try3)
by ikegami (Patriarch) on Sep 09, 2009 at 20:08 UTC
    I found a solution that adds 4 characters per capture:
    use strict;
    use warnings;

    sub parser {
        local our @stack;
        local our @rv;
        my $parser = qr{
            ^ (?&expr) (?&expr) \z
            (?{ @rv = @stack; })

            (?(DEFINE)
                (?<expr>
                    (?<i> . )
                    (?<i> . )
                    (?{ local @stack = ( @stack, $-{i}[0] . '|' . $-{i}[1] ); })
                )
            )
        }x;

        return $_[0] =~ /$parser/ && \@rv;
    }

    my $rv = parser('abcd');
    print("$_\n") for @$rv;
    or
    use strict;
    use warnings;

    sub parser {
        local our @stack;
        local our @rv;
        my $parser = qr{
            ^ (?&expr) (?&expr) \z
            (?{ @rv = @stack; })

            (?(DEFINE)
                (?<expr>
                    (?<i> . )
                    (?<j> . )
                    (?{ local @stack = ( @stack, $+{i} . '|' . $+{j} ); })
                )
            )
        }x;

        return $_[0] =~ /$parser/ && \@rv;
    }

    my $rv = parser('abcd');
    print("$_\n") for @$rv;
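The two variants above rely on the named-capture hashes introduced in 5.10. As a small sketch outside of any embedded code (so it says nothing about how they behave inside recursive productions): $+{name} gives the leftmost defined group with that name, while $-{name} is an array reference holding every group with that name. The variables $first and @all are illustrative names.

```perl
use strict;
use warnings;

my ($first, @all);
if ('ab' =~ / (?<i> . ) (?<i> . ) /x) {
    $first = $+{i};         # leftmost defined group named "i"
    @all   = @{ $-{i} };    # every group named "i", in pattern order
}

print $first, "\n";             # a
print join('|', @all), "\n";    # a|b
```

This is why the first variant can reuse the name i for both captures and index into $-{i}, while the second variant needs distinct names to use %+.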
Re: Backreference variables in code embedded inside Perl 5.10 regexps (try2)
by ikegami (Patriarch) on Sep 09, 2009 at 19:50 UTC
    Got a better solution for you:
    use strict;
    use warnings;

    sub rel_cap {
        my ($ofs) = @_;
        substr($_, $-[$ofs], $+[$ofs] - $-[$ofs])
    }

    sub parser {
        local our @stack;
        local our @rv;
        my $parser = qr{
            ^ (?&expr) (?&expr) \z
            (?{ @rv = @stack; })

            (?(DEFINE)
                (?<expr>
                    (.)
                    (.)
                    (?{ local @stack = (@stack, rel_cap(-2) . "|" . rel_cap(-1) ); })
                )
            )
        }x;

        return $_[0] =~ /$parser/ && \@rv;
    }

    my $rv = parser('abcd');
    print("$_\n") for @$rv;

    a|b
    c|d

    Update: Nevermind. @- and/or @+ become all wonky if there's a (?&...) inside the (?<...>...).

    Even something as simple as the following fails:

    (?<expr>
        (?&foo)
        (.)
        (?{ local @stack = (@stack, rel_cap(-1) ); })
    )
    (?<foo> . )
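For comparison, here is the documented relationship between @-, @+ and the capture groups after an ordinary successful match, with no embedded code and no (?&...) calls. This is only a sketch of the baseline behavior that rel_cap builds on; it does not reproduce the problem inside recursive patterns. $prev and $last are illustrative names.

```perl
use strict;
use warnings;

my $str = 'abcd';
my ($prev, $last);
if ($str =~ / (.) (.) /x) {
    my $n = $#-;    # index of the last group that matched
    # Group k is substr($str, $-[k], $+[k] - $-[k])
    $last = substr($str, $-[$n],     $+[$n]     - $-[$n]);
    $prev = substr($str, $-[$n - 1], $+[$n - 1] - $-[$n - 1]);
}

print "$prev|$last\n";    # a|b
```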
      Many, many thanks ikegami.
      ... Nevermind. @- and/or @+ become all wonky if there's a (?&...) inside the (?<...>...).
      It seems to me that I have found the reason it becomes wonky: see the node

      Strange behavior of @- and @+ in perl5.10 regexps.

      The following version of your rel_cap subroutine seems to work:

      pl@nereida:~/Lperltesting$ cat calc510withactions4.pl
      #!/usr/local/lib/perl/5.10.1/bin//perl5.10.1
      use v5.10;

      # Infix to postfix translator using 5.10 regexp
      # Original grammar:
      #  exp ->  exp [-+] term
      #        | term
      #  term -> term [*/] digits
      #        | digits
      # Applying left-recursion elimination we have:
      #  exp ->  term re
      #  re  ->  [+-] term re
      #        | # empty
      #  term -> digits rt
      #  rt  ->  [*/] digits rt
      #        | # empty

      sub rc {
          my $ofs = - shift;
          my $np = @-;

          substr($_, $-[$ofs], $+[$np+$ofs] - $-[$ofs])
      }

      my $input;
      my @stack;

      my $regexp = qr{
          (?&exp)

          (?(DEFINE)
              (?<exp>    (?&term) (?&re)
                            (?{ say "exp -> term re" })
              )

              (?<re>     \s* ([+-]) (?&term) \s* (?{ push @stack, $^N }) (?&re)
                            (?{ say "re -> [+-] term re" })
                       | # empty
                            (?{ say "re -> empty" })
              )

              (?<term>   ((?&digits))
                            (?{ # intermediate action
                                push @stack, $^N
                            })
                         (?&rt)
                            (?{
                                say "term-> digits($^N) rt";
                            })
              )

              (?<rt>     \s*([*/]) ((?&digits)) \s*
                            (?{ # intermediate action
                                push @stack, rc(1), rc(2)
                            })
                         (?&rt) # end of <rt> definition
                            (?{
                                say "rt -> [*/] digits($^N) rt"
                            })
                       | # empty
                            (?{ say "rt -> empty" })
              )

              (?<digits> \s* \d+ )
          )
      }xms;

      $input = <>;
      chomp($input);
      if ($input =~ $regexp) {
          say "matches: $&\nStack=(@stack)";
      }
      else {
          say "does not match";
      }
      Now I can access the attributes of the previous symbols. See the line
      push @stack, rc(1), rc(2)
      An execution follows:
      pl@nereida:~/Lperltesting$ ./calc510withactions4.pl
      2-8/4/2-1
      rt -> empty
      term-> digits(2) rt
      rt -> empty
      rt -> [*/] digits(2) rt
      rt -> [*/] digits(4) rt
      term-> digits(8) rt
      rt -> empty
      term-> digits(1) rt
      re -> empty
      re -> [+-] term re
      re -> [+-] term re
      exp -> term re
      matches: 2-8/4/2-1
      Stack=(2 8 4 / 2 / - 1 -)