Putter wondered on #perl6 whether there was a way to write something that can be matched just like a regex and that sets all of the $1, $2, $&, @-, $+ and $^N variables correctly. I was told the obvious things didn't work, so I didn't try them :), and managed to get something close to a solution ($^N is wrong). To be clear, the problem is to supply a $qr that sets all of Perl's regex variables even though the matching itself could be done by a different regex engine.
```perl
$str =~ $qr;
print "$1, $2\n";    # or whatever
```
This code is a proof of concept and has only been tested on the single example shown.
```perl
#!/usr/bin/perl -w
use strict;
use re 'eval';

sub showvars;

my $s = "abcdefghi";
my (@a, $pos);

print qq{"$s" =~ /(b)(.(.))/;\n};
#match($s, qr{(b)(.(.))}x ) and exit;

# This doesn't set $^N correctly and we need to know the nesting.
# The rest of the regex vars should be ok if you know the number of parens.
# If not, add parens up to $99, but $+, $#+ and the extra $n's will be wrong.
match($s, qr/
    (?{
        $pos = 1;    # pos of the start of $&
        # @a = your_fn($_)
        @a = ([0,3],[0,1],[1,2],[2,1]);    # offset & length of captures
        # $a[0] is $&, $a[1] is $1, etc.
        # $a[1][0] is $-[1] - $-[0] and $a[1][1] is $+[1] - $-[1]
    })
    # Capture $1:
    # Wrap this in (?= ) to not bump pos
    (?=
        (??{ qr!.{$a[1][0]}! })      # Advance to start of $1
        ((??{ qr!.{$a[1][1]}! }))    # Capture the right length
    )
    # $2, $3, etc.
    (?= (??{ qr!.{$a[2][0]}! }) ((??{ qr!.{$a[2][1]}! })) )
    (?= (??{ qr!.{$a[3][0]}! }) ((??{ qr!.{$a[3][1]}! })) )
    # I think the parens are counted at regex compile time
    # so they need to be known in advance (or $+, $#+, $4 will be wrong)

    # bump pos until we're at the right spot
    (??{ (pos == $pos) ? qr{} : qr{(?!)}; })
    # capture $&
    (??{ qr!.{$a[0][1]}! })
/xs );

sub match {
    my ($s, $qr) = @_;
    $s =~ $qr or die "No match $s =~ $qr";
    showvars qw($` $& $');
    showvars qw($+ $^N);
    showvars qw($1 $2 $3 $4 $5 $6);
    showvars qw(@-);
    showvars qw(@+);
}

sub showvars {
    no warnings 'uninitialized';
    print "$_ = (", join(",", eval $_), ") " for @_;
    print "\n";
}
# $+   text matched by the highest-numbered group that matched
# $^N  text of the most recently closed group
# @+   array of end offsets, $+[0] is the whole match, $#+ is the number of groups
```
The output is:

```
"abcdefghi" =~ /(b)(.(.))/;
$` = (a) $& = (bcd) $' = (efghi)
$+ = (d) $^N = (d)
$1 = (b) $2 = (cd) $3 = (d) $4 = () $5 = () $6 = ()
@- = (1,1,2,3)
@+ = (4,2,4,4)
```
Brad
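The "# @a = your_fn($_)" placeholder in the code above is where the other engine's results would come in. As a hedged sketch, assuming that engine reports absolute [start, end] spans with undef for an unset group, a hypothetical helper spans_to_offsets could turn them into the $pos and @a values the code above hard-codes. Perl's own engine stands in for the "other" engine here, only to produce some spans:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the post: convert absolute [start, end]
# spans into ($pos, @a). $spans->[0] is the whole match, $spans->[1] is $1,
# and so on; undef marks an unset capture. Offsets in @a are relative to
# the start of $&, lengths are end - start.
sub spans_to_offsets {
    my ($spans) = @_;
    my $pos = $spans->[0][0];
    my @a;
    for my $span (@$spans) {
        push @a, defined $span
            ? [ $span->[0] - $pos, $span->[1] - $span->[0] ]
            : undef;
    }
    return ($pos, @a);
}

# Usage: let Perl's own engine play the part of the other engine.
my $str = "abcdefghi";
$str =~ /(b)(.(.))/ or die "no match";
my @spans;
for my $n (0 .. $#+) {
    push @spans, defined $-[$n] ? [ $-[$n], $+[$n] ] : undef;
}
my ($pos, @a) = spans_to_offsets(\@spans);
print "pos = $pos\n";
print join(" ", map { ref $_ ? "[$_->[0],$_->[1]]" : 'undef' } @a), "\n";
```

For this example it reproduces the $pos = 1 and ([0,3],[0,1],[1,2],[2,1]) used above.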

Re: Plug for an alternate regex engine
by hv (Prior) on Feb 21, 2006 at 10:46 UTC

    It is certainly possible to write a function that accepts the values you want the variables to end up with and returns a regexp that sets them up that way.

    As you supposed, the parens are counted at regexp compile time, so it is not possible to embed all the logic in a regexp without fixing the paren count in advance.
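A quick way to see the compile-time counting: captures inside a pattern returned by (??{ ... }) stay local to that inner pattern, so they never show up in $1 or $#+ afterwards; only parens written in the outer pattern are counted. A small illustrative sketch:

```perl
use strict;
use warnings;

my $s = "abc";

# Parens only inside the pattern built by (??{ }): not counted outside it
if ($s =~ /(??{ qr{(b)} })/) {
    print "inner parens: \$#+ = ", $#+, ", \$1 is ",
        (defined $1 ? "'$1'" : "undef"), "\n";
}

# The same match with the parens written in the outer pattern
if ($s =~ /((??{ qr{b} }))/) {
    print "outer parens: \$#+ = ", $#+, ", \$1 is ",
        (defined $1 ? "'$1'" : "undef"), "\n";
}
```

This is why the capturing parens in the original post sit in the outer regexp, with only their contents generated by (??{ }).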

    $^N will be the first (lowest-numbered) capture whose end offset equals the maximum of @+[1 .. $#+], which can be emulated by constructing the regexp so that the captures ending at that point are nested.
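That rule can be sketched as a small function, under the same assumption of naturally nested captures with no lookahead tricks. The name last_closed_paren is made up here, and the comparison with the real $^N only covers the one example:

```perl
use strict;
use warnings;

# Sketch of the rule for $^N: among the captures that are set, take the
# lowest-numbered one whose end offset is the largest. Takes a copy of @+
# and returns a capture number, or undef if no capture is set.
sub last_closed_paren {
    my ($ends) = @_;
    my ($best, $max);
    for my $n (1 .. $#$ends) {
        next unless defined $ends->[$n];    # skip unset captures
        # strictly greater, so the first capture to reach the max is kept
        if (!defined $max || $ends->[$n] > $max) {
            ($best, $max) = ($n, $ends->[$n]);
        }
    }
    return $best;
}

my $s = "abcdefghi";
if ($s =~ /(b)(.(.))/) {
    my $n = last_closed_paren([@+]);
    print "rule picks \$$n = '", substr($s, $-[$n], $+[$n] - $-[$n]), "'\n";
    print "real \$^N = '$^N'\n";
}
```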

    You may need to allow for some captures being unset, as in "ac" =~ /(a)?(b)?(c)?/.
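For reference, this is what an unset capture looks like after a real match of that example: the group is still counted in $#+, but $n, $-[n] and $+[n] are all undef. A small sketch that prints it:

```perl
use strict;
use warnings;

my $str = "ac";
if ($str =~ /(a)?(b)?(c)?/) {
    # $#+ counts every group in the pattern, matched or not
    for my $n (1 .. $#+) {
        if (defined $-[$n]) {
            printf "\$%d = '%s' at %d..%d\n", $n,
                substr($str, $-[$n], $+[$n] - $-[$n]), $-[$n], $+[$n];
        } else {
            print "\$$n is unset\n";
        }
    }
}
```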

    For the simple case where the nesting is natural, the most efficient approach would be to forget the lookaheads and just construct a nesting of dots and parens, along with an optional never-matching group ((?!))? for each unset capture. I think something like the below would do it, but I have not tested it exhaustively:

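```perl
# match() and showvars() are the subs from the original post
my $s = "abcdefghi";
my @a = ([ 0, 3 ], [ 0, 1 ], [ 1, 2 ], [ 2, 1 ]);
my $qr = matcher(\@a);
match($s, $qr);

sub matcher {
    my $array = shift;
    my (%pos, @undef);
    my ($start, $length) = @{ $array->[0] };    # $& itself
    for (1 .. $#$array) {
        my ($pos, $width) = ($array->[$_][0], $array->[$_][1]);
        if (defined $pos) {
            $pos{$pos} .= '(';             # capture opens here
            $pos{$pos + $width} .= ')';    # and closes here
        } else {
            push @undef, $_;               # capture should stay unset
        }
    }
    my $qr = '.' x ($start + $length);
    # insert right to left so the earlier offsets stay valid
    for (sort { $b <=> $a } keys %pos) {
        substr $qr, $_, 0, $pos{$_};
    }
    # the first $start dots become a lookbehind, so $& begins at $start
    $qr =~ s{(.{$start})}{(?<=^$1)} if $start;
    # for unset capture $n, add a never-matching optional group after the
    # first $n - 1 capturing parens (a "(" not followed by "?")
    for my $n (@undef) {
        my $skip = $n - 1;
        $qr =~ s/^((?:.*?\((?!\?)){$skip})/$1((?!))?/;
    }
    qr/$qr/;
}
```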

    Extending this to add lookaheads for captures that are not naturally nested is left as an exercise for the reader. :)
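One rough take on that exercise, sketched here as an assumption rather than anything from the thread, is to skip nesting altogether and give every capture its own lookahead anchored at the start of $&, as the original post does, but with literal counts since the offsets are known when the regexp is built. The name lookahead_matcher is hypothetical, and $^N is not arranged, so it will usually be wrong:

```perl
use strict;
use warnings;

# Hypothetical sketch: input follows matcher()'s convention, i.e.
# $caps->[0] = [start, length] of $&, then [offset, length] for $1, $2, ...
# (absolute offsets). An undef entry or undef offset leaves that capture unset.
sub lookahead_matcher {
    my ($caps) = @_;
    my ($start, $length) = @{ $caps->[0] };
    my $re = $start ? "(?<=^.{$start})" : '^';   # pin the start of $&
    for my $k (1 .. $#$caps) {
        my $c = $caps->[$k];
        if (defined $c && defined $c->[0]) {
            my $skip = $c->[0] - $start;         # offset from the start of $&
            $re .= "(?=.{$skip}(.{$c->[1]}))";
        } else {
            $re .= '((?!))?';                    # numbered, but never set
        }
    }
    $re .= ".{$length}";                         # consume $& itself
    return qr/$re/s;
}

# $1 and $2 overlap without nesting, which plain parens cannot express.
my $s  = "abcdefghi";
my $qr = lookahead_matcher([ [1, 3], [1, 2], [2, 2], undef, [3, 1] ]);
if ($s =~ $qr) {
    no warnings 'uninitialized';
    print "\$& = $&, \$1 = $1, \$2 = $2, \$3 = $3, \$4 = $4\n";
}
```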

    With respect to your title, note that it is possible to plug in an alternate regexp engine - this is how use re 'debug' is implemented - but I'm not aware that anyone has ever taken advantage of this, nor do I imagine there is any documentation of how you might do so.

    Hugo

Re: Plug for an alternate regex engine
by bsb (Priest) on Feb 21, 2006 at 18:51 UTC
    As hv suggested, I got $^N working. Again, you need to know the nesting in advance. This is the $2 and $3 part:
```perl
(?=
    (??{ qr!.{$a[2][0]}! })
    (   # start $2
        (?=
            (??{ qr!.{${\($a[3][0]-$a[2][0])}}! })
            ((??{ qr!.{$a[3][1]}! }))
        )   # $3 nested
        (??{ qr!.{$a[2][1]}! })
    )   # $2 ended
)
```
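To check the idea without the (??{ }) machinery, here is the same nesting written with literal counts for the /(b)(.(.))/ example; it is a hand-built sketch, not bsb's code. Capture 3 closes inside capture 2's lookahead and capture 2 closes afterwards, so capture 2 is the most recently closed group, which is what $^N reports:

```perl
use strict;
use warnings;

my $s = "abcdefghi";
# comments inside the pattern avoid sigils, since they would be interpolated
my $qr = qr/
    (?<=^.)                 # the whole match starts at offset 1
    (?= (.) )               # capture 1
    (?= .                   # skip to where capture 2 starts
        (                   # open capture 2
            (?= . (.) )     # capture 3, nested inside capture 2
            .{2}            # consume capture 2's length
        )                   # close capture 2 after capture 3
    )
    .{3}                    # consume the whole match
/x;

if ($s =~ $qr) {
    print "\$& = $&, \$1 = $1, \$2 = $2, \$3 = $3, \$^N = $^N\n";
}
```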