Last weekend, diotalevi started an interesting discussion about regexp reentrancy, and everyone basically agreed that you cannot re-enter the regexp engine.

That made me curious: what if I re-enter the regexp engine on a new thread? Is the regexp engine per thread? So I designed a set of three experiments. Originally there were only two, but after finishing them I realized I needed a third: a modified version of the second one, without the reentrancy.
  1. regexp re-entered on the same thread;
  2. regexp re-entered on a different thread;
  3. a new thread per level, as in the second, but without re-entering the regexp.
Now the details:
  1. First experiment
    Expected result: fail.
    Actual result: fail.
    Observation: No surprise, it cored. However, before it cored I had already seen all the printed output, and it was correct (is this a surprise?)
    (As you can see, there is an assignment, my $tmp = $2. It did not make any difference here, but it did make a difference in experiments two and three. I added it here just to keep all three experiments comparable.)
    use strict;
    use warnings;
    $| ++;

    sub parser {
        my $str = shift;
        $str =~ m/^\[(\d*),(.*?)\]$(?{print "1 = $1\n2 = $2\n";my $tmp = $2;parser($tmp) if ($tmp ne "[]")})/;
    }

    my $str = "[0,[1,[2,[3,[4,[5,[6,[7,[8,[9,[]]]]]]]]]]]";
    parser $str;
  2. Second experiment
    Expected result: to be frank, I didn't dare to guess.
    Actual result: fail.
    Observation: Surprise? Yes and no. Just as in experiment one, it cored, and again I had already seen all the printed output before the core.
    If I remove that my $tmp = $2, lots of other problems are reported. Last time, we said that the reason the regexp engine cannot be re-entered is that it uses a set of global variables; I believe $1 and $2 are among those.
    use threads;
    use strict;
    use warnings;
    $| ++;
    #open(STDERR, ">", "error.txt");

    sub parser {
        my $str = shift;
        $str =~ m/^\[(\d*),(.*?)\]$(??{print "1 = $1\n2 = $2\n";my $tmp = $2;if ($tmp ne "[]") {my $t = threads->create("parser",$tmp); $t->join()}})/;
    }

    my $str = "[0,[1,[2,[3,[4,[5,[6,[7,[8,[9,[]]]]]]]]]]]";
    parser $str;
  3. Third experiment
    Expected result: pass.
    Actual result: pass.
    Observation: It did come as a surprise that I had to do that my $tmp = $2; otherwise it cores.
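
The point about $1 and $2 can be illustrated without any reentrancy or threads. What follows is only a minimal sketch, not one of the three experiments: capture_demo is a hypothetical helper, and all it shows is the ordinary behaviour that a later successful match changes what $2 refers to, while a lexical copy taken beforehand is unaffected. It does not by itself explain the core dumps.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the experiments above): match $first,
# copy $2 into a lexical, then match $second in an inner block and
# return both the saved copy and the current $2.
sub capture_demo {
    my ($first, $second) = @_;
    my @seen;
    if ($first =~ /^\[(\d*),(.*?)\]$/) {
        my $saved = $2;    # copy taken while $2 still belongs to the first match
        if ($second =~ /^\[(\d*),(.*?)\]$/) {
            # Here $2 refers to the second match; $saved still holds the first.
            push @seen, $saved, $2;
        }
    }
    return @seen;
}

my ($saved, $current) = capture_demo("[0,[1,[]]]", "[1,[]]");
print "saved   = $saved\n";
print "current = $current\n";
```

This is presumably why the experiments copy $2 into $tmp before doing anything else with it.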

Re: revisit regexp reentrancy
by pg (Canon) on Dec 14, 2002 at 04:20 UTC
    Sorry, I did it too quickly and didn't copy and paste the code for part 3, and I cannot update the original post... here it is:
    use threads;
    use strict;
    use warnings;
    $| ++;
    open(STDERR, ">", "error.txt");

    sub parser {
        my $str = shift;
        $str =~ m/^\[(\d*),(.*?)\]$/;
        print "1 = $1\n2 = $2\n";
        my $tmp = $2;
        if ($2 ne "[]") {
            my $t = threads->create("parser",$2);
            $t->join()
        }
    }

    my $str = "[0,[1,[2,[3,[4,[5,[6,[7,[8,[9,[]]]]]]]]]]]";
    parser $str;
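
For comparison, here is a thread-free sketch of the same non-reentrant idea as part 3. It was not part of the experiments: parse_nested is a hypothetical name, and the input is a shortened version of the test string. Each match runs to completion and its captures are copied into lexicals before the recursive call, so no match is started from inside another match's code block.

```perl
use strict;
use warnings;

# Hypothetical name (not from the post): walk a string such as
# "[0,[1,[2,[]]]]" and return the numbers found, outermost first.
sub parse_nested {
    my ($str) = @_;
    return () if $str eq "[]";    # innermost, empty level: stop

    # The match finishes completely before anything else runs.
    $str =~ /^\[(\d*),(.*?)\]$/
        or die "unexpected input: $str\n";

    # Copy the captures right away, then recurse outside the regexp.
    my ($num, $rest) = ($1, $2);
    return ($num, parse_nested($rest));
}

my @numbers = parse_nested("[0,[1,[2,[]]]]");
print "@numbers\n";
```

The recursion depth equals the nesting depth of the input, so a plain loop over $rest would do equally well for very deep input.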