plank has asked for the wisdom of the Perl Monks concerning the following question:

I don't think Perl's regular expression engine is reentrant, and I need it to be in order to do what I want.

My general problem is:

I want to do substitutions on the text nodes of XML content, regardless of whether the XML is well formed or not.

I have a regexp that extracts all text nodes correctly, so all I need now is to use this regexp and apply the transformation to each captured text node. The tricky part is that the transformation is very easy (and efficient) to do with regexps, which means running the regex engine while an instance of it is already running, via the 'e' switch.

From past inquiries I don't think the regexp engine is reentrant and thus this should result in undefined behaviour at best. Weird thing is that so far it has worked and I have no idea why and whether it will always work or result in incorrect behaviour or segfault.

A simpler example of what's at stake is the following code. It is not the full-fledged code I intend to use, but reentrancy should also be a problem here and it is simpler to pin down. (I know this situation doesn't require regexps; this is for testing purposes only.)

#!/usr/bin/perl -l
sub censor {
    my ($s) = @_;
    $s =~ s/.(.|$)/*$1/g;
    return $s;
}
while (my $line = <>) {
    chomp($line);
    $line =~ s/(.*?)(:|$)/censor($1)."$2"/eg;
    print $line;
}
Example of a session:
123:1234:12345:123456:1234567
*2*:*2*4:*2*4*:*2*4*6:*2*4*6*

So, in summary, should this work (I don't think it should) and if not, why the hell has this worked every single time?

Thanks in advance.

PS: If there is a real need I can post code that replicates the exact behaviour I need, but this code should capture the essence of the problem.
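
For reference, the text-node case described above might look roughly like the sketch below. The text-node pattern here is an assumed, simplified stand-in (any run of characters between a '>' and the next '<'), not the actual regexp mentioned in the question, and `censor_text_nodes` is a hypothetical name used only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Transformation applied to each text node: mask every second
# character, using a nested s/// (same censor() as in the question).
sub censor {
    my ($s) = @_;
    $s =~ s/.(.|$)/*$1/g;
    return $s;
}

# Assumed, simplified text-node pattern: one or more characters that
# sit between a '>' and the next '<'. Real markup (comments, CDATA,
# '>' inside attribute values) would need a more careful pattern.
sub censor_text_nodes {
    my ($xml) = @_;
    # The replacement side is Perl code (/e) that itself runs a regex.
    $xml =~ s/>([^<>]+)(?=<)/'>' . censor($1)/eg;
    return $xml;
}

print censor_text_nodes('<a>1234<b>12345</b></a>'), "\n";
```

Tags are left untouched because only the captured run between '>' and '<' is passed to `censor`.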

Re: is perl's regular expression engine reentrant ? (works)
by tye (Sage) on Apr 27, 2007 at 00:48 UTC

    s/regex/replacement/e is re-entrant as far as the right-hand (replacement) expression is concerned. The regex engine itself has gotten better but isn't totally re-entrant yet; this only matters if you use experimental features to jump out to code from within the regex itself (the left-hand side).
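
    A minimal sketch of the right-hand-side case, for illustration only: the sub name `shout` and the sample string are made up here. It relies on the documented behaviour that capture variables are dynamically scoped, so the `$1` set by the inner s/// disappears when the sub returns and the outer `$2` is still intact afterwards.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # The nested substitution runs inside its own sub, so the capture
    # variables it sets are restored when the sub returns.
    sub shout {
        my ($s) = @_;
        $s =~ s/(\w+)/\U$1/g;    # inner s/// sets its own $1
        return $s;
    }

    my $line = 'key=value;other=thing';

    # Outer s///e: $2 is read after shout() has already run a regex,
    # yet it still refers to the outer match.
    $line =~ s/(\w+)=(\w+)/shout($1) . '=' . $2/eg;

    print "$line\n";
    ```

    Copying the argument into a lexical (`my ($s) = @_;`) before the inner match is the important habit here, since `@_` aliases the caller's `$1`.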

    - tye        

      So I can do whatever I want on the "right side" with no undefined behaviour.

      Great news!

      I suppose you are saying that reentrancy is only an issue with the (?{code}) and (??{code}) constructs.

      I knew these were problematic (and experimental features) but thought that making substitutions using the re engine on the evaluated code could also be a problem.

      Any idea where I can find documentation that mentions this explicitly?

      Thank you.

        I suppose you are saying that reentrancy is only an issue with the (?{code}) and (??{code}) constructs.

        This is in fact an issue. The following one-liner will segfault (with perl 5.8.8):

        perl -we '"_" =~ m{(??{"_" =~ /./})}'