in reply to Re: What's broken in Perl 5?
in thread What's broken in Perl 5?

AFAIK at least some of the regex constructs you mean will always be marked "experimental" in one way or another. The reason is that the interactions between the "normal" regex engine, the optimiser, and run-time optimisation vary between versions of perl, and because these regops let you "see through" to the behaviour of the engine itself, their behaviour is not well defined. The experimental marking is there so that people know the results of using them will vary from one perl to another.

A good example is the following two snippets, which should be functionally equivalent but don't do the same thing, because the code block violates the encapsulation of the regex engine's behaviour.

    E:\>perl -le "print 'foo baz'=~/(foo|foo|foo)\s(?{print qq(Got '$1')}) +ba[r]/ ? qq(Matches '$1') : 'No match!'"
    Got 'foo'
    Got 'foo'
    Got 'foo'
    No match!

    E:\>perl -le "print 'foo baz'=~/(foo|foo|foo)\s(?{print qq(Got '$1')}) +bar/ ? qq(Matches '$1') : 'No match!'"
    No match!

The reason this happens is that the fixed tail of the regex (the 'bar' part) allows the regex engine to use Boyer-Moore matching as an optimisation: it finds each 'bar' in the string being searched and then starts the match process relative to those positions. In the fail case there is no 'bar' to find, so the "real" regex engine never actually kicks in. Putting one character of the fixed string into a class bypasses the Boyer-Moore matching, so the slower process of trying the pattern at each offset in the string occurs, and that is what executes the (?{}) block.
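If you want to check this on your own perl, a rough sketch along the following lines counts how often the code block fires for each pattern instead of printing from inside it. The variable names are mine, and the counts are deliberately not listed here because they depend on which optimisations your perl applies.

```perl
use strict;
use warnings;

# The string has no 'bar' in it, so neither pattern can match.
my $string = 'foo baz';

# Package variables rather than lexicals: lexicals inside (?{ })
# blocks were unreliable in older perls.
our $calls_class   = 0;   # times the block ran for the ba[r] form
our $calls_literal = 0;   # times the block ran for the plain bar form

# Same patterns as the one-liners, with a counter in place of print.
my $matched_class =
    $string =~ /(foo|foo|foo)\s(?{ $calls_class++ }) +ba[r]/;
my $matched_literal =
    $string =~ /(foo|foo|foo)\s(?{ $calls_literal++ }) +bar/;

printf "ba[r]: block ran %d time(s), %s\n",
    $calls_class, $matched_class ? 'matched' : 'no match';
printf "bar:   block ran %d time(s), %s\n",
    $calls_literal, $matched_literal ? 'matched' : 'no match';
```

Both patterns should report no match on any perl; only the counts are expected to differ between versions. Adding `use re 'debug';` at the top makes perl dump the compiled program and the optimiser's decisions to STDERR, which is the easiest way to see whether the engine proper was ever started.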

As more and different optimisations get added to the regex engine you can expect further changes of this sort. In fact, Perl 5.9.2 will produce a different result for the ba[r] example: it will print "Got 'foo'" only once, not three times.

---
demerphq