AFAIK at least some of the regex constructs you mean will always be marked "experimental" in one way or another. The reason is that the interactions between the "normal" regex engine, the optimiser and run-time optimisation vary between versions of perl, and since these regops let you "see through" to the behaviour of the engine itself, their behaviour is not well defined. The experimental marking is there so that people know the results of using them will vary from perl to perl.

A good example is the following two snippets, which should be functionally equivalent but don't do the same thing, because the code block violates the encapsulation of the regex engine's behaviour.

E:\>perl -le "print 'foo baz'=~/(foo|foo|foo)\s(?{print qq(Got '$1')}) +ba[r]/ ? qq(Matches '$1') : 'No match!'"
Got 'foo'
Got 'foo'
Got 'foo'
No match!

E:\>perl -le "print 'foo baz'=~/(foo|foo|foo)\s(?{print qq(Got '$1')}) +bar/ ? qq(Matches '$1') : 'No match!'"
No match!

The reason this happens is that the tail part of the regex (the 'bar' part) allows the regex engine to use Boyer-Moore matching as an optimisation to find each 'bar' in the string being searched and then start the match process relative to those positions. Since 'foo baz' contains no 'bar', in the fail case the "real" regex engine never actually kicks in. Putting a character of the fixed string into a class bypasses the Boyer-Moore matching, so the slower process of trying the pattern at each offset in the string occurs, and that results in the (?{}) block executing.
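If you want to see what your own perl does without reading printed output, something like the following sketch counts how often the embedded block runs for each of the two patterns above. The names $calls, @patterns and %result are just my own illustration, and I deliberately give no expected counts, since they depend on the perl version and the optimisations it applies. The only thing that is certain is that neither pattern matches 'foo baz'.

```perl
use strict;
use warnings;

# Package variable (not a lexical) so the embedded (?{ }) block can
# update it reliably on older perls as well. Hypothetical name.
our $calls = 0;

# The two patterns from the snippets above, with the print replaced
# by a counter so the side effect can be inspected afterwards.
my @patterns = (
    [ 'ba[r]' => qr/(foo|foo|foo)\s(?{ $calls++ }) +ba[r]/ ],
    [ 'bar'   => qr/(foo|foo|foo)\s(?{ $calls++ }) +bar/   ],
);

our %result;
for my $p (@patterns) {
    my ($label, $re) = @$p;
    $calls = 0;                                   # reset per pattern
    my $matched = 'foo baz' =~ $re ? 1 : 0;       # neither pattern matches
    $result{$label} = { matched => $matched, calls => $calls };
    printf "%-6s matched=%d  code block ran %d time(s) on perl %vd\n",
        $label, $matched, $calls, $^V;
}
```

Running it under several perl versions side by side is the easiest way to see the counts drift while the match result stays the same.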

As more and different optimisations get added to the regex engine you can expect further changes of this sort. As a matter of fact, Perl 5.9.2 will produce a different result for the ba[r] example: it will print "Got 'foo'" only once, not three times.

---
demerphq


In reply to Re^2: What's broken in Perl 5? by demerphq
in thread What's broken in Perl 5? by tlm
