in reply to Is RegEx in Perl not NFA anymore?

First, a quick side note from perlvar regarding your use of $&.

Traditionally in Perl, any use of any of the three variables $` , $& or $'... imposed a considerable performance penalty... so generally the use of these variables has been discouraged.
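Avoiding those variables is simple in practice. Here is a minimal sketch, assuming the same example sentence as input. The helper names `first_hit` and `first_hit_p` are made up for illustration. The first form captures with parentheses and reads `$1`. The second form uses the `/p` modifier (Perl 5.10 and later), which makes `${^MATCH}` hold the matched text without ever mentioning `$&`.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text captured by the first group, or undef.
sub first_hit {
    my ($text) = @_;
    return $text =~ /(summit|Everest|mountain)/ ? $1 : undef;
}

# Hypothetical helper: same idea without a capture group. With /p,
# ${^MATCH} holds the whole match, so $& is never used.
sub first_hit_p {
    my ($text) = @_;
    return $text =~ /(?:summit|Everest|mountain)/p ? ${^MATCH} : undef;
}

my $text = "The first recorded efforts to reach Everest's summit were made by British mountaineers";
print first_hit($text), "\n";      # prints "Everest"
print first_hit_p($text), "\n";    # prints "Everest"
```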

So it's probably better to use capturing parentheses and just get into the habit of not using the three variables above. That said, I was perplexed by the behavior shown for your first match attempt as well. I tried the following out of curiosity.

$_ = "The first recorded efforts to reach Everest's summit were made by British mountaineers ";
/(summit|Everest|mountain)/; print "$1\n";
/(summit|mountain)/;         print "$1\n";
/(Everest|mountain|summit)/; print "$1\n";

Which gave the following output.

Everest
summit
Everest

This indicated to me that the order of the alternatives separated by | likely doesn't matter. Taking dasgar's other advice and trying Regexp::Debugger shows exactly what the regex engine is doing: it steps through the string character by character, starting from the left, and at each position looks rightward to see whether any of the |-separated alternatives matches completely. It does check the leftmost alternative first, but it doesn't run that alternative through the entire string before trying the next one. In other words, "Everest" gets matched because it comes first (is leftmost) in the string.
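The stepping behavior described above can be modeled in a few lines. This is a hypothetical, simplified sketch that handles only alternations of plain literal strings (the real engine does far more). The outer loop walks start positions from left to right, the inner loop tries the alternatives in the order written, and the first pair that matches wins. The sketch then compares the model with Perl's own engine on the three patterns from the experiment.

```perl
use strict;
use warnings;

# Hypothetical model of leftmost-first alternation over literal strings:
# start positions are tried left to right; at each position the
# alternatives are tried in the order they are written.
sub leftmost_first {
    my ( $string, @alternatives ) = @_;
    for my $pos ( 0 .. length $string ) {
        for my $alt (@alternatives) {
            return $alt if substr( $string, $pos, length $alt ) eq $alt;
        }
    }
    return undef;    # nothing matched anywhere
}

my $s = "The first recorded efforts to reach Everest's summit were made by British mountaineers";

# Compare the model with Perl's regex engine on each set of alternatives.
for my $alts ( [qw(summit Everest mountain)], [qw(summit mountain)], [qw(Everest mountain summit)] ) {
    my $pattern = join '|', map { quotemeta } @$alts;
    my ($engine) = $s =~ /($pattern)/;
    printf "%-26s model=%s engine=%s\n", "@$alts", leftmost_first( $s, @$alts ), $engine;
}
```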

Unfortunately, I don't know whether Perl's regex engine is an NFA or a DFA (or some hybrid), or even what the difference is, since this is the first time I've heard those terms used.

UPDATE:

Down a rabbit Google hole perldigious went. Back he came with more questions and no answers... and also a bit of a headache.

Hacker News
Russ Cox paper: he is sort of scolding Perl, Python, and Ruby, but as far as I can tell he likes Go and spares it from the same analysis... hmm.
ars technica
StackOverflow

I love it when things get difficult; after all, difficult pays the mortgage. - Dr. Keith Whites
I hate it when things get difficult, so I'll just sell my house and rent cheap instead. - perldigious

Re^2: Is RegEx in Perl not NFA anymore?
by AnomalousMonk (Archbishop) on Oct 19, 2016 at 21:04 UTC
    ... the order of the alternatives separated by | likely doesn't matter.

    Further to Laurent_R's ++post: The ordered alternation "first match wins" behavior of Perl 5 can be seen here:

    c:\@Work\Perl\monks>perl -wMstrict -le "my $s = 'xxxABCDEyyy'; ;; print qq{captured '$1'} if $s =~ m{ (ABC|ABCD|ABCDE) }xms; "
    captured 'ABC'
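Reversing the alternatives in the same example shows that it really is the written order that decides. In this string all three alternatives can match starting at the same offset, so whichever one is listed first is reported. A small sketch; the `capture` helper is hypothetical and only keeps the two calls side by side.

```perl
use strict;
use warnings;

# Hypothetical helper: apply one compiled pattern and return $1 (or undef).
sub capture {
    my ( $string, $re ) = @_;
    return $string =~ $re ? $1 : undef;
}

my $s = 'xxxABCDEyyy';

# All three alternatives can match at offset 3, so the first one listed wins.
print capture( $s, qr/(ABC|ABCD|ABCDE)/ ), "\n";    # ABC
print capture( $s, qr/(ABCDE|ABCD|ABC)/ ), "\n";    # ABCDE
```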


    Give a man a fish:  <%-{-{-{-<

      Yes, AnomalousMonk, that's exactly it.

      Just for the sake of completeness, these are the two variants of alternations in Perl 6, adapting your example:

      > my $s = 'xxxABCDEyyy';
      xxxABCDEyyy
      > say "captured $/" if $s ~~ / ABC | ABCD | ABCDE /;
      captured ABCDE
      > say "captured $/" if $s ~~ / ABC || ABCD || ABCDE /;
      captured ABC
      With a single pipe, the longest match wins (which means, BTW, that the engine must try all possibilities to figure out which is the best); with a double pipe, the first match wins.

      But that holds only if all possible matches start at the same position in the string:

      > say "captured $/" if $s ~~ / xABC | ABCD | ABCDE /;
      captured xABC
      Here, even though "ABCDE" is a longer match, "xABC" wins because it starts earlier in the string.
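Perl 5's | has no built-in longest-alternative mode. For a list of literal alternatives, though, a common workaround approximates the Perl 6 single-pipe behavior: sort the alternatives longest-first before joining them into an ordinary alternation. This is a minimal sketch under that assumption (literal strings only; Perl 6's longest-token matching is more elaborate than this), and `longest_alternation` is a hypothetical helper name. Because the engine still tries start positions from the left, an earlier-starting match like "xABC" still beats a longer one that starts later.

```perl
use strict;
use warnings;

# Hypothetical helper: build a Perl 5 alternation that prefers the longest
# literal alternative at any given start position, by listing the longest
# alternatives first. Leftmost start position still takes priority.
sub longest_alternation {
    my @alternatives = sort { length $b <=> length $a } @_;
    my $pattern = join '|', map { quotemeta } @alternatives;
    return qr/($pattern)/;
}

my $s = 'xxxABCDEyyy';
for my $alts ( [qw(ABC ABCD ABCDE)], [qw(xABC ABCD ABCDE)] ) {
    my $re = longest_alternation(@$alts);
    print "captured $1\n" if $s =~ $re;    # ABCDE, then xABC
}
```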

      Indeed, thanks AnomalousMonk, that was poor wording on my part. I should have added an "in this case" or something similar at the end since I meant that statement to apply to the specific alternatives being checked for a match against the specific string given, not a general statement to say "the order of such alternatives never matters". I do go on to explain it checks the leftmost alternative first.

      For some reason I originally thought it would drag the first (leftmost) alternative through the entire string before trying the second alternative and so on. Probably a silly thing to have ever thought, especially when considering groups of alternatives contained within more complicated regular expressions.
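The mistaken expectation can be written down as a model too, which makes it easy to see where it diverges from the real engine. A hypothetical sketch: `alternative_major` searches the whole string for each alternative in turn before moving on to the next one, and its answer is compared with what Perl actually reports for the same alternation.

```perl
use strict;
use warnings;

# Hypothetical model of the *mistaken* expectation: search the whole string
# for the first alternative, then the second, and so on.
sub alternative_major {
    my ( $string, @alternatives ) = @_;
    for my $alt (@alternatives) {
        return $alt if index( $string, $alt ) >= 0;
    }
    return undef;
}

my $s    = "The first recorded efforts to reach Everest's summit were made by British mountaineers";
my @alts = qw(summit Everest mountain);

# What Perl's engine actually reports for the same alternation.
my $pattern  = join '|', map { quotemeta } @alts;
my ($engine) = $s =~ /($pattern)/;

print "alternative-major guess: ", alternative_major( $s, @alts ), "\n";    # summit
print "what Perl reports:       $engine\n";                                  # Everest
```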
