in reply to Is RegEx in Perl not NFA anymore?
First a quick side note from perlvar regarding your use of $&.
Traditionally in Perl, any use of any of the three variables $` , $& or $'... imposed a considerable performance penalty... so generally the use of these variables has been discouraged.
So it's probably better to use capturing parentheses and just get into the habit of not using the three variables above. That said, I was perplexed by the behavior shown for your first match attempt as well. I tried the following out of curiosity.
    $_ = "The first recorded efforts to reach Everest's summit were made by British mountaineers ";
    /(summit|Everest|mountain)/;
    print "$1\n";
    /(summit|mountain)/;
    print "$1\n";
    /(Everest|mountain|summit)/;
    print "$1\n";
Which gave the following output.
    Everest
    summit
    Everest
This indicated to me that the order of the alternatives separated by | doesn't decide which one matches here; what matters is where each one occurs in the string. Taking dasgar's other advice and trying Regexp::Debugger shows exactly what the regex engine is doing: it steps through the string character by character, starting from the left, and at each position looks right from there to see whether any of the | separated alternatives has a complete match. It does check the leftmost alternative first, but it doesn't go all the way through the string with it before checking the next alternative. So in other words, "Everest" gets matched because it comes first (is leftmost) in the string, not because of where it sits in the pattern. The order of the alternatives only comes into play when more than one of them can match at the same starting position.
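To convince myself, here is a small sketch that separates the two cases: alternatives that match at different positions in the string, and alternatives that can match at the same position. The first_match helper is just something I made up for illustration, and the mount/mountaineers patterns are my own addition, not from the original question.

```perl
use strict;
use warnings;

my $text = "The first recorded efforts to reach Everest's summit were made by British mountaineers ";

# Hypothetical helper: return what the first capture group matched, or undef.
sub first_match {
    my ($pattern, $string) = @_;
    return $string =~ $pattern ? $1 : undef;
}

# Different starting positions: whichever alternative occurs earliest in the
# string wins, no matter where it sits in the pattern.
print first_match(qr/(summit|Everest|mountain)/, $text), "\n";   # Everest
print first_match(qr/(Everest|mountain|summit)/, $text), "\n";   # Everest

# Same starting position: now the order in the pattern decides, because the
# engine tries the alternatives left to right and takes the first that succeeds.
print first_match(qr/(mount|mountaineers)/, $text), "\n";        # mount
print first_match(qr/(mountaineers|mount)/, $text), "\n";        # mountaineers
```

So the position in the string is checked first, and the order in the pattern only breaks ties at a given position.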
Unfortunately, I don't know if perl's regex engine is NFA or DFA (or some hybrid), or even what the difference is since this is the first time I've heard those terms used.
UPDATE:
Down a rabbit Google hole perldigious went. Back he came with more questions and no answers... and also a bit of a headache.
Hacker News
Russ Cox paper. He is sort of scolding Perl, Python, and Ruby, but as far as I can tell he likes Go and spares it from the same analysis... hmm.
ars technica
StackOverflow