n00b: I have the regex /#(.*?)$/ and the string "this #is an #example string". I want to match "example string" with the regex, but for some reason it's matching "is an #example string" even though I've got a non-greedy .* and I've got it anchored to the end of the string.
Invariably, you'll hear answers like "regexes are left-most longest" and "you want [^#]* instead of .*?" and "start your regex with .*". But I don't think I've seen an answer (at least, none has caught my eye the way I'd expect it to) that says "you aren't anchoring your regex to the end of the string".
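The behaviour in the question, and the two fixes people usually suggest, can be checked directly. This is only a minimal sketch; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $str = "this #is an #example string";

# Start positions are tried left to right, so the FIRST '#' wins;
# the non-greedy .*? then has to stretch until $ can succeed.
my ($lazy) = $str =~ /#(.*?)$/;

# Suggested fix 1: forbid '#' inside the capture, so an attempt
# starting at the first '#' can never reach the end of the string.
my ($negated) = $str =~ /#([^#]*)$/;

# Suggested fix 2: a leading greedy .* runs to the end and backtracks
# only as far as the LAST '#'.
my ($leading) = $str =~ /.*#(.*?)$/;

print "lazy:    $lazy\n";     # is an #example string
print "negated: $negated\n";  # example string
print "leading: $leading\n";  # example string
```

Both fixes work by changing which starting point can succeed. Neither makes the $ any more "anchored" than it was.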
Thus, this meditation. We often refer to \A \b \B \G \z \Z ^ $ as "anchors", but I feel this is a misnomer, a simplification of their real name (which would be something like "string position assertion") that gets in the way of their actual purpose. One could say that the regex /f$/ is anchored to the end of the string it matches, and one would be essentially right, since such a simple regex triggers a couple of optimizations in the regex engine, so that the regex really means "look for an 'f' at the end of the string" rather than "find an 'f' followed by END-OF-STRING". Clearly, in a string with many f's, the optimized version is much smarter and faster.
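To make the "look for an 'f' at the end of the string" reading concrete, here is a rough sketch that compares /f$/ against a direct check of the final character. The helper ends_with_f is hypothetical and is not how the engine implements the optimization; it only spells out what an end-anchored test would amount to.

```perl
use strict;
use warnings;

# Hypothetical helper: go straight to the end of the string, no scanning.
sub ends_with_f {
    my ($s) = @_;
    my $end = length $s;
    # $ also matches just before a single trailing newline.
    $end-- if $end && substr($s, $end - 1, 1) eq "\n";
    return ($end > 0 && substr($s, $end - 1, 1) eq 'f') ? 1 : 0;
}

for my $s ("fluffy", "off", "off\n", "f\n\n", "") {
    my $regex  = ($s =~ /f$/) ? 1 : 0;
    my $direct = ends_with_f($s);
    (my $shown = $s) =~ s/\n/\\n/g;   # make newlines visible in the output
    print "'$shown': regex=$regex direct=$direct\n";
}
```

The direct check never looks at any of the earlier f's, which is the sense in which /f$/ can fairly be called anchored.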
But the regex /\s+$/ is not anchored (and this grieves me). I've used this example time and time again when explaining regex reversal and that whole tangent, and it's useful again. If the regex were really anchored (that is, immovable), it would find the end of the string, and then match the regex in that context -- that is, the regex would have to terminate at the end of the string. As it stands, Perl can't optimize the regex that way, and we end up matching EVERY chunk of whitespace, and then testing to see if END-OF-STRING comes after it.
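One way to watch this happen is to put an embedded code block between \s+ and $ and count how often the engine gets that far. This is a sketch only: the counter name is made up, the count will vary with the string and possibly the Perl version, and the embedded block may itself affect how the pattern is optimized. The second half shows the reversal trick as a stand-in for what an immovable end anchor would do.

```perl
use strict;
use warnings;

our $tries = 0;   # package variable, visible inside the embedded code block
my $str = "one two three   ";

# (?{ ... }) runs every time \s+ has matched, just before $ is tested.
my $matched = $str =~ /\s+(?{ $tries++ })$/;
print "reached the \$ test $tries time(s) before succeeding\n";

# Reversal: the trailing run becomes a leading run, and /^\s+/ only
# ever needs to be tried at position 0.
my $rev = reverse $str;
$rev =~ s/^\s+//;
my $trimmed = reverse $rev;
print "trimmed: '$trimmed'\n";
```

On a string with many interior whitespace runs the counter grows with each one, while the reversed version does the same amount of work however much whitespace comes before the end.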
So I think "anchor" is incorrect. But "string position assertion", while accurate, is clunky. So where do we go from here? I'm not sure. This is mainly me venting a frustration. Ideally, in Perl 6, we'll have real anchors and regexes that do what we mean.
Hi ho anchor, aweigh!
In reply to The "anchor" misnomer in regexes by japhy