I find regular expressions in general to be the least intuitive, most surprising computer "language" I've encountered that is actually intended to be practical and useful rather than merely obscure. (Technically, they're a Domain-Specific Embedded Language, a DSEL.) For an example of the merely obscure, see Brainfuck, whose source code looks remarkably like a traditional "line noise" regex definition.

My favorite example of this counter-intuitiveness is the result of matching the simple regex /(b*)/ against the string 'aaaaabbb':
    'aaaaabbb' =~ /(b*)/;
What will be matched and captured to $1, and where will the match occur? Knowing that matching is, by default, "greedy" and matches as much as possible, one's first thought might be, as mine has often been, that it will match and capture 'bbb' at offset 5 in the string. Contemplation of the "Leftmost, Longest" rule for regex matching would seem to support this initial idea: offset 5 is the leftmost position at which the most 'b' characters are found (all of them, in fact).

A simple experiment shows we are deceived:

    c:\@Work\Perl\monks>perl -wMstrict -le "print qq{matched '$1' at offset $-[1]} if 'aaaaabbb' =~ /(b*)/; "
    matched '' at offset 0
(The @- array holds the offset of the start of each corresponding capture group match. See the Variables related to regular expressions section of perlvar.)
I leave it to you, gentle PerlJam2015, to ponder why this regex actually matches an empty string (no 'b' at all) located as far from any 'b' as it could possibly be. Also consider the simplest way one might alter the regex so as to actually capture something like what we were expecting from a location near where we were expecting it.
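
For anyone who wants to check their answer afterwards, here is one possible sketch of it, which may not be the only alteration that works. The engine tries each starting offset from left to right and stops at the first one where the whole pattern can succeed. Since b* is satisfied by zero 'b' characters, it succeeds at once at offset 0 with an empty capture, and greediness never gets a chance to matter. Requiring at least one 'b' with b+ makes the match fail at offsets 0 through 4, so the engine keeps advancing to offset 5. The helper name capture_and_offset is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original post): returns the text
# captured by group 1 and its start offset, or an empty list on failure.
sub capture_and_offset {
    my ($string, $regex) = @_;
    return unless $string =~ $regex;
    return ($1, $-[1]);    # $-[1] is the start offset of capture group 1
}

my @cases = (
    [ '/(b*)/', qr/(b*)/ ],    # zero 'b's is enough, so offset 0 wins
    [ '/(b+)/', qr/(b+)/ ],    # at least one 'b' is required
);

for my $case (@cases) {
    my ($label, $regex) = @$case;
    my ($text, $offset) = capture_and_offset('aaaaabbb', $regex);
    print "$label matched '$text' at offset $offset\n";
}
# /(b*)/ matched '' at offset 0
# /(b+)/ matched 'bbb' at offset 5
```

The only change is the quantifier: + in place of *. Greediness still applies in both cases, but it only decides how far a match extends once a starting position has been found, not where the match starts.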

I do not show you these things in order to discourage you, but rather to steel you against the frustrations and perplexities that inevitably accompany the study and use of regular expressions.


Give a man a fish:  <%-(-(-(-<


In reply to Re: exist backreference variable list? by AnomalousMonk
in thread exist backreference variable list? by PerlJam2015
