Newcomers to Perl and its regular expressions very commonly make one mistake: trusting the value of special matching variables such as $1 without verifying that a match succeeded.

It isn't much of a logical leap to realize that $1 can contain useful data only if a match has occurred. Still, many beginners just don't think of it, and that oversight leads to warnings and sometimes hard-to-locate bugs in their code.
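To make the point concrete, here is a minimal sketch of the guarded idiom. The helper name extract_id and the sample strings are made up for illustration; the only thing that matters is that $1 is read inside the branch where the match is known to have succeeded.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured digits, or undef if the
# pattern does not match.
sub extract_id {
    my ($text) = @_;
    if ( $text =~ /id=(\d+)/ ) {
        return $1;    # safe: the match just succeeded
    }
    return;           # no match: $1 is never consulted
}

my $found   = extract_id('user id=42');
my $missing = extract_id('no identifier here');

print defined $found   ? "found: $found\n"     : "found: (no match)\n";
print defined $missing ? "missing: $missing\n" : "missing: (no match)\n";
```

The caller gets undef for a failed match and can decide what to do about it, instead of silently working with whatever $1 happened to hold.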

The Perl PODs provide several documents that deal with regular expressions. The primary documents, in the order a beginner should probably read them, are perlrequick, perlretut, and perlre.

What surprised me when I recently re-read these docs is that none of those three primary regex documents warns against using $1 without checking for a match. The closest thing I could find was in perlre:

The numbered variables ($1, $2, $3, etc.) and the related punctuation set ($+, $&, $`, $', and $^N) are all dynamically scoped until the end of the enclosing block or until the next successful match, whichever comes first. (See perlsyn/"Compound Statements".)

perlvar is only slightly more helpful:

$<digit> Contains the subpattern from the corresponding set of capturing parentheses from the last pattern match, not counting patterns matched in nested blocks that have been exited already. (Mnemonic: like \digits.) These variables are all read-only and dynamically scoped to the current BLOCK.
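As I read the two passages quoted above, a capture made inside a nested block stops being visible once that block is exited, and $1 goes back to what the enclosing block last captured. A small sketch of that reading follows; the strings and variable names are invented for the example.

```perl
use strict;
use warnings;

my ( $outer_before, $inner, $outer_after );

if ( 'outer=1' =~ /outer=(\d+)/ ) {
    $outer_before = $1;    # capture from the outer match

    {
        # A successful match in a nested block sets $1 only for that block.
        'inner=2' =~ /inner=(\d+)/ and $inner = $1;
    }

    # The nested block has been exited, so $1 is the outer capture again.
    $outer_after = $1;
}

print "before nested block: $outer_before\n";
print "inside nested block: $inner\n";
print "after nested block:  $outer_after\n";
```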

Again, someone with a little experience is going to read between the lines and know that if there isn't a match, there's nothing to see in $1. The hints are all there. But it would be more explicit if one of the aforementioned docs simply stated, "The value of the capturing variables is undef in the event that there is no match, or the value of the most recent capture resulting from a match."

And then there's perlfaq6: Silent on the issue.

Since this issue manifests itself with great regularity, I'm surprised that there isn't any mention that the $<digit> variables will be undefined if no match has succeeded in the current scope. Taking it one step further, it would be nice if the docs mentioned somewhere that one should test for a match before relying on the $<digit> variables.

Perl gives people enough rope to tie impressive knots, or to hang themselves. And the docs can't enumerate every possible way in which people can hang themselves. But this happens to be a pretty common noose.

Update: Ok, this isn't the place to submit a patch, but I'd appreciate comments on a proposed patch to perlre; the addition of the following text:

The value of the special capturing variables will be undef in the event of no match within current scope, or the value of the most recent successful match's capture in the current scope even if there has been a subsequent failed match.
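The second half of that wording is the part that bites hardest, so here is a small sketch of it. The strings and variable names are made up for the example: a failed match leaves $1 alone, so it still holds the capture from the earlier successful match in the same scope.

```perl
use strict;
use warnings;

my $first  = 'color=red';
my $second = 'no pair here';

$first =~ /color=(\w+)/;     # succeeds and sets $1
my $after_success = $1;

$second =~ /color=(\w+)/;    # fails and leaves $1 untouched
my $after_failure = $1;      # stale: still the capture from $first

# One way to avoid the stale value: take the captures from the match
# itself in list context, which yields an empty list on failure.
my ($value) = $second =~ /color=(\w+)/;

print "after successful match: $after_success\n";
print "after failed match:     $after_failure\n";
print "list-context capture:   ", ( defined $value ? $value : '(undef)' ), "\n";
```

Nothing here warns, which is why this variant of the mistake is harder to spot than the undef case.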

Is this accurate? Clear? Before I submit anything, I'd like a weigh-in on what might constitute clear and accurate verbiage.


Dave


In reply to Perl's POD's description of the use of capturing special variables. by davido
