It isn't much of a logical leap to realize that, in order for $1 to contain useful data, a match must have occurred. Still, many beginners just don't think of it, and that oversight leads to warnings and sometimes hard-to-locate bugs in their code.
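A minimal sketch of the guard I have in mind. The helper name extract_user and the "user=NAME" line format are made up for illustration; the point is only that $1 is read inside the branch where the match is known to have succeeded.

```perl
use strict;
use warnings;

# Hypothetical helper: pull a user name out of a "user=NAME" line.
# Returns the name on a match, or undef (in scalar context) otherwise.
sub extract_user {
    my ($line) = @_;

    # Only read $1 inside the branch where the match succeeded.
    if ( $line =~ /^user=(\w+)/ ) {
        return $1;
    }
    return;    # no match: nothing trustworthy in $1
}

for my $line ( 'user=dave', 'uid=1000' ) {
    my $user = extract_user($line);
    print defined $user
        ? "found user '$user'\n"
        : "no user in '$line'\n";
}
```

The caller then checks defined-ness instead of trusting $1 directly, so a non-matching line can't leak a stale or undefined capture into later code.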
The Perl PODs provide several documents that deal with regular expressions. The primary ones, in the order a beginner should probably read them, are perlrequick, perlretut, and perlre.
What surprised me when I recently re-read these docs is that none of those three primary regex documents warns against using $1 without checking for a match. The closest thing I could find was in perlre:
The numbered variables ($1, $2, $3, etc.) and the related punctuation set ($+, $&, $`, $', and $^N) are all dynamically scoped until the end of the enclosing block or until the next successful match, whichever comes first. (See perlsyn/"Compound Statements".)
perlvar is only slightly more helpful:
$<digit> Contains the subpattern from the corresponding set of capturing parentheses from the last pattern match, not counting patterns matched in nested blocks that have been exited already. (Mnemonic: like \digits.) These variables are all read-only and dynamically scoped to the current BLOCK.
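To make the "nested blocks that have been exited already" part concrete, here is a small sketch. The strings and variable names are arbitrary and only there for illustration; the behavior shown is my reading of the two passages quoted above.

```perl
use strict;
use warnings;

my ( $outer_before, $inner, $outer_after );

if ( 'alpha-beta' =~ /^(\w+)-/ ) {
    $outer_before = $1;    # capture from the outer match

    {
        # A successful match inside a nested block sets $1 there...
        if ( 'gamma' =~ /^(g\w+)/ ) {
            $inner = $1;
        }
    }

    # ...but once that block has been exited, $1 refers to the
    # outer match again.
    $outer_after = $1;
}

print "outer before: $outer_before\n";    # alpha
print "inner:        $inner\n";           # gamma
print "outer after:  $outer_after\n";     # alpha
```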
Again, someone with a little experience is going to read between the lines and know that if there isn't a match, there's nothing to see in $1. The hints are all there. But it would be more explicit if one of the aforementioned docs simply stated, "The value of the capturing variables is undef in the event that there is no match, or the value of the most recent capture resulting from a match."
And then there's perlfaq6: Silent on the issue.
Since this issue manifests itself with great regularity, I'm surprised there is no mention that the $<digit> variables are undefined when no match has succeeded in the current scope. Taking it one step further, it would be nice if the docs said somewhere that one should test for a match before relying on the $<digit> variables.
Perl gives people enough rope to tie impressive knots, or to hang themselves. And the docs can't enumerate every possible way in which people can hang themselves. But this happens to be a pretty common noose.
Update: OK, this isn't the place to submit a patch, but I'd appreciate comments on a proposed patch to perlre, namely the addition of the following text:
The value of the special capturing variables will be undef in the event of no match within current scope, or the value of the most recent successful match's capture in the current scope even if there has been a subsequent failed match.
Is this accurate? Clear? Before I submit anything, I'd like a weigh-in on what would constitute clear and accurate wording.
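For anyone who wants to check the proposed wording against actual behavior, a quick sketch along these lines should do. It assumes it runs as a standalone script, so that nothing has matched before the first line looks at $1; the sample strings and variable names are arbitrary.

```perl
use strict;
use warnings;

# Nothing has matched yet in this script, so $1 is undef.
my $before = $1;

# A successful match sets $1.
my $first_ok      = 'order 42' =~ /(\d+)/;
my $after_success = $1;

# A failed match does NOT reset $1; the earlier capture is still there.
my $second_ok     = 'no digits here' =~ /(\d+)/;
my $after_failure = $1;

printf "before any match: %s\n", defined $before ? $before : 'undef';
printf "after success:    %s\n", $after_success;
printf "second matched?   %s\n", $second_ok ? 'yes' : 'no';
printf "after failure:    %s\n", $after_failure;
```

The last line is the case that bites: the second match fails, yet $1 still holds '42' from the first one, which is why the match result itself has to be tested.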
Dave