frasco has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, I'm working on some text files in which the pattern -=AB (for example) has to be wrapped in bold tags (for an HTML page). Moreover, the pattern may or may not include square brackets. So I wrote the following simple regexp:

$line =~ s/-=(A)(\[|\])?(B)(\[|\])?/<b>$1$2$3<\/b>$4/g;

Well, it works fine, but the message "Use of uninitialized value in concatenation (.) or string ..." appears in the Apache error log. As far as I understand, this happens because the capture variables are undefined whenever the square brackets are missing from the text being matched.
Of course, this really isn't good programming style, but here is my question: is it a real problem? Should I correct it?
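
The warning described above can be reproduced outside Apache with a small script. This is only a sketch: the input string is made up for illustration, and the `$SIG{__WARN__}` handler is used here simply to collect warnings in an array instead of letting them go to STDERR.

```perl
use strict;
use warnings;

# Collect warnings in an array instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Hypothetical input: one plain occurrence, one with brackets.
my $line = '-=AB and -=A]B[';

# The substitution from the question. When an optional group (...)?
# does not take part in the match, its capture variable is undef,
# and interpolating it in the replacement triggers the warning.
$line =~ s/-=(A)(\[|\])?(B)(\[|\])?/<b>$1$2$3<\/b>$4/g;

print "$line\n";
print "warnings seen: ", scalar(@warnings), "\n";
print $warnings[0] if @warnings;
```

The substitution still produces the intended text, because an undefined value interpolates as an empty string; the warning is only reporting that this happened.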

Re: undefined backreferences
by jwkrahn (Abbot) on Jan 24, 2009 at 18:39 UTC

    Put the quantifier inside the parentheses instead of outside the parentheses:

    $line =~ s/-=(A)([][]?)(B)([][]?)/<b>$1$2$3<\/b>$4/g;
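
A quick way to check the suggestion above is to run it on sample input and count the warnings. This is a sketch under assumptions: the input string is invented, and the `$SIG{__WARN__}` handler is just a convenient way to count warnings for the demonstration.

```perl
use strict;
use warnings;

# Collect warnings so we can count them afterwards.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Hypothetical input: one plain occurrence, one with brackets.
my $line = '-=AB and -=A]B[';

# [][] is a character class containing "]" and "[": a "]" placed first
# inside a class is taken literally. With the "?" inside the parentheses,
# the group always takes part in the match and captures an empty string
# when no bracket is present, so $2 and $4 are never undef.
$line =~ s/-=(A)([][]?)(B)([][]?)/<b>$1$2$3<\/b>$4/g;

print "$line\n";
print "warnings seen: ", scalar(@warnings), "\n";
```

The output text is the same as with the original regexp; only the undefined captures, and with them the warnings, go away.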
Re: undefined backreferences
by davido (Cardinal) on Jan 25, 2009 at 06:31 UTC

    I actually fiddled around with a substitution regexp the other day in this node. The regexp looked like this:

    $pad =~ s/(?:z(.)(?:z.)+)|(?:.(.))/$1$2/g;

    Because of the alternation, either $1, or $2 would be defined, and the other would be undefined, or uninitialized. The result was that I kept getting the same warning you're trying to avoid. My solution was to turn off the specific warning within the narrowest scope possible. This is easy, as the "use warnings" and "no warnings" pragmas can be lexically scoped. For example:

    use strict;
    use warnings;

    my $pad = "whatever...";
    {
        no warnings qw/uninitialized/;
        $pad =~ s/(?:z(.)(?:z.)+)|(?:.(.))/$1$2/g;
    }
    # here, all warnings are active again.

    Within the curly brackets the "uninitialized" warning is silenced, but throughout the rest of the code it's still active.
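
The scoping described above can be observed directly. The sketch below assumes a made-up value for `$pad` and uses a `$SIG{__WARN__}` handler (not part of the original example) to count warnings raised inside and outside the block.

```perl
use strict;
use warnings;

# Collect warnings so we can see where they are raised.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $pad = 'zazbzc';    # hypothetical input
{
    no warnings qw/uninitialized/;
    # On every match either $1 or $2 is undef; silenced in this block.
    $pad =~ s/(?:z(.)(?:z.)+)|(?:.(.))/$1$2/g;
}
my $inside_count = scalar @warnings;

# Outside the block the "uninitialized" warning is active again.
my $undef;
my $text = "value: $undef";
my $outside_count = @warnings - $inside_count;

print "warnings inside block:  $inside_count\n";
print "warnings outside block: $outside_count\n";
```

Because the pragma is lexical, it affects only the code textually inside the braces, including the replacement part of the substitution.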


    Dave

      Thank you, guys.
      I followed the method proposed by jwkrahn. If I understand correctly, putting the quantifier inside the parentheses means the group always matches and captures zero or one bracket characters, whereas putting the ? character outside the parentheses makes the whole preceding group optional. If I understand correctly again, in the first case there's no risk that the backreference has an "undefined value" (it will be an empty string or a single bracket). Is that right?
      On the other hand, I don't like the method proposed by davido (thank you anyway); even though it works, I do not believe that turning warnings off is the right way, even within the narrowest scope possible.
      I'm almost new to Perl, so maybe I'm wrong. Thank you again.
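
The difference between the two placements of the quantifier can be checked with `defined`. This is a minimal sketch with an invented input string; a match in list context returns the captured groups, with `undef` for any group that did not take part in the match.

```perl
use strict;
use warnings;

my $input = '-=AB';    # hypothetical input with no brackets

# Quantifier outside the group: the group is skipped, so the
# second capture is undef.
my @outside = $input =~ /-=(A)(\[|\])?(B)(\[|\])?/;

# Quantifier inside the group: the group matches zero characters,
# so the second capture is the empty string.
my @inside = $input =~ /-=(A)([][]?)(B)([][]?)/;

printf "outside: %s\n", defined $outside[1] ? "defined" : "undef";
printf "inside:  %s (length %d)\n",
    defined $inside[1] ? "defined" : "undef", length $inside[1];
```

So the capture is not a count of 0 or 1; it holds whatever text the group matched, which may be an empty string.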

        Yes, you are right. In your specific situation, the issue you were having can be cleared up by restructuring the regexp: either by moving the quantifier inside the capturing parentheses, as jwkrahn showed, or by eliminating unnecessary capturing, which is what the (?:...) construct does. That's the optimal solution. But the implications of your question are what interested me, and why I responded with the example of how to turn off warnings. I was interested in responding to the question of what to do when you can't easily redesign the regular expression -- perhaps where the redesign would add unwanted complexity, for example.

        My explanation was intended to explain one way to handle situations where an undefined backreference cannot be avoided simply by eliminating capturing. That is also a good trick to have up your sleeve. It is possible that you could have come to us with a regexp that couldn't have been easily simplified, in which case, just as in my example, another solution has to be found.

        The example I provided showed a situation where an undefined backreference was an intentional component of the script's design, and couldn't be eliminated without unnecessarily complicating the script. In that sort of a situation, the warning generated is expected but unsightly. Since we know why we're getting the warning, and we know in the specific example that I provided that the warning isn't important, it's perfectly fine to simply shut it off in the narrowest possible scope. Warnings exist to let us know we might have just made a mistake in our code. But sometimes our judgment has to trump the warning.

        Think of the warnings as being similar to your car's navigation system. It might tell you to take exit 15B, but you may know something the nav system doesn't; that there's road work at 15B. So you take exit 14 instead. The GPS complains, and you press the mute button, or the detour button to silence it. And you end up getting home faster, having avoided the construction delay. Lexical scoping for warnings exists for exactly this reason. There are times where you, the programmer, want to do something that usually would be construed as a mistake, but in your particular case happens to be exactly what you need and intend to do. When this happens, press mute (shut off the warning, in the narrowest possible scope).

        Thanks again for the question: I love these ones that generate some food for thought.


        Dave