metty has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I cannot figure out why my variable loses its content after I apply a regular expression to it. The program is running in a mod_perl environment using Perl 5.14.2. The following code snippet is part of an object method. An interesting fact: sometimes it works, sometimes it does not.

    if (defined($$l_Name_ref)) {
        print STDERR "1 $$l_Name_ref\n" if (defined($$l_Name_ref));
        print STDERR "2 $l_Name_ref\n";
        if ($$l_Name_ref =~ /^[a-zA-Z]\w+$/) {
            print STDERR "3 $l_Name_ref\n";
            print STDERR "4 $$l_Name_ref\n";
            ...

Here is the output from the Apache log files:

    1 MMP_MODULES
    2 SCALAR(0x7f80ab361968)
    3 SCALAR(0x7f80ab361968)
    4 MMP_MODULES
    1 TMP_MODULE_HASH
    2 SCALAR(0x7f80ad04ed68)
    3 SCALAR(0x7f80ad04ed68)
    Use of uninitialized value in concatenation (.) or string at myfile.pm line 1061.

The error message points to the line containing print statement number 4.

During the first run, which references the value 'MMP_MODULES', the program works fine. The debug output sent to STDERR, which ends up in the Apache log files, shows the expected result. During the second run, with the value 'TMP_MODULE_HASH', the regular expression matches, but the referenced value becomes undefined. I printed the reference itself. Interestingly enough, the address it points to does not change. Only the value gets lost. Any idea why?

I can fix this by creating a local copy before applying the regular expression:

    if (defined($$l_Name_ref)) {
        my $l_Name = $$l_Name_ref;
        if ($l_Name =~ /^[a-zA-Z]\w+$/) {
            ...

With this code, it always works. As the code is in a central part of my program, which is executed hundreds of times during a single web page request, I would like to avoid copying a value needlessly.

How can a regular expression check modify the content of a variable?

Replies are listed 'Best First'.
Re: Loosing variable content after regular expression
by Eily (Monsignor) on Dec 18, 2014 at 16:49 UTC

    There's nothing wrong with the piece of code you have provided (except that it does nothing and does not even compile because it's incomplete). To get an answer, you should have tried to produce a minimal reproducing example, which would probably have shown you what was wrong in the process.

    The issue probably comes from where you get your reference. If you have $ref = \SOMETHING where SOMETHING comes from a regex (the most obvious case being $ref = \$1;), running another match may change the value your reference points to. For example, the following code has an issue similar to yours, except that it already fails on the first run:

        my $l_Name_ref;
        for (qw/HELLO PERL MONKS/) {
            /(.+)/;
            $l_Name_ref = \$1;
            if (defined($$l_Name_ref)) {
                print STDERR "1 $$l_Name_ref\n" if (defined($$l_Name_ref));
                print STDERR "2 $l_Name_ref\n";
                if ($$l_Name_ref =~ /^[a-zA-Z]\w+$/) {
                    print STDERR "3 $l_Name_ref\n";
                    print STDERR "4 $$l_Name_ref\n";
                }
            }
        }

    Edit: I changed my example. The previous version used @ARGV and shifted from it, a leftover from my first attempt at reproducing the defect, but that did not make much sense in this last version.

      Hello Eily, you wrote:

      ... if you have $ref = \SOMETHING where SOMETHING comes from a regex ..., running another match may change the value your reference points to.

      Wow, I am impressed. Your guess is absolutely right. In fact, the input to the method comes from another regular expression. This is how I call my method:

      $l_Value =~ s/([a-zA-Z]\w+)/${$o_Object->GetValue(\ $1)}/gs;

      The regular expression searches for specific placeholders in a string and tries to replace them with the values returned by the method "GetValue".

          sub GetValue {
              my $o_Object   = shift;
              my $l_Name_ref = shift;
              my $l_Value    = undef;
              if (defined($$l_Name_ref)) {
                  print STDERR "1 $$l_Name_ref\n" if (defined($$l_Name_ref));
                  print STDERR "2 $l_Name_ref\n";
                  if ($$l_Name_ref =~ /^[a-zA-Z]\w+$/) {
                      print STDERR "3 $l_Name_ref\n";
                      print STDERR "4 $$l_Name_ref\n";
                      ...
              } else {
                  ...
              }
              return \ $l_Value;
          }

      So running a regular expression on a reference that points to the result of another regular expression should be avoided. The solution is to create a copy of the data before running the next regular expression. Thank you for pointing this out.
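One way to apply that conclusion to the substitution call is sketched below. This is a minimal, hedged example rather than the original GetValue: the `%value_for` table and the `get_value` and `expand` helpers are hypothetical stand-ins, and the sketch passes the name by value instead of by reference. The point it illustrates is that `"$1"` creates a fresh string before the inner validation match runs.

```perl
use strict;
use warnings;

# Hypothetical lookup table; the real GetValue presumably consults object state.
my %value_for = (
    MMP_MODULES     => 'mod_a,mod_b',
    TMP_MODULE_HASH => 'deadbeef',
);

# Receives the name by value. The copy is taken from @_ before any
# further match runs, so the validation regex cannot disturb it.
sub get_value {
    my ($name) = @_;
    return '' unless defined $name;
    return $name unless $name =~ /^[a-zA-Z]\w+$/;
    # Unknown names are returned unchanged (an assumption of this sketch).
    return exists $value_for{$name} ? $value_for{$name} : $name;
}

# Replaces every known placeholder in $text and returns the new string.
sub expand {
    my ($text) = @_;
    # "$1" builds a fresh string, so get_value never sees the magical $1;
    # /e evaluates the replacement as Perl code.
    $text =~ s/([a-zA-Z]\w+)/get_value("$1")/ge;
    return $text;
}

print expand('use MMP_MODULES and TMP_MODULE_HASH'), "\n";
# prints: use mod_a,mod_b and deadbeef
```

Copying one short string per placeholder should be cheap compared with the regex work around it, so the extra copy is unlikely to matter even when the method runs hundreds of times per request.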

Re: Loosing variable content after regular expression
by Corion (Patriarch) on Dec 19, 2014 at 08:21 UTC

    Anonymous Monk is right: this most likely has something to do with what $l_Name_ref refers to. Probably it is a capture variable that then gets reset by the next match operation:

        #!perl -w

        sub foo {
            my( $ref )= @_;
            print "Before match: <$$ref>\n";
            'barring something else' =~ /bar/; # reset all capture variables
            print "After match: <$$ref>\n";
        };

        'blargh foo blargh' =~ /(foo)/;
        foo( \$1 );

        print "Attempt 2:\n";
        'blargh foo blargh' =~ /(foo)/;
        foo( \"$1" );

        __END__
        Before match: <foo>
        Use of uninitialized value in concatenation (.) or string at q:\tmp.pl line 7.
        After match: <>
        Before match: <foo>
        After match: <foo>

    I doubt that the intent is really to pass around a reference to $1, as that has the described side effects. It's likely better to pass a reference to a copy of $1, as in Attempt 2.

      Excellent explanation! You can see the buggy code causing my problem in an answer above. Your example explains this phenomenon concisely and precisely. Thank you.

        In general, it's a Good Idea to "capture the capture variables" immediately after a match in order to avoid these kinds of strange problems:

            if ($string =~ m{ (foo|bar) (\d+) (w[ia]g) }xms) {
                my ($one, $two, $three) = ($1, $2, $3);
                do_something_with($one);
                ...
            }

        or:

            if (my ($one, $two, $three) = $string =~ m{ (foo|bar) (\d+) (w[ia]g) }xms) {
                do_something_with($one);
                ...
            }

Re: Loosing variable content after regular expression
by Anonymous Monk on Dec 18, 2014 at 19:48 UTC
    It has to be an issue with "what the reference actually refers to," because a plain match with the =~ operator does not modify the string it is matched against. A subtle bug in your code, to be sure, but a bug in your code, which might not be in the snippet that you included here.
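The distinction can be checked with a small sketch. The `value_after_match` helper is a hypothetical name introduced here for illustration. It runs the validation pattern from the question through a reference and returns whatever the referenced scalar holds afterwards, once for an ordinary scalar and once for a reference to $1.

```perl
use strict;
use warnings;

# Runs the validation match through a reference and returns
# what the referenced scalar holds after the match.
sub value_after_match {
    my ($ref) = @_;
    $$ref =~ /^[a-zA-Z]\w+$/;    # pattern without capture groups
    return $$ref;
}

# Case 1: reference to an ordinary scalar. The match leaves it alone.
my $plain = 'TMP_MODULE_HASH';
my $after = value_after_match(\$plain);
print "ordinary scalar:  $after\n";

# Case 2: reference to the capture variable $1. The successful inner
# match has no first group, so $1 should read as undef afterwards.
'TMP_MODULE_HASH' =~ /(\w+)/ or die "no match";
$after = value_after_match(\$1);
print "capture variable: ", (defined $after ? $after : '(undef)'), "\n";
```

The first case keeps its value because the match only reads the string. In the second case the string itself is never modified either. What changes is which match $1 reports on.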