davido has asked for the wisdom of the Perl Monks concerning the following question:

Once in a while Perl still surprises me.

This code is obviously broken:

my ($left, $right) = qw(abc def);
print "$left_$right\n";

The problem is that interpolation makes Perl look for variables named $left_ and $right, but we declared $left and $right, not $left_. There are several ways to fix this, two of which are:

print "$left\_$right\n";
print "${left}_$right\n";

...and of course you could just use the concatenation operator, but then I wouldn't have anything to puzzle over.
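The fixes above can be checked with a minimal sketch under use strict and use warnings. The broken line is left out, because strict rejects the undeclared $left_ at compile time. The names $escaped, $braced and $joined are illustrative only.

```perl
use strict;
use warnings;

my ($left, $right) = qw(abc def);

# Backslash: "\_" yields a plain "_", but it ends the identifier scan after $left.
my $escaped = "$left\_$right";

# Braces: ${left} delimits the variable name explicitly.
my $braced = "${left}_$right";

# Concatenation: no interpolation, so no ambiguity at all.
my $joined = $left . '_' . $right;

print "$escaped\n$braced\n$joined\n";
```

All three produce abc_def; the braced form is usually the easiest to read at a glance.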

But observe the following:

my $string = "abcd";
if ($string =~ m/(ab)(cd)/) {
    print "$1_$2\n";
}

The output is ab_cd. So in this case Perl treated the interpolation as "${1}_$2\n" without my telling it to. I've looked over "Gory details of parsing quoted constructs" in perlop and haven't found an explanation. I assume that for numbered capture variables, Perl decides that if the variable name starts with a digit, the name ends at the first non-digit. Is this behavior reliable? Is it documented? Is it likely ever to change?
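A minimal sketch of the assumed rule: once the name after $ starts with a digit, Perl takes only digits, and it takes all of them. The variables $implicit, $explicit and $ten are hypothetical names for this example.

```perl
use strict;
use warnings;

my ($implicit, $explicit, $ten);
my $string = "abcd";
if ($string =~ m/(ab)(cd)/) {
    # A digit right after "$" means the name is digits only,
    # so the scan for $1 stops at "_" (and strict does not complain).
    $implicit = "$1_$2";
    # The same string with the boundary spelled out.
    $explicit = "${1}_$2";
    # Digits are taken greedily: "$10" is capture group 10
    # (unset here, because there are only two groups), not $1 followed by "0".
    no warnings 'uninitialized';
    $ten = "$10";
}
print "$implicit\n";
print $implicit eq $explicit ? "matches \${1}_\$2\n" : "differs\n";
print length($ten) ? "\"\$10\" is non-empty\n" : "\"\$10\" is empty\n";
```

The greedy digit scan is the flip side of the behavior in question: "$1_" is safe, but "$10" never means $1 followed by a literal 0, so ${1}0 is needed for that.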

I'm asking because I found this in a code review and was sure it was broken, until we talked it over and tested it: Perl really does parse "$1_$2\n" as $1 . '_' . $2 . "\n". Even so, we would prefer to disambiguate explicitly with a backslash or ${n}.

Update: I do see this in perldata, under "Identifier parsing":

Meanwhile, special identifiers don't follow the above rules; For the most part, all of the identifiers in this category have a special meaning given by Perl. Because they have special parsing rules, these generally can't be fully-qualified. They come in six forms (but don't use forms 5 and 6):

  1. A sigil, followed solely by digits matching \p{POSIX_Digit}, like $0, $1, or $10000.

I'm not sure this is worded quite right, because $1_ could be construed as NOT being an identifier consisting solely of digits. Still, it's the closest thing to an explanation I can find, so I'll take it as answering my own question: yes, this is intentional and documented behavior.


Dave


Re: How Perl decides where a variable ends and text starts: Match variables in string interpolation
by LanX (Saint) on Jul 22, 2021 at 23:00 UTC
    Hi

    I'm pretty sure "regular" $identifiers can't start with a digit, so something like $1_ can't be legal anyway.
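    A minimal sketch to check this: compile a few declarations with string eval and see which ones the parser accepts. The names $left_ and $x1_ are arbitrary examples.

```perl
use strict;
use warnings;

# String eval compiles its argument at run time. It returns the value
# of the last statement (1 here) on success, and undef, with the error
# in $@, if the code does not compile.
my @snippets = ('my $left_ = 1; 1', 'my $x1_ = 1; 1', 'my $1_ = 1; 1');
my %compiles;
for my $code (@snippets) {
    $compiles{$code} = eval($code) ? 1 : 0;
    printf "%-18s %s\n", $code, $compiles{$code} ? 'compiles' : 'does not compile';
}
```

    Digits and underscores are fine after the first character, but a name that begins with a digit is read as a digits-only special variable ($1 here), which cannot be declared with my.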

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      I was almost certain that I had seen a counter-example, but you are right; see Variable names.
      Bill

        There are symbolic refs, but you have to stay in the symbolic-ref world: if you create ${'1_'} through a symbolic reference, you can only ever refer to it symbolically. Perl barfs if you later try to use it as $1_. Here's a working example, though:

        {
            no strict 'refs';
            ${'1_'} = 100;
            print ${'1_'}, "\n";
        }

        If you go on to dump the %main:: hash you'll see the package global $1_ does exist. Perl's syntax just doesn't support it as anything but a symbolic ref, I think.
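        A minimal sketch of that check, assuming code running in package main: instead of dumping all of %main::, it tests for the single stash key '1_'. $exists and $value are illustrative names.

```perl
use strict;
use warnings;

# Create the package variable through a symbolic reference;
# strict refs is switched off for this block only.
{
    no strict 'refs';
    ${'1_'} = 100;
}

# The main symbol table (stash) now has an entry named "1_".
my $exists = exists $main::{'1_'} ? 1 : 0;
print "stash entry '1_' is ", ($exists ? 'present' : 'absent'), "\n";

# Reading the value back also has to go through a symbolic reference.
my $value = do { no strict 'refs'; ${'1_'} };
print "value via symbolic ref: $value\n";
```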


        Dave