Re: Source Divers: @x{a} vs @$x{a}

Ok.

Fooling around, I found that @x{('a')} and @x{'a',} do not trigger the warning either, so I'm adding them to the question. Here is a test script that shows the behavior:

#!/usr/bin/perl -w
use strict;
my %x;
@x{('a', 'b', 'c')} = (4, 5, 6);
my $x = \%x;
print @$x{'a'}, $/;
print @x{'a'}, $/;
print @x{('a')}, $/;
print @x{'a',}, $/;
exit 0;
__END__
$ ./japhy.pl
Scalar value @x{'a'} better written as $x{'a'} at ./japhy.pl line 7.
4
4
4
4

A quick grep -n "Scalar value" toke.c yields only one candidate:

3566:             "Scalar value %.*s better written as $%.*s",

(with some spaces trimmed). Looks promising. Looking at the code, we find the function int Perl_yylex(pTHX), an enormous switch statement, and our 'Scalar value' message is sitting right at the end of case '@':. Here's the code, given a local numbering for reference later:

1: case '@':
2:    if (PL_expect == XOPERATOR)
3:        no_op("Array", s);
4:    PL_tokenbuf[0] = '@';
5:    s = scan_ident(s, PL_bufend, PL_tokenbuf + 1, sizeof PL_tokenbuf - 1, FALSE);
6:    if (!PL_tokenbuf[1]) {
7:        if (s == PL_bufend)
8:            yyerror("Final @ should be \\@ or @name");
9:        PREREF('@');
10:   }
11:   if (PL_lex_state == LEX_NORMAL)
12:       s = skipspace(s);
13:   if ((PL_expect != XREF || PL_oldoldbufptr == PL_last_lop) && intuit_more(s)) {
14:       if (*s == '{')
15:           PL_tokenbuf[0] = '%';
16:       /* Warn about @ where they meant $. */
17:       if (ckWARN(WARN_SYNTAX)) {
18:           if (*s == '[' || *s == '{') {
19:               char *t = s + 1;
20:               while (*t && (isALNUM_lazy_if(t,UTF) || strchr(" \t$#+-'\"", *t)))
21:                   t++;
22:               if (*t == '}' || *t == ']') {
23:                   t++;
24:                   PL_bufptr = skipspace(PL_bufptr);
25:                   Perl_warner(aTHX_ packWARN(WARN_SYNTAX),
26:                       "Scalar value %.*s better written as $%.*s",
27:                       t-PL_bufptr, PL_bufptr, t-PL_bufptr-1, PL_bufptr+1);
28:               }
29:           }
30:       }
31:   }
32:   PL_pending_ident = '@';
33:   TERM('@');

The char* s pointer is clearly running the show.

The scan_ident() function is called on the location right after the @, so it must be of primary interest. A quick gid scan_ident (from GNU id-utils, recommended) reveals that scan_ident is a macro for S_scan_ident, which also lives in toke.c.

S_scan_ident() first looks for runs of digits, then alphabetics, then we find:

if (*s == '$' && s[1] &&
    (isALNUM_lazy_if(s+1,UTF) || strchr("${", s[1]) || strnEQ(s+1,"::",2)) )
{
    return s;
}

A dollar sign followed by an identifier character, another '$', a '{', or '::' makes the function return immediately with a pointer to the dollar sign.
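To make that test concrete, here is a minimal standalone sketch of the same condition. It is not the toke.c code: ident_char() is a hypothetical ASCII-only stand-in for isALNUM_lazy_if(), strncmp() replaces the strnEQ macro, and dollar_starts_reference() is a name I made up for illustration.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for isALNUM_lazy_if(): ASCII word characters only. */
static int ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/*
 * s points at the character right after the '@'.
 * Returns 1 when the early-return condition quoted above would hold,
 * i.e. the '$' is left in place for the caller to deal with.
 */
int dollar_starts_reference(const char *s)
{
    return s[0] == '$' && s[1] &&
           (ident_char(s[1]) || strchr("${", s[1]) ||
            strncmp(s + 1, "::", 2) == 0);
}
```

Under these assumptions, dollar_starts_reference("$x{'a'}") is 1, while dollar_starts_reference("x{'a'}") is 0, which is the split between the @$x{'a'} and @x{'a'} cases in the test script.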

If, instead, the @ is followed by identifier characters, they are copied to a buffer, and a tricky bit of pointer magic simultaneously writes the terminating NUL and determines whether the destination pointer had advanced:

    *d = '\0';
    d = dest;
    if (*d) {
        if (PL_lex_state != LEX_NORMAL)
            PL_lex_state = LEX_INTERPENDMAYBE;
        return s;
    }

returning a pointer to the bracket following the identifier.
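The terminate-then-test idiom is easier to see in isolation. The following is only a sketch of the idea, not S_scan_ident() itself: copy_ident() and is_ident_char() are hypothetical names, and the character test is plain ASCII rather than Perl's UTF-aware macro.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for Perl's identifier-character test (ASCII only). */
static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/*
 * Copy a run of identifier characters from s into dest, which must have
 * room for len bytes (len >= 1, counting the terminating NUL).
 * Returns a pointer to the first character that was not copied.
 * Afterwards dest[0] is nonzero exactly when something was copied.
 */
const char *copy_ident(const char *s, char *dest, size_t len)
{
    char *d = dest;
    char *end = dest + len - 1;      /* keep one byte for the NUL */

    while (d < end && is_ident_char(*s))
        *d++ = *s++;
    *d = '\0';                       /* if d never moved, this zeroes dest[0] */
    return s;
}
```

With a buffer buf of 8 bytes, copy_ident("x{'a'}", buf, sizeof buf) leaves "x" in buf and returns a pointer to the '{'. Given "$x{'a'}" it copies nothing, so buf[0] is the NUL, which matches the empty PL_tokenbuf[1] test at line 6 of the numbered listing.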

If we arrive at line 6 with PL_tokenbuf[1] set to zero, and s pointing to '$', we hit PREREF('@'), a macro from near the top of toke.c which sets PL_expect to XREF, updates a global token pointer to s, and returns '@'. From here on, token processing will be governed by the '$' of $x, with PL_expect set to expect a reference.

If instead we have '@x\0' in the token buffer, and s pointing to '{', we hit line 13, whose condition I'll simply assume is true. We take that branch, then the one for warnings being on, then the one matching the bracket, and we find ourselves at line 20, with char *t looking ahead at the hash key argument. The while loop advancing t stops before the closing bracket if it meets a parenthesis or comma, which makes the argument look like a list; a single identifier lets t run all the way to the right bracket, triggering the warning.
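The lookahead at lines 19 through 22 can be pulled out into a small function to check this reading against the test script. This is a sketch under assumptions: is_word_char() is a hypothetical ASCII-only replacement for isALNUM_lazy_if(), subscript_looks_scalar() is my own name, and the surrounding conditions (line 13, ckWARN) are taken as already satisfied.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for isALNUM_lazy_if(): ASCII word characters only. */
static int is_word_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/*
 * s points at the opening '[' or '{' of the subscript.
 * Returns 1 if the scan reaches a closing bracket, which is the case
 * where the "Scalar value ... better written as" warning is issued.
 */
int subscript_looks_scalar(const char *s)
{
    const char *t;

    if (*s != '[' && *s != '{')
        return 0;
    t = s + 1;
    /* Same character set as the strchr() call at line 20. */
    while (*t && (is_word_char(*t) || strchr(" \t$#+-'\"", *t)))
        t++;
    return *t == '}' || *t == ']';
}
```

Calling it on "{'a'}" gives 1, while "{('a')}" stops at the '(' and "{'a',}" stops at the ',', so both give 0. That matches the single warning the script produced.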

The difference appears to be an inconsistency between the treatment of '@' parsing and '$' parsing for references, but I haven't fathomed what the correct behavior is, or how to reconcile them.

I had a lot of fun with this, ++japhy for motivating my first hard look at the blinding source :)

After Compline,
Zaxo
