
Ok.

Fooling around, I found that @x{('a')} and @x{'a',} do not trigger the warning either, so I'm adding them to the question. Here is a test script which shows the behavior:

#!/usr/bin/perl -w
use strict;
my %x;
@x{('a', 'b', 'c')} = (4, 5, 6);
my $x = \%x;
print @$x{'a'}, $/;
print @x{'a'}, $/;
print @x{('a')}, $/;
print @x{'a',}, $/;
exit 0;
__END__
$ ./japhy.pl
Scalar value @x{'a'} better written as $x{'a'} at ./japhy.pl line 7.
4
4
4
4

A quick grep -n "Scalar value" toke.c yields only one candidate:

3566:             "Scalar value %.*s better written as $%.*s",

( with some spaces trimmed ). Looks promising. Looking at the code, we find the function int Perl_yylex(pTHX), an enormous case switch, and our 'Scalar value' message is sitting right at the end of case '@':. Here's the code, given a local numbering for reference later:

1: case '@':
2:    if (PL_expect == XOPERATOR)
3:        no_op("Array", s);
4:    PL_tokenbuf[0] = '@';
5:    s = scan_ident(s, PL_bufend, PL_tokenbuf + 1, sizeof PL_tokenbuf - 1, FALSE);
6:    if (!PL_tokenbuf[1]) {
7:        if (s == PL_bufend)
8:            yyerror("Final @ should be \\@ or @name");
9:        PREREF('@');
10:   }
11:   if (PL_lex_state == LEX_NORMAL)
12:       s = skipspace(s);
13:   if ((PL_expect != XREF || PL_oldoldbufptr == PL_last_lop) && intuit_more(s)) {
14:       if (*s == '{')
15:           PL_tokenbuf[0] = '%';
16:       /* Warn about @ where they meant $. */
17:       if (ckWARN(WARN_SYNTAX)) {
18:          if (*s == '[' || *s == '{') {
19:               char *t = s + 1;
20:               while (*t && (isALNUM_lazy_if(t,UTF) || strchr(" \t$#+-'\"", *t)))
21:                   t++;
22:               if (*t == '}' || *t == ']') {
23:                   t++;
24:                   PL_bufptr = skipspace(PL_bufptr);
25:                   Perl_warner(aTHX_ packWARN(WARN_SYNTAX),
26:                       "Scalar value %.*s better written as $%.*s",
27:                       t-PL_bufptr, PL_bufptr, t-PL_bufptr-1, PL_bufptr+1);
28:               }
29:           }
30:       }
31:   }
32:   PL_pending_ident = '@';
33:   TERM('@');

The char* s pointer is clearly running the show.

The scan_ident() function is called on the location right after the @, so it must be of primary interest. A quick gid scan_ident (from GNU id-utils, recommended) reveals that scan_ident is a macro for S_scan_ident, which also lives in toke.c.

S_scan_ident() first looks for runs of digits, then alphabetics, then we find:

if (*s == '$' && s[1] &&
    (isALNUM_lazy_if(s+1,UTF) || strchr("${", s[1]) || strnEQ(s+1,"::",2)) )
{
    return s;
}

The appearance of a dollar sign with an alphanumeric string afterwards immediately returns a pointer to the dollar sign.
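As a rough illustration of that test, here is a standalone C sketch. The function name is_deref_start is made up for this example, and plain isalnum() plus an underscore check stands in for isALNUM_lazy_if(), so it is ASCII-only and not what toke.c actually compiles to.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for the '$' check in S_scan_ident().
 * `s` points at the character right after the '@'.
 * Returns 1 when the scanner would bail out early and leave `s`
 * sitting on the dollar sign.
 * Assumption: ASCII only; isalnum() and '_' replace isALNUM_lazy_if(). */
static int is_deref_start(const char *s)
{
    assert(s != NULL);
    return *s == '$' && s[1] != '\0'
        && (isalnum((unsigned char)s[1]) || s[1] == '_'
            || strchr("${", s[1]) != NULL
            || strncmp(s + 1, "::", 2) == 0);
}
```

With this sketch, the text after the @ in @$x{'a'} satisfies the check, while the text after the @ in @x{'a'} does not, which is where the two forms part ways.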

If, instead, the @ is followed by identifier characters, they are copied to a buffer, and a tricky bit of pointer magic writes the terminating NUL and then determines whether the destination pointer had advanced:

    *d = '\0';
    d = dest;
    if (*d) {
        if (PL_lex_state != LEX_NORMAL)
            PL_lex_state = LEX_INTERPENDMAYBE;
        return s;
    }

returning a pointer to the bracket following the identifier.

If we arrive at line 6 with PL_tokenbuf[1] set to zero, and s pointing to '$', we hit PREREF('@'), a macro from near the top of toke.c which sets PL_expect to XREF, updates a global token pointer to s, and returns '@'. From here on, token processing will be governed by '$' of $x, with PL_expect knowing to find a reference.

If instead we have '@x\0' in the token buffer and s pointing to '{', we reach line 13, which I'll gloss over by assuming its condition is true. We take that branch, then the one for warnings being enabled, then the one matching the bracket, and arrive at line 20 with char *t looking ahead at the hash key argument. The while loop advancing t stops short of the closing bracket if it meets a parenthesis or comma, so the argument looks like a list and no warning is issued; a single identifier lets t run all the way to the right bracket, triggering the warning.
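The lookahead on lines 18 to 22 is easy to try in isolation. The following C sketch reproduces it under some assumptions: subscript_looks_scalar is a made-up name, isalnum() plus an underscore check replaces isALNUM_lazy_if(), and nothing here touches the real tokenizer state. It only answers whether the scan would reach a closing bracket.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Simplified stand-in for the lookahead on lines 18-22 of the excerpt.
 * `s` points at the opening '[' or '{' of the subscript.
 * Returns 1 if the scan runs all the way to a closing bracket, which is
 * the case where toke.c would emit the "Scalar value" warning.
 * Assumption: ASCII only; isalnum() and '_' replace isALNUM_lazy_if(). */
static int subscript_looks_scalar(const char *s)
{
    const char *t;

    assert(s != NULL);
    if (*s != '[' && *s != '{')
        return 0;
    t = s + 1;
    /* Same character set as the original: word characters plus
     * space, tab, $, #, +, -, single quote and double quote. */
    while (*t && (isalnum((unsigned char)*t) || *t == '_'
                  || strchr(" \t$#+-'\"", *t)))
        t++;
    return *t == '}' || *t == ']';
}
```

Fed the subscripts from the test script, the sketch says "scalar" for {'a'} and "not scalar" for {('a')} and {'a',}, matching the warnings observed. The @$x{'a'} case never gets this far, since PREREF('@') has already returned on line 9.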

The difference appears to be an inconsistency between the parsing of '@' and the parsing of '$' for references, but I haven't fathomed what the correct behavior is, or how to reconcile them.

I had a lot of fun with this, ++japhy for motivating my first hard look at the blinding source :)

After Compline,
Zaxo
