This is a source-diver challenge, designed to get you better acquainted with Perl's source. If you don't have the source, you might want to get it.

The file in question is toke.c. Your job is to figure out why (and where) Perl warns about @x{key} but not @$x{key}, when both are examples of single-element hash slices. The warning raised is "Scalar value @... better written as $...".

Have fun. Use the Source, Luke.

_____________________________________________________
Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: Source Divers: @x{a} vs @$x{a}
by Zaxo (Archbishop) on Jul 11, 2002 at 05:38 UTC

    Ok.

    Fooling around, I found that @x{('a')} and @x{'a',} do not trigger the warning either, so I'm adding them to the question. Here is a test script which shows the behavior:

        #!/usr/bin/perl -w
        use strict;
        my %x;
        @x{('a', 'b', 'c')} = (4, 5, 6);
        my $x = \%x;
        print @$x{'a'}, $/;
        print @x{'a'}, $/;
        print @x{('a')}, $/;
        print @x{'a',}, $/;
        exit 0;
        __END__
        $ ./japhy.pl
        Scalar value @x{'a'} better written as $x{'a'} at ./japhy.pl line 7.
        4
        4
        4
        4
    A quick grep -n "Scalar value" toke.c yields only one candidate:
        3566:             "Scalar value %.*s better written as $%.*s",

    (with some spaces trimmed). Looks promising. Looking at the code, we find the function int Perl_yylex(pTHX), an enormous switch statement, and our 'Scalar value' message sits right at the end of the case '@' branch. Here's the code, with a local line numbering for later reference:
         1: case '@':
         2:     if (PL_expect == XOPERATOR)
         3:         no_op("Array", s);
         4:     PL_tokenbuf[0] = '@';
         5:     s = scan_ident(s, PL_bufend, PL_tokenbuf + 1, sizeof PL_tokenbuf - 1, FALSE);
         6:     if (!PL_tokenbuf[1]) {
         7:         if (s == PL_bufend)
         8:             yyerror("Final @ should be \\@ or @name");
         9:         PREREF('@');
        10:     }
        11:     if (PL_lex_state == LEX_NORMAL)
        12:         s = skipspace(s);
        13:     if ((PL_expect != XREF || PL_oldoldbufptr == PL_last_lop) && intuit_more(s)) {
        14:         if (*s == '{')
        15:             PL_tokenbuf[0] = '%';
        16:         /* Warn about @ where they meant $. */
        17:         if (ckWARN(WARN_SYNTAX)) {
        18:             if (*s == '[' || *s == '{') {
        19:                 char *t = s + 1;
        20:                 while (*t && (isALNUM_lazy_if(t,UTF) || strchr(" \t$#+-'\"", *t)))
        21:                     t++;
        22:                 if (*t == '}' || *t == ']') {
        23:                     t++;
        24:                     PL_bufptr = skipspace(PL_bufptr);
        25:                     Perl_warner(aTHX_ packWARN(WARN_SYNTAX),
        26:                         "Scalar value %.*s better written as $%.*s",
        27:                         t-PL_bufptr, PL_bufptr, t-PL_bufptr-1, PL_bufptr+1);
        28:                 }
        29:             }
        30:         }
        31:     }
        32:     PL_pending_ident = '@';
        33:     TERM('@');
    The char* s pointer is clearly running the show.

    The scan_ident() function is handed s pointing at the @ (it saves and steps over the sigil itself) and scans whatever follows, so it must be of primary interest. A quick gid scan_ident (from GNU id-utils, recommended) reveals that scan_ident is a macro for S_scan_ident, which also lives in toke.c.

    S_scan_ident() first looks for runs of digits, then alphanumerics, and then we find:

        if (*s == '$' && s[1] &&
            (isALNUM_lazy_if(s+1,UTF) || strchr("${", s[1]) || strnEQ(s+1,"::",2)) )
        {
            return s;
        }
    A dollar sign followed by an identifier character (or by $, { or ::) makes scan_ident return at once with a pointer to the dollar sign, with nothing copied into the token buffer.

    If, instead, the @ is followed by identifier characters, they are copied to a buffer, and a tricky bit of pointer magic writes the terminating NUL and then checks whether anything was copied (that is, whether the destination pointer had advanced):

        *d = '\0';
        d = dest;
        if (*d) {
            if (PL_lex_state != LEX_NORMAL)
                PL_lex_state = LEX_INTERPENDMAYBE;
            return s;
        }
    returning a pointer to the bracket following the identifier.

    If we arrive at line 6 with PL_tokenbuf[1] set to zero and s pointing to '$', we hit PREREF('@'), a macro from near the top of toke.c which sets PL_expect to XREF, updates a global token pointer to s, and returns '@'. From here on, token processing is driven by the '$' of $x, with PL_expect recording that a reference is expected. The warning check at lines 17-30 is therefore never reached for @$x{'a'}.

    If instead we have '@x\0' in the token buffer and s pointing to '{', we reach line 13, which I'll lazily assume is true. We take that branch, then the one for warnings being on, then the one matching the opening bracket, and find ourselves at line 20 with char *t looking ahead at the hash key. The while loop advances t only over word characters and the characters in " \t$#+-'\"", so a parenthesis or a comma (either of which makes the subscript look like a list) stops it short of the closing bracket. A single quoted or bareword key, however, runs all the way to the closing bracket, and line 22 then triggers the warning.

    The difference appears to be an inconsistency between the treatment of '@' parsing and '$' parsing for references, but I haven't fathomed what the correct behavior is, or how to reconcile them.

    I had a lot of fun with this; ++japhy for motivating my first hard look at the blinding source :)

    After Compline,
    Zaxo