Source Divers: @x{a} vs @$x{a}

Replies are listed 'Best First'.
Re: Source Divers: @x{a} vs @$x{a} by Zaxo (Archbishop) on Jul 11, 2002 at 05:38 UTC

Ok.

Fooling around, I found that @x{('a')} and @x{'a',} do not trigger the warning either, so I'm adding them to the question. Here is a test script that shows the behavior, followed by its output:

#!/usr/bin/perl -w
use strict;
my %x;
@x{('a', 'b', 'c')} = (4, 5, 6);
my $x = \%x;
print @$x{'a'}, $/;
print @x{'a'}, $/;
print @x{('a')}, $/;
print @x{'a',}, $/;
exit 0;
__END__
$ ./japhy.pl
Scalar value @x{'a'} better written as $x{'a'} at ./japhy.pl line 7.
4
4
4
4

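A quick grep -n "Scalar value" toke.c yields only one candidate (with some spaces trimmed):

3566:             "Scalar value %.*s better written as $%.*s",

Looks promising. Looking at the code, we find the function int Perl_yylex(pTHX), an enormous case switch, and our 'Scalar value' message is sitting right at the end of case '@':. Here's the code, given a local numbering for reference later: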

1: case '@':
2:    if (PL_expect == XOPERATOR)
3:        no_op("Array", s);
4:    PL_tokenbuf[0] = '@';
5:    s = scan_ident(s, PL_bufend, PL_tokenbuf + 1, sizeof PL_tokenbuf - 1, FALSE);
6:    if (!PL_tokenbuf[1]) {
7:        if (s == PL_bufend)
8:            yyerror("Final @ should be \\@ or @name");
9:        PREREF('@');
10:   }
11:   if (PL_lex_state == LEX_NORMAL)
12:       s = skipspace(s);
13:   if ((PL_expect != XREF || PL_oldoldbufptr == PL_last_lop) && intuit_more(s)) {
14:       if (*s == '{')
15:           PL_tokenbuf[0] = '%';
16:       /* Warn about @ where they meant $. */
17:       if (ckWARN(WARN_SYNTAX)) {
18:           if (*s == '[' || *s == '{') {
19:               char *t = s + 1;
20:               while (*t && (isALNUM_lazy_if(t,UTF) || strchr(" \t$#+-'\"", *t)))
21:                   t++;
22:               if (*t == '}' || *t == ']') {
23:                   t++;
24:                   PL_bufptr = skipspace(PL_bufptr);
25:                   Perl_warner(aTHX_ packWARN(WARN_SYNTAX),
26:                       "Scalar value %.*s better written as $%.*s",
27:                       t-PL_bufptr, PL_bufptr, t-PL_bufptr-1, PL_bufptr+1);
28:               }
29:           }
30:       }
31:   }
32:   PL_pending_ident = '@';
33:   TERM('@');

The char* s pointer is clearly running the show.

The scan_ident() function is called on the location right after the @, so it must be of primary interest. A quick gid scan_ident (from GNU id-utils, recommended) reveals that scan_ident is a macro for S_scan_ident, which also lives in toke.c.

S_scan_ident() first looks for runs of digits, then alphabetics, then we find:

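if (*s == '$' && s[1] &&
    (isALNUM_lazy_if(s+1,UTF) || strchr("${", s[1]) || strnEQ(s+1,"::",2)) )
{
    return s;
}

A dollar sign followed by an identifier character (or by another $, a {, or ::) makes the function return immediately with a pointer to the dollar sign.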
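The early-return test can be tried in isolation with a small standalone sketch. This is not code from toke.c: starts_deref is a hypothetical name, and plain ASCII word characters (alphanumerics and underscore) stand in for isALNUM_lazy_if(), which is an assumption that ignores the UTF case.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical predicate mirroring the quoted condition.
 * Assumption: ASCII word characters stand in for isALNUM_lazy_if().
 * Returns 1 when `s` begins with '$' followed by a word character,
 * another '$', a '{', or "::"; returns 0 otherwise.
 */
int starts_deref(const char *s)
{
    assert(s != NULL);
    if (s[0] != '$' || s[1] == '\0')
        return 0;                       /* no '$', or '$' at end of input */
    return isalnum((unsigned char)s[1]) || s[1] == '_'
        || strchr("${", s[1]) != NULL   /* "$$..." or "${..." */
        || strncmp(s + 1, "::", 2) == 0;
}
```

With this sketch, the text after the @ in @$x{'a'} satisfies the test, while the text after the @ in @x{'a'} does not.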

If, instead, the @ is followed by identifier characters, they are copied to a buffer, and a tricky bit of pointer magic simultaneously writes the terminating NUL and determines whether the destination pointer had advanced:

    *d = '\0';
    d = dest;
    if (*d) {
        if (PL_lex_state != LEX_NORMAL)
            PL_lex_state = LEX_INTERPENDMAYBE;
        return s;
    }

This returns a pointer to the bracket following the identifier.
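The terminate-then-rewind idiom is easier to see in a reduced form. The following is a simplified sketch, not the real S_scan_ident(): copy_ident and its parameters are hypothetical, and it copies only ASCII word characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical, simplified copy loop (not the real S_scan_ident).
 * Copies ASCII word characters from `s` into `dest` (capacity `destlen`)
 * and returns the position where scanning stopped. `*copied` is set with
 * the same test the excerpt uses.
 */
const char *copy_ident(const char *s, char *dest, size_t destlen, int *copied)
{
    char *d = dest;
    char *e;

    assert(destlen > 0);
    e = dest + destlen - 1;             /* leave room for the NUL */
    while (d < e && (isalnum((unsigned char)*s) || *s == '_'))
        *d++ = *s++;

    *d = '\0';                          /* terminate whatever was copied */
    d = dest;                           /* rewind to the start of the buffer */
    *copied = (*d != '\0');             /* non-NUL first byte: d had advanced */
    return s;
}
```

If nothing was copied, the NUL lands on the first byte of the buffer, so the single check after the rewind answers both "is the buffer terminated" and "did we read an identifier".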

If we arrive at line 6 with PL_tokenbuf[1] set to zero, and s pointing to '$', we hit PREREF('@'), a macro from near the top of toke.c which sets PL_expect to XREF, updates a global token pointer to s, and returns '@'. From here on, token processing will be governed by the '$' of $x, with PL_expect set so that the lexer knows to look for a reference.

If instead we have '@x\0' in the token buffer, and s pointing to '{', we hit line 13, which I'll gloss over by assuming it's true. We take that branch, another for warnings being on, and another matching the bracket, and we find ourselves at line 20, with char *t looking ahead at the hash key argument. The while loop advancing t stops before the closing bracket if it meets a parenthesis or comma, which makes the argument look like a list, but a single identifier or quoted string lets it run all the way to the right bracket, triggering the warning.
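The lookahead on local lines 18 to 27 can be modelled outside the lexer. The sketch below is a standalone approximation under stated assumptions: slice_warning and is_word_char are hypothetical names, ASCII word characters stand in for isALNUM_lazy_if(), and the input pointer plays the role of PL_bufptr sitting on the @. It models only the named-variable path, not the PREREF path.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Assumption: ASCII word characters stand in for isALNUM_lazy_if(). */
int is_word_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/*
 * Hypothetical helper. `src` points at the '@' of text such as "@x{'a'}".
 * Returns 1 and fills `msg` if the lookahead would warn, 0 otherwise.
 */
int slice_warning(const char *src, char *msg, size_t msgsize)
{
    const char *s = src + 1;
    const char *t;

    assert(src[0] == '@');
    while (is_word_char(*s))            /* skip the variable name */
        s++;
    if (*s != '[' && *s != '{')
        return 0;                       /* no subscript follows the name */

    /* Same character set as the while loop on local line 20. */
    t = s + 1;
    while (*t && (is_word_char(*t) || strchr(" \t$#+-'\"", *t)))
        t++;
    if (*t != '}' && *t != ']')
        return 0;                       /* stopped early: looks like a list */
    t++;                                /* include the closing bracket */

    /* "%.*s" prints a length-limited slice; the second slice skips the '@'. */
    snprintf(msg, msgsize, "Scalar value %.*s better written as $%.*s",
             (int)(t - src), src, (int)(t - src - 1), src + 1);
    return 1;
}
```

Fed the three named-hash forms from the test script, this sketch warns only for @x{'a'}: the scan stops at the ( in @x{('a')} and at the , in @x{'a',}, so neither reaches the closing bracket.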

The difference appears to be an inconsistency between the treatment of '@' parsing and '$' parsing for references, but I haven't fathomed what the correct behavior is, or how to reconcile them.

I had a lot of fun with this, ++japhy for motivating my first hard look at the blinding source :)

After Compline,
Zaxo
