http://qs1969.pair.com?node_id=221679

adrianh has asked for the wisdom of the Perl Monks concerning the following question:

Due to some of the discussion on Class::InsideOut, I have discovered that I am missing some of the subtleties of perl's lexical pads and their interaction with package-specific attributes.

Basically, lexical pads don't seem to be as localised as I thought they would be, and attribute subroutines seem to be called in an odd context.

(Background: I am largely unfamiliar with the perl internals. I've skimmed perlguts and related docs but can't find anything obvious that answers my questions. However, I've worked on compilers for other languages and am familiar with the basic concepts.)

My naive concept of perl's lexical pads was that there would be one associated with each lexical scope, something like this:

# new pad here for the file's scope
{
    # new pad here for the block's scope
}
sub foo {
    # new pad here for the subroutine's scope
}
... etc ...

and, from my reading of attributes in perl 5.8, doing

my %foo : Bar = (answer => 42);

should be the same as

use attributes ();
my %foo;
attributes::->import(__PACKAGE__, \%foo, 'Bar');
%foo = (answer => 42);
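A minimal sketch of that equivalence using only core perl (no PadWalker), assuming perl 5.8 or later. The package Recorder, the log array @calls and the attribute name Bar are made up for illustration: the handler just records what it is passed, so if the two forms really are equivalent, both declarations should leave the same entry in the log.

```perl
use strict;
use warnings;

package Recorder;

# Hypothetical log: one "package reftype attributes" string per call.
our @calls;

# perl calls this for any attributed hash declared while Recorder is the
# current package; it receives (package, reference, attribute names).
sub MODIFY_HASH_ATTRIBUTES {
    my ($package, $reference, @attributes) = @_;
    push @calls, join ' ', $package, ref($reference), @attributes;
    return;    # empty list: no unrecognised attributes
}

{
    # Attribute syntax...
    my %declared : Bar = (answer => 42);

    # ...and the explicit form the attributes documentation describes.
    use attributes ();
    my %explicit;
    attributes::->import(__PACKAGE__, \%explicit, 'Bar');
    %explicit = (answer => 42);
}

print "$_\n" for @calls;
```

Both lines printed should read "Recorder HASH Bar".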

However, some experiments with PadWalker show my understanding is in error:

use strict;
use warnings;

package AttrTest;
use Data::Dumper;
use PadWalker qw(peek_my);

sub MODIFY_HASH_ATTRIBUTES {
    my ($package, $reference, @attributes) = @_;
    $package->dump_lex("setting @attributes in $package for $reference");
    return;
};

sub dump_lex {
    my ($class, $when) = @_;
    print "when $when top-level pad is\n";
    my $level=0;
    while (eval {peek_my(++$level)} && !$@) {};
    my $hash = peek_my($level-1);
    while (my ($name, $value) = each %$hash) {
        my $dumped = Data::Dumper->new([\$value],[$name])->Indent(0)->Dump;
        print "\t$value -> $dumped\n";
    };
    print "\n";
};

INIT { AttrTest->dump_lex('init') };

{
    package Foo;
    use base qw(AttrTest);
    my %foo : Attr = (one => 1);
};

{
    package Bar;
    use base qw(AttrTest);
    use attributes ();
    my %bar;
    attributes::->import(__PACKAGE__, \%bar, 'Attr');
}

AttrTest->dump_lex('runtime');

Under perl 5.6 this produces

when setting Attr in Foo for HASH(0x8187bec) top-level pad is
    SCALAR(0x8186018) -> $%foo = \\'%foo';

when init top-level pad is
    HASH(0x818bf68) -> $%bar = \{};
    HASH(0x8187bec) -> $%foo = \{};

when setting Attr in Bar for HASH(0x818bf68) top-level pad is
    HASH(0x818bf68) -> $%bar = \{};

when runtime top-level pad is

Under perl 5.8 this produces

when init top-level pad is
    HASH(0x22dda8) -> $%bar = \{};
    HASH(0x3cff4) -> $%foo = \{};

when setting Attr in Foo for HASH(0x3cff4) top-level pad is

when setting Attr in Bar for HASH(0x22dda8) top-level pad is
    HASH(0x22dda8) -> $%bar = \{};

when runtime top-level pad is

Both examples used the same version of PadWalker (v0.08). I am also aware of the change in 5.8 that means attributes declared with my are now applied at runtime.
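A small core-perl sketch of what "applied at runtime" means, assuming perl 5.8 or later: the handler runs every time the declaring statement executes, so a my with an attribute inside a loop should call it once per iteration (going by the 5.6 output above, it would presumably fire only once there, during compilation). Counter, $calls and Tracked are invented names.

```perl
use strict;
use warnings;

package Counter;

our $calls = 0;    # hypothetical counter for this example

sub MODIFY_SCALAR_ATTRIBUTES {
    my ($package, $reference, @attributes) = @_;
    $calls++;
    return;        # every attribute handled
}

# With runtime application, each pass through the loop executes the
# declaration again, and with it the attribute handler.
for my $pass (1 .. 3) {
    my $x : Tracked = $pass;
}

print "handler called $calls time(s)\n";
```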

This output confuses me for a couple of reasons:

  • Why can INIT see %foo and %bar? They are not in the scope of the INIT subroutine and, since they are in their own blocks, they should not be visible from the file-scoped pad (the last call to dump_lex doesn't show them)?
  • Why can't the implicit call to MODIFY_HASH_ATTRIBUTES in Foo see %foo in perl 5.8? The attribute routine is being called at runtime so the hash should have been declared and be in the pad?

I realise that package-specific attributes are still considered experimental - so I'm not particularly worried about the different behaviour - but I would like to understand what's going on :-)

Can any perl internals gurus explain this to a poor confused soul?

Re: Lexical pad / attribute confusion
by djantzen (Priest) on Dec 21, 2002 at 23:48 UTC

    If my understanding of Elian's explanations in Perl Internals - references and symbol table is correct, the reason is that %foo and %bar are declared in bare blocks, and bare blocks do not get their own pads. Rather, they share the pad of the enclosing sub, which here is the file-level pad, and therefore the INIT block can see them as well when peek_my inspects that pad.
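One way to check this without PadWalker is the core B module, which can list the names stored in a sub's pad. In the sketch below (demo is a made-up sub), neither bare block gets a pad of its own, so both $inner variables should show up as separate slots in demo's single pad, next to $outer. The can('PV') check is there because unnamed slots may come back as objects without a PV method; the exact B classes involved vary between perl versions.

```perl
use strict;
use warnings;
use B ();

sub demo {
    my $outer = 1;
    {
        my $inner = 2;    # bare block: no pad of its own
    }
    {
        my $inner = 3;    # a second, distinct $inner in the same pad
    }
    return $outer;
}

# PADLIST->ARRAY returns (pad names, pad values...); keep only the names.
my ($pad_names) = B::svref_2object(\&demo)->PADLIST->ARRAY;
my @names = grep { defined && length }
            map  { $_->can('PV') ? $_->PV : undef } $pad_names->ARRAY;

print "@names\n";
```

This should print $outer $inner $inner: one pad, three named slots.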

      Hmm. Okay. What Elian said makes sense, but it leads to a new question.

      If %foo and %bar are in the same pad, just finagled into different slots, why can't that last call to dump_lex see them?

        Because they are declared after dump_lex.

        -Lee

        "To be civilized is to deny one's nature."
Re: Lexical pad / attribute confusion
by Aristotle (Chancellor) on Dec 22, 2002 at 00:57 UTC

    The following is mostly an educated guess, as I have not actually grokked the guts first hand.

    It seems that both of these effects hinge on the duality of my having effects both at compile time and at runtime. Sprinkle a couple of BEGIN { AttrTest->dump_lex('begin(1)') } blocks in there and it becomes obvious that the pad gets fully populated during compilation, with all variables ever associated with the current pad being visible regardless of scope. In contrast, at runtime, the pad only contains those variables which are in scope - in fact, a variable doesn't appear in the pad before the entire statement during which it is declared has executed. It's for that reason that you can write my $x = 10; { my $x = $x; print $x } and get 10.

    End educated guess.
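Aristotle's my $x = $x point as a runnable sketch (@seen is added only to collect the values): the newly declared lexical only becomes visible from the statement after its declaration, so the right-hand side still refers to the outer variable.

```perl
use strict;
use warnings;

my $x = 10;
my @seen;
{
    # The $x on the right-hand side is still the outer one: the new
    # lexical is only introduced once this statement is complete.
    my $x = $x + 1;
    push @seen, $x;    # inner $x: 11
}
push @seen, $x;        # outer $x: untouched, 10
print "@seen\n";       # prints "11 10"
```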

    Now with that, on to address each piece of the puzzle:

    1. INIT blocks are called after compilation has succeeded but before the main program starts running. Therefore, they can see everything that's ever going to be in the pad they're associated with.
    2. In 5.6 the implicitly called MODIFY_HASH_ATTRIBUTES can see %foo because it is called at compile time (before INIT, as the 5.6 output shows) - so the same rules as above apply.
    3. In 5.8 it can't, because the statement has not yet fully executed so the entry on the pad is not there yet.
    4. You're invoking the last dump_lex at runtime, outside the scope of either hash, so neither of them is visible in the pad.
    I may well be wrong, but that picture seems to be seamless.

    Makeshifts last the longest.

      It's worth noting as well that CHECK and END blocks give the same results as we see in INIT. So in some way those portions of a program's lifespan are privileged. Also, whether the variables have attributes is incidental to whether peek_my can see them at runtime, for example,

      { package Baz; my %baz; }

      likewise only shows up when dump_lex is called from the special execution blocks.

      Now, a standalone my %quux is visible no matter where dump_lex is called from. This makes me think that perhaps there is a loophole in PadWalker, and that there is still a way to keep your lexicals private; namely, by relying on perl's runtime enforcement of scope, which I guess is the same mechanism that makes it possible to write:

      sub foo { my $foo; { my $foo; } }

      That is to say, at runtime perl knows to differentiate the two $foos based upon where they are declared, despite being written on the same pad.

      Update: Duh. This isn't a loophole, but rather the documented, intended behavior. From the docs: It will only show those variables which are in scope at the point of the call. I must have read that line 10 times and it only made sense when I was about to step in the shower :)

        Yes, that part is just stuck into the opcode tree: the compiler turns each reference to $foo into a padsv op meaning "the scalar located at location 'blah' in the pad". This is very clear if you start peeking at your code with B::Concise.


        Fun Fun Fun in the Fluffy Chair
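diotalevi's point can also be checked from core perl without reading B::Concise output by eye. The sketch below walks the op tree of a made-up sub foo with B::walkoptree, collects the pad index (targ) of each padsv-style op, and maps those indexes back to names: the two $foo variables should come out as two different slots in the same pad. The method name collect_padsv is invented for the example, and the /^padsv/ match is a hedge in case a newer perl fuses my $foo = ... into a differently named pad op.

```perl
use strict;
use warnings;
use B ();

# A made-up sub with two lexicals that share the name $foo.
sub foo {
    my $foo = 'outer';
    {
        my $foo = 'inner';
        print $foo, "\n";
    }
    print $foo, "\n";
}

# Collect the pad index (targ) of every padsv-style op in foo's op tree.
# B::walkoptree calls the named method on each op; it reuses one object
# while walking, so the values are copied out straight away.
my @targs;
sub B::OP::collect_padsv {
    my $op = shift;
    push @targs, $op->targ if $op->name =~ /^padsv/;
}
B::walkoptree(B::svref_2object(\&foo)->ROOT, 'collect_padsv');

# Map each pad index back to the name stored in foo's pad.
my ($pad_names) = B::svref_2object(\&foo)->PADLIST->ARRAY;
my @all_names   = $pad_names->ARRAY;
my %name_of     = map { $_ => $all_names[$_]->PV } @targs;

print "pad slot $_ holds $name_of{$_}\n" for sort { $a <=> $b } keys %name_of;
```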

      It's for that reason that you can write my $x = 10; { my $x = $x; print $x } and get 10.

      Ah. This makes the behaviour of the attribute handler make sense. Hadn't thought of that. It does mean that the documentation in attributes is an oversimplification when it says:

      my ($x,@y,%z) : Bent = 1;

      is equivalent to

      use attributes ();
      my ($x,@y,%z);
      attributes::->import(__PACKAGE__, \$x, 'Bent');
      attributes::->import(__PACKAGE__, \@y, 'Bent');
      attributes::->import(__PACKAGE__, \%z, 'Bent');
      ($x,@y,%z) = 1;

      because things like this DWIM:

      my ($x,@y,%z) : Bent = (@y);
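A core-perl sketch of why that DWIMs, assuming perl 5.8 or later (Bendy, @handler_saw and the handler body are made up): the handler is handed the freshly introduced, still empty inner @y before the assignment runs, while the @y on the right-hand side is still the outer array.

```perl
use strict;
use warnings;

package Bendy;

our @handler_saw;    # hypothetical: size of each array the handler was given

sub MODIFY_ARRAY_ATTRIBUTES {
    my ($package, $reference, @attributes) = @_;
    push @handler_saw, scalar @$reference;
    # Return the attributes we did not handle; perl reports those as invalid.
    return grep { $_ ne 'Bent' } @attributes;
}

my @y = (1, 2, 3);
my @copy;
{
    # The handler sees the new, still empty @y; the right-hand @y is the
    # outer one, so the inner @y ends up with the outer array's contents.
    my @y : Bent = (@y);
    @copy = @y;
}
print "handler saw $handler_saw[0] element(s); inner \@y was (@copy)\n";
```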

      Right then. With the attribute issue out of the way - I'm still confused why INIT can see everything. Since you can also see everything in CHECK and END blocks, which are run after compilation, I still don't understand what is going on.

        Since you can also see everything in CHECK and END blocks, which are run after compilation, I still don't understand what is going on.

        I think it's because at those stages in the program's lifecycle you don't have a runtime scope, or perhaps it's better said that the entire program is in their scope. They see everything written on a particular pad by the compiler, without perl hiding entries according to scoping rules. (Update: see diotalevi's remark above.)

Re: Lexical pad / attribute confusion
by adrianh (Chancellor) on Dec 22, 2002 at 14:20 UTC

    Thanks to fever, shotgunefx, Aristotle and diotalevi for helping illuminate the areas of my misunderstanding.

    Summary time - and one more question ;-)

    Why can't the implicit call to MODIFY_HASH_ATTRIBUTES in Foo see %foo in perl 5.8? The attribute routine is being called at runtime so the hash should have been declared and be in the pad?

    Because the hash doesn't come into scope until after the assignment; otherwise things like my ($x,@y,%z) : Bent = (@y); wouldn't DWIM (thanks to Aristotle for pointing this out). Obvious in hindsight.

    Update (2003-01-04): You can also use Attribute::Handlers::Prospective to get the name of a lexical variable.

    Why can INIT see %foo and %bar? They are not in the scope of the INIT subroutine and, since they are in their own blocks, they should not be visible from the file-scoped pad (the last call to dump_lex doesn't show them)?

    Because you don't get a separate pad for { block } scope - only for subroutine and file scope (thanks to fever for pointing out Elian's explanation.)

    INIT (and BEGIN, CHECK & END) are called outside of the normal "runtime" context, which is why everything in the pad is visible to them.
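For reference, a tiny sketch of the order in which perl runs these special blocks relative to the main program; @order and its labels exist only for the example. It shows when each block runs, not why their view of the pad differs.

```perl
use strict;
use warnings;

our @order;    # package variable, so every phase block shares it

BEGIN { push @order, 'BEGIN' }    # as soon as it has been parsed
END   { print join(' -> ', @order, 'END'), "\n" }    # after the main program
INIT  { push @order, 'INIT' }     # after compilation, just before runtime
CHECK { push @order, 'CHECK' }    # at the end of the compilation phase

push @order, 'runtime';
```

Run as a script, the END block should print BEGIN -> CHECK -> INIT -> runtime -> END.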

    This all makes some vague sort of sense - so I now have a handle on what is happening. However, I don't really understand why it's happening.

    New question: Why are BEGIN et al called in this odd context where they can see everything in the pad?

    Is this just an accident of implementation, or is there a reason?

    I find the fact that:

    use strict;
    use warnings;
    use PadWalker qw(peek_my);

    CHECK { peek_my(1)->{'%foo'}->{answer} = 42 };

    {
        my %foo;
        print $foo{answer}, "\n";
    };

    outputs 42 somewhat counter-intuitive.
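The same effect can be reproduced with core perl alone by putting a BEGIN block inside the scope, so it can name %foo directly instead of going through PadWalker. A sketch, with @seen and the two-pass loop added purely for illustration: the compile-time write lands in the pad slot that the runtime my %foo uses, so the first pass sees it; when the scope is left, perl clears the variable, so the second pass should not.

```perl
use strict;
use warnings;

my @seen;
for my $pass (1, 2) {
    my %foo;
    # Runs once, at compile time, and writes into the pad slot that the
    # runtime "my %foo" above will use.
    BEGIN { $foo{answer} = 42 }
    push @seen, exists $foo{answer} ? $foo{answer} : 'none';
}
print "@seen\n";
```

This should print "42 none".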

      As near as I can gather - it's just an innermost scope. I took the trouble to write a quickie XS library (Devel::DebugScope) to dump some of the scoping variables. You can download it here if you like: http://198.144.10.226/perl/Devel-DebugScope-0.01.tgz. I used it to instrument your code and it corroborates that CHECK and friends have an apparent lexical scope equivalent to (or, for BEGIN, greater than) the inner scope. The key values to look at are scopestack_ix and savestack_ix (I guess).

      use strict;
      use warnings;
      use PadWalker qw(peek_my);
      use Devel::DebugScope qw(dump_scope);

      CHECK {
          print "CHECK\n";
          Devel::DebugScope::dump_scope();
          print "\n\n";
      }
      CHECK { peek_my(1)->{'%foo'}->{answer} = 42 };

      print "Runtime outer\n";
      Devel::DebugScope::dump_scope();
      print "\n\n";
      {
          my %foo;
          print $foo{answer}, "\n";
          print "Runtime inner\n";
          Devel::DebugScope::dump_scope();
          print "\n\n";
      };
      __DATA__
      CHECK
      scopestack_ix: 5
      savestack_ix: 21

      Runtime outer
      scopestack_ix: 3
      savestack_ix: 10

      42
      Runtime inner
      scopestack_ix: 5
      savestack_ix: 15


        ++. An excellent demonstration of what is happening.

        Now if only somebody can tell me why :-)

        (or point to the relevant docs if there are any).