LanX has asked for the wisdom of the Perl Monks concerning the following question:

This is a follow-up to Refresh a Module.

Consider the following code:

~ $ perl -w -e'sub g{1}; my $cr=\&g; eval q(sub g{2}); print g(); print $cr->();'
Subroutine g redefined at (eval 1) line 1.
21~ $

As you can see, the sub g is dynamically redefined and both calls work: g() now returns 2, while the stored coderef still returns 1.

Looking at the optree reveals that the compiler did some optimisation when compiling the code, and stored the reference \&main::g inside the call. (See OP "e".)

~ $ perl -MO=Concise,-exec -w -e'sub g{1}; my $cr=\&g; eval q(sub g{2}); print g(); print $cr->();'
1  <0> enter v
2  <;> nextstate(main 3 -e:1) v:{
3  <#> gv[IV \&main::g] s
4  <1> rv2cv[t3] lKRM/AMPER,TARG
5  <1> srefgen sK/1
6  <0> padsv[$cr:3,4] sRM*/LVINTRO
7  <2> sassign vKS/2
8  <;> nextstate(main 4 -e:1) v:{
9  <$> const[PV "sub g{2}"] s
a  <1> entereval[t256] vK/1
b  <;> nextstate(main 4 -e:1) v:{
c  <0> pushmark s
d  <0> pushmark s
e  <#> gv[IV \&main::g] s
f  <1> entersub lKS
g  <@> print vK
h  <;> nextstate(main 4 -e:1) v:{
i  <0> pushmark s
j  <0> pushmark s
k  <0> padsv[$cr:3,4] s
l  <1> entersub[t4] lKS/TARG
m  <@> print vK
n  <@> leave[1 ref] vKP/REFC
-e syntax OK

This reference must obviously point to the first g(), because the second wasn't known at compile time. But calling the first g() can't be right.

Now, how was Perl able to fix this optimisation?

I did a Devel::Peek of both subs, and couldn't see any data hinting to "forwarding".

Looking it up in the symbol table wouldn't make sense, because that would make the optimisation useless.

Am I misreading the op-tree and there is no optimisation?

I found an SO thread where Tom Christiansen explicitly states this optimisation is happening.

I'm curious, how is it implemented? What am I missing?

For completeness, here is the output from Devel::Peek:

~ $ perl -w -e'sub g{1}; my $cr=\&g; eval q(sub g{2}); print g(); print $cr->();use Devel::Peek; Dump($cr);Dump(\&g);'
Subroutine g redefined at (eval 2) line 1.
SV = IV(0xb40000715c6827e8) at 0xb40000715c6827f8
  REFCNT = 1
  FLAGS = (ROK)
  RV = 0xb40000715c682810
  SV = PVCV(0xb40000715c6812d8) at 0xb40000715c682810
    REFCNT = 1
    FLAGS = (DYNFILE)
    COMP_STASH = 0xb40000715c60c6c0 "main"
    START = 0xb40000715c699698 ===> 1
    ROOT = 0xb40000715c699620
    GVGV::GV = 0xb40000715c682858 "main" :: "g"
    FILE = "-e"
    DEPTH = 0
    FLAGS = 0x1000
    OUTSIDE_SEQ = 1
    PADLIST = 0xb40000715c63b480
    PADNAME = 0xb40000715c693bd0(0xb40000715c605510) PAD = 0xb40000715c682828(0xb40000715c63b4a0)
    OUTSIDE = 0xb40000715c60c9d8 (MAIN)
SV = IV(0xb40000715c682038) at 0xb40000715c682048
  REFCNT = 1
  FLAGS = (TEMP,ROK)
  RV = 0xb40000715c682078
  SV = PVCV(0xb40000715c6e2548) at 0xb40000715c682078
    REFCNT = 2
    FLAGS = (DYNFILE)
    COMP_STASH = 0xb40000715c60c6c0 "main"
    START = 0xb40000715c699b98 ===> 2
    ROOT = 0xb40000715c699b20
    GVGV::GV = 0xb40000715c682858 "main" :: "g"
    FILE = "(eval 2)"
    DEPTH = 0
    FLAGS = 0x1000
    OUTSIDE_SEQ = 213
    PADLIST = 0xb40000715c63b4e0
    PADNAME = 0xb40000715c693c90(0xb40000715c605670) PAD = 0xb40000715c682120(0xb40000715c720260)
    OUTSIDE = 0xb40000715c60c6a8 (UNIQUE)
21~ $

Cheers Rolf
(addicted to the Perl Programming Language :)
see Wikisyntax for the Monastery

Update

FWIW: the redefine mechanism is tied to the symbol table. If I delete the symbol before the eval, the redefinition no longer takes effect: the old g() is called both by name and through the stored coderef.

~ $ perl -w -e'sub g{1}; my $cr=\&g; delete $::{g}; eval q(sub g{2}); print g(); print $cr->();'
11~ $

Which makes sense, because a symbol can't be redefined if it doesn't exist.
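
To make the deleted-symbol case easier to inspect, here is a minimal sketch of the same one-liner as a script. The variable names and the extra main->can('g') lookup are my additions, not part of the original test; the lookup is only there to check whether the eval'd sub ended up in a fresh stash entry that the already compiled call does not see.

```perl
use strict;
use warnings;

sub g { 1 }
my $cr = \&g;              # reference to the original CV

delete $main::{g};         # remove the stash entry for "g"
eval q(sub g { 2 });       # defines g again, in a newly created stash entry
die $@ if $@;

my $compiled_call = g();                  # op compiled before the delete
my $stored_ref    = $cr->();              # the original CV
my $fresh_lookup  = main->can('g')->();   # resolves "g" in the stash at runtime

print "compiled call: $compiled_call\n";
print "stored ref:    $stored_ref\n";
print "fresh lookup:  $fresh_lookup\n";
```

If the fresh lookup returns the new value while the other two return the old one, that would suggest the redefinition itself succeeded and only the link between the compiled call and the new stash entry was cut by the delete.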

Update

I just discovered that a subroutine which isn't redefined carries a NAMED flag, like

FLAGS = (DYNFILE,NAMED)

so this is probably checked, and if the flag is missing, the symbol table entry becomes the fallback.

Replies are listed 'Best First'.
Re: How is redefining a sub internally done?
by ikegami (Patriarch) on Sep 27, 2024 at 14:05 UTC

    It says "gv[IV \&main::g] s"

    "gv" means a glob. "\&" means a code ref. Odd. Let's look at pp_gv.

    According to a comment and an assert in pp_gv, the attached SV "might be a real GV or might be an RV to a CV". So it could be *main::g or \&main::g. Does B::Concise distinguish between these, or does it always print \&main::g?

    That's as far as I got.

      Normally, sub calls are compiled into a typeglob retrieval followed by a retrieval of the CV slot within that typeglob. The GV OP has a pointer to the GV associated with the name hard-baked in at compile time. The OP_GV pushes the GV on the stack, then the OP_ENTERSUB pops the GV off the stack, accesses its CV slot, and calls the associated CV.
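
The indirection through the glob's CV slot can be observed from plain Perl. The following is only a sketch of the visible effect, not of the internals: it reads the CODE slot of *g before and after a redefinition and compares the two coderefs. The variable names are mine.

```perl
use strict;
use warnings;

sub g { 1 }
my $before = *g{CODE};        # CV currently in the glob's CODE slot

{
    no warnings 'redefine';   # silence "Subroutine g redefined"
    eval q(sub g { 2 });
    die $@ if $@;
}

my $after = *g{CODE};         # CV in the same glob after the redefinition

my $slot_changed = $before != $after ? 'yes' : 'no';
print "slot holds a different CV: $slot_changed\n";
print "old CV returns ", $before->(), ", new CV returns ", $after->(), "\n";
print "call by name returns ", g(), "\n";
```

A call compiled against the glob picks up whatever CV is in the slot at call time, while a coderef taken earlier keeps pointing at the CV it was taken from. Note that mentioning *g in the source forces a full typeglob, so this sketch says nothing about the lighter stash representation.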

      However as an optimisation, GVs which only have their CV slot used, are instead created as an RV to a CV. So for example, for

      package FOO; sub f { ... } f()
      at compile time, the value of the hash entry $FOO::{f} is created as an RV to the CV associated with f, rather than as a full typeglob. When the GV op is compiled, it points to that RV. When the GV op is called, it pushes that RV onto the stack. When the ENTERSUB is called, it pops that value, notices that it's an RV rather than a GV, and extracts the CV as the thing referenced.

      When things get more complex, the 'RV to a CV' SV is upgraded to a full GV with the CV in its code slot.

      An 'RV to CV' is smaller and quicker than a full GV (a GV points to a GP which has a CV slot - so two allocations, two dereferences).
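
A rough way to look at the raw stash entry from Perl space is sketched below. It assumes a perl that has this optimisation (the 5.22 release notes describe it for subs declared in packages); on an older perl I would expect both lines to report a full GV. The helper entry_kind and the variable names are made up for the illustration. The glob is looked up through a variable on purpose, because a constant name would be resolved, and the entry upgraded, at compile time.

```perl
use strict;
use warnings;

package FOO;
sub f { 42 }

package main;

# Classify the raw stash entry without using glob syntax on FOO::f,
# since that would itself force an upgrade to a full typeglob.
sub entry_kind {
    my $type = ref \$FOO::{f};
    return $type eq 'GLOB' ? 'full GV'
         : $type eq 'REF'  ? 'RV to CV'
         :                   "other ($type)";
}

my $before = entry_kind();

{
    no strict 'refs';
    my $name = 'FOO::f';       # non-constant name, resolved at runtime
    my $glob = \*{$name};      # asking for the glob upgrades the entry
}

my $after = entry_kind();

print "before glob access: $before\n";
print "after glob access:  $after\n";
```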

      Dave.

      > Does B::Concise distinguish between these

      Yes, for instance if a sub is not predefined in the STASH you'll see the *main::g or rather *g form.

      It also shows up in what look (to me) like random cases where the sub is predefined.

      Cheers Rolf

      Update
      ~/perl $ perl -MO=Concise,-exec -e'g();'
      1  <0> enter v
      2  <;> nextstate(main 1 -e:1) v:{
      3  <0> pushmark s
      4  <#> gv[*g] s/EARLYCV
      5  <1> entersub[t2] vKS/TARG
      6  <@> leave[1 ref] vKP/REFC
      -e syntax OK
      ~/perl $

      s/EARLYCV means the sub was unknown at compile time.

        Inherited subs are cached into the namespace that inherits them.

        # Causes something akin to
        # `*Foo::method = \&Base::method;`
        $foo->method();

        But it uses a counter system to invalidate the cache. The package's counter is incremented when the package is changed, making it so the counter in the cached entry no longer matches the package's, invalidating the cached entry.

        Perhaps that same mechanism is used here.
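
The cache invalidation for inherited methods can at least be observed from the outside. This is a small sketch with made-up package names (Base, Derived); it shows only that a cached method lookup does not go stale, not how the counter works and not whether plain sub calls use the same mechanism.

```perl
use strict;
use warnings;

package Base;
sub method { 'Base (original)' }

package Derived;
our @ISA = ('Base');

package main;

my @seen;
push @seen, Derived->method;        # resolved through @ISA, then cached

{
    no warnings qw(redefine once);
    *Base::method = sub { 'Base (redefined)' };
}
push @seen, Derived->method;        # the cached entry must not be reused

{
    no warnings 'once';
    *Derived::method = sub { 'Derived' };   # now shadows the inherited one
}
push @seen, Derived->method;

print "$_\n" for @seen;
```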

Re: How is redefining a sub internally done?
by Danny (Chaplain) on Sep 26, 2024 at 23:15 UTC
      > Which makes sense, because a symbol can't be redefined if it doesn't exist.

    But if you throw in an undef &g; before the delete, it prints 22. Why is that?

      I can only speculate, like in my last update, that as soon as a sub has "weird" flags, the symbol table acts as fallback.

      But I couldn't find any information in the docs; we need to wait for someone who knows the internals.

      Obviously, deleting a symbol is, as already demonstrated, an efficient way to sabotage the whole redefine mechanism.

      Update

      Well, I should have added that undef &g changes the sub considerably. Use Devel::Peek with a coderef to see for yourself.

      Cheers Rolf