ig has asked for the wisdom of the Perl Monks concerning the following question:

I have the following test program:

use strict;
use warnings;
use Devel::Peek;

my $y;
Dump($y);

for my $x (undef, 10) {
    print "\$x = $x\n";
    Dump($x);
    Dump($_) for ( ($x, $y, undef) = 1..10), $x, $y;
}

Which produces the following output:

SV = NULL(0x0) at 0x8ad23d0
  REFCNT = 1
  FLAGS = (PADMY)
Use of uninitialized value $x in concatenation (.) or string at ./test.pl line 10.
$x = 
SV = NULL(0x0) at 0x8ab54fc
  REFCNT = 2147483499
  FLAGS = (READONLY)
SV = IV(0x8ab987c) at 0x8ab9880
  REFCNT = 2
  FLAGS = (IOK,pIOK)
  IV = 1
SV = IV(0x8ad23cc) at 0x8ad23d0
  REFCNT = 2
  FLAGS = (PADMY,IOK,pIOK)
  IV = 2
SV = IV(0x8adf414) at 0x8adf418
  REFCNT = 2
  FLAGS = (IOK,pIOK)
  IV = 3
SV = NULL(0x0) at 0x8ab54fc
  REFCNT = 2147483500
  FLAGS = (READONLY)
SV = IV(0x8ad23cc) at 0x8ad23d0
  REFCNT = 2
  FLAGS = (PADMY,IOK,pIOK)
  IV = 2
$x = 10
SV = PVIV(0x8ab86ac) at 0x8ad2410
  REFCNT = 2
  FLAGS = (PADTMP,IOK,POK,READONLY,pIOK,pPOK)
  IV = 10
  PV = 0x8af56a0 "10"\0
  CUR = 2
  LEN = 4
Modification of a read-only value attempted at ./test.pl line 12.

It appears that both $x and $y are lexical (my) variables, and yet there are significant differences between their values and how they behave.

In the first iteration of the loop, both $x and $y are initially undefined, as is the return from undef in the list which is LHS of the list assignment. The list assignment is in list context and returns the list of lvalues assigned to (Assignment Operators). In the case of $x, the lvalue returned is not $x - a new SV is created and returned, set to the assigned value, and $x is not modified. In the case of $y, the lvalue returned is $y itself, set to the assigned value. In the case of undef, as with $x, a new SV is created and returned, set to the assigned value. Thus $x behaves or is handled like undef rather than like another my variable which also has the value undef.
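The documented part of this, that a list assignment in list context returns the lvalues assigned to, can be seen in isolation with a minimal sketch. The variable names $p and $q are hypothetical, chosen only for illustration; both are ordinary lexicals, so this shows the $y-like case and not the special handling of undef.

```perl
use strict;
use warnings;

# Hypothetical names, for illustration only.
my ($p, $q);

# The list assignment is evaluated in list context (it is the foreach
# list), so it returns the lvalues $p and $q. $_ is aliased to each of
# them in turn, and modifying $_ modifies the variable itself.
$_ *= 10 for (($p, $q) = (1, 2));

print "$p $q\n";
```

Because $_ is aliased to the returned lvalues, the multiplication lands in $p and $q rather than in temporary copies.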

I am curious about the SVs created for $x and undef. They are neither PADMY nor PADTMP, yet I doubt they are globals. I wonder where they exist (where are the 2 references to them) and what their scopes are. update: I suppose $_ is one of the references, but where is the other?

In the first iteration of the loop, $x is readonly. Assignment to it does not alter it but does not cause a runtime error. Instead, a new SV is created to take the assigned value.

In the second iteration of the loop, $x is again readonly, but this time assignment to it causes a runtime error instead of causing a new SV to be created and assigned to.

I know that in the for loop the loop variable ($x in this case) is aliased to the elements of the list being iterated over. Thus the variable is not an ordinary variable. update: or perhaps I should say that the value is not an ordinary value.
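A small sketch of that aliasing, assuming nothing beyond core Perl: with an array as the list, assigning to the loop variable changes the array elements, while with a literal constant as the list element the same assignment dies. The names @values, $item and $error are hypothetical.

```perl
use strict;
use warnings;

# Loop variable aliased to array elements: the array itself changes.
my @values = (1, 2, 3);
for my $item (@values) {
    $item *= 2;
}
print "@values\n";

# Loop variable aliased to a literal constant: the constant is
# read-only, so the assignment raises a runtime error, caught here.
my $error = '';
eval {
    for my $item (10) {
        $item = 1;
    }
    1;
} or $error = $@;
print "caught: $error";
```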

Obviously, undef is handled specially in some cases, but not in all cases (e.g. $y in the first iteration of the loop).

I would appreciate any comments, explanations or pointers to the documentation or source code that might help me understand these behaviors.

Re: list assignment and undef (deduction)
by ikegami (Patriarch) on Aug 25, 2009 at 17:08 UTC

    You called $x a lexical, but that's only semi-true. $x is aliased to undef, so $x is whatever undef returns.

    When a LHS element of a list assignment is undef*, Perl apparently creates an SV to contain the assigned value. These are returned by the list assignment, and these are the values you are dumping for the first and third pass of the inner loop.

    There doesn't seem to be anything special about the loop wrt undefined values. It seems to be the list assignment creating the new SVs.

    * — Probably not any undefined value, just those that look like they were returned by undef.

      So I looked at Perl's source code a bit, and I've found out what really happens. I was right that for had nothing to do with it, but I wasn't entirely correct with respect to the assignment operator.


      You called $x a lexical, but that's only semi-true. $x is aliased to undef, so $x is whatever undef returns.

      The important bit you're missing follows: When a LHS element of a list assignment is immortal*, Perl pretends the RHS element is on the LHS for the purpose of building the return value.**

      $ perl -wle'$x="xx";$y="yy"; $_=uc for ($y)=$x; print "$x$y"'
      xxXX
      $ perl -wle'$x="xx";$y="yy"; $_=uc for (undef)=$x; print "$x$y"'
      XXyy

      This means the first and third elements of the list returned by the range operator (..) are being returned by the assignment operator, and these are the values you are dumping for the first and third passes of the inner loop.

      There's nothing special about the loop wrt undefined values. The new SVs are created by the range operator (..).

      * — One of PL_sv_undef, PL_sv_yes, PL_sv_no and PL_sv_placeholder.

      ** — This isn't documented. This was gleaned from the Perl source code.

        Very interesting. Where in the source were you looking? I haven't got much past the tokeniser yet, but was initially motivated to get a better understanding of exactly these sorts of behaviors. I may get on to the parser soon, and then to the op tree and execution... But a jump ahead might be an interesting / helpful diversion.

        It seems that different cases are handled quite differently, making it hard to provide any simple explanation of what is happening. Perhaps for about the same reason that "only perl can parse Perl", only the full source can explain what perl does.

        use strict;
        use warnings;
        use Devel::Peek;

        print "First\n";
        Dump($_) for (1..2);

        print "Second\n";
        Dump($_++) for (1..2);

        print "Third\n";
        Dump($_++) for (1..2);

        print "Fourth\n";
        Dump($_) for (1..2);

        print "Fifth\n";
        Dump($_) for ( (undef, undef) = (1..2) );

        produces

        First
        SV = IV(0x86b698c) at 0x86b6990
          REFCNT = 1
          FLAGS = (IOK,pIOK)
          IV = 1
        SV = IV(0x86b698c) at 0x86b6990
          REFCNT = 1
          FLAGS = (IOK,pIOK)
          IV = 2
        Second
        SV = IV(0x86cf3bc) at 0x86cf3c0
          REFCNT = 1
          FLAGS = (PADTMP,IOK,pIOK)
          IV = 1
        SV = IV(0x86cf3bc) at 0x86cf3c0
          REFCNT = 1
          FLAGS = (PADTMP,IOK,pIOK)
          IV = 2
        Third
        SV = IV(0x86dc45c) at 0x86dc460
          REFCNT = 1
          FLAGS = (PADTMP,IOK,pIOK)
          IV = 1
        SV = IV(0x86dc45c) at 0x86dc460
          REFCNT = 1
          FLAGS = (PADTMP,IOK,pIOK)
          IV = 2
        Fourth
        SV = IV(0x86b698c) at 0x86b6990
          REFCNT = 1
          FLAGS = (IOK,pIOK)
          IV = 1
        SV = IV(0x86b698c) at 0x86b6990
          REFCNT = 1
          FLAGS = (IOK,pIOK)
          IV = 2
        Fifth
        SV = IV(0x86b687c) at 0x86b6880
          REFCNT = 2
          FLAGS = (IOK,pIOK)
          IV = 1
        SV = IV(0x86b698c) at 0x86b6990
          REFCNT = 2
          FLAGS = (IOK,pIOK)
          IV = 2

        The first case seems inconsistent with the description in Statement Modifiers:

        The "foreach" modifier is an iterator: it executes the statement once for each item in the LIST (with $_ aliased to each item in turn).

        In the first case, a single variable (SV at the same address and with an IV at the same address) has different values assigned to the IV in each iteration. This appears to be a single variable modified at run time rather than aliases to the distinct elements of a list created at compile time.

        In the second case, even though the only difference from the first case is the post-increment on $_, the variables are quite different: this time each SV is at a different address (different from each other and different from the single address in the first case), but the associated IVs are, in each iteration, at the same address, though this is a different address from the first case. Furthermore, they now have PADTMP set. The fact that the IV is at the same address in each iteration suggests that these are still not simple aliases to distinct elements of a list - the IV is being modified at run time.

        The third case clarifies that $_ isn't a simple alias to the list elements: again each iteration has an SV at a different address - different from each other and different from those in the second case. There is, again, one IV for all iterations but its address is not the same as in the second case. Furthermore, the same values appear in the IVs as in the second case. Thus, either $_ isn't an alias to the elements of the list or a separate list is produced for each instance of (1..2), as the post-increment in the second case doesn't affect the values seen in the third case. This is quite different from the results in your previous example.

        This defies any simple explanation of cached lists generated at compile time and $_ being aliased to elements of such lists.

        The fourth case produces the exact same result as the first case. Thus there are not simply new SVs and IVs generated each time. There is some reuse.

        The fifth case is back to list assignment. This case is different again: in each iteration $_ is an SV with a different address (different in each iteration and different from all previous cases) and each SV has an IV at a different address (again, different in each iteration and different from all previous cases). Thus, it seems that when the LHS of a list assignment in list context is immortal the SV (is it an lvalue?) produced is not simply that from the RHS.

        By the way, I am looking at this in an effort to improve the documentation of the assignment operator (http://rt.perl.org/rt3/Public/Bug/Display.html?id=68312). In the beginning I simply wanted to add definition of what "list assignment" and "scalar assignment" are as the terms were already being used but without definition. It seems I have jumped into a barrel of worms - as with the cat in the box, it is indeterminate whether this is more or less fun than a barrel of monkeys, or whether we are all dead or alive.

Re: list assignment and undef (range returns non-temps)
by ikegami (Patriarch) on Aug 25, 2009 at 18:30 UTC

    I left one of your questions unanswered: Why do the values returned by 1..10 appear to be global?

    $ perl -MDevel::Peek -e'Dump $_ for 1'
    SV = IV(0x816a408) at 0x814f6f0
      REFCNT = 2
      FLAGS = (PADBUSY,PADTMP,IOK,READONLY,pIOK)
      IV = 1
    $ perl -MDevel::Peek -e'Dump $_ for 1..1'
    SV = IV(0x816a40c) at 0x814ed9c
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 1

    I've touched on it before. Range operators in list context with constants for arguments are expanded once (at compile-time?) and cached.

    I wish I could explain the following:

    $ perl -le'for (1..2) { map { print($_++) } 1..2 }'
    1
    2
    2
    3
    $ perl -le'for (1..2) { for (1..2) { print($_++) } }'
    1
    2
    1
    2

      I did some testing to see what was going on in this last thing. The map{} version produces different results because somehow it appears that it is modifying the map{} input!

      I don't understand WHY? either, I didn't think that map{} could modify an input, but evidently that is true when the $_ var is modified. This modified input apparently is used on subsequent use of the map{}.

      My testing raises more questions than it provides answers, but one thing is that ("a","b","c","d") is different than ('a'..'d').

      I changed to a,b,c for input instead of 1,2,3 during testing to differentiate the "for (1..2)" from the map's 1,2.

      Anyway I hope looking at this test will stimulate other Monk brain cells!

      #!/usr/bin/perl -w
      use strict;
      $| =1; #autoflush on (sequence STDERR STDOUT timewise)

      foreach my $z (1..3)
      {
          my @a = map{ $_++} ('a'..'d');
          print "z=$z variable a=@a\n";
      }
      #prints
      #z=1 variable a=a b c d
      #z=2 variable a=b c d e
      #z=3 variable a=c d e f

      foreach my $z (1..3)
      {
          #for some reason this modifies the input to the map
          #
          my @a = map{ $_++} ("a","b","c","d");

          #this works differently....ie, like the 2nd case in post!
          # my @a = map{my $x = $_;
          #             $x++;} ("a","b","c","d");

          print "z=$z variable a=@a\n";
      }
      #prints
      #Modification of a read-only value attempted at
      #C:\TEMP\perl11.pl line 18.

        I didn't think that map{} could modify an input, but evidently that is true when the $_ var is modified.

        It's clearly documented: "Note that $_ is an alias to the list value, so it can be used to modify the elements of the LIST." Just like foreach loops.
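A minimal sketch of that documented aliasing, using hypothetical arrays @in and @out: the substitution inside the map block runs against $_, which is the element of @in itself, so the input is changed along with the output.

```perl
use strict;
use warnings;

# Hypothetical data, for illustration only.
my @in = ('apple', 'avocado');

# $_ is an alias to each element of @in, so s/// edits @in in place.
# The trailing $_ makes the block return the modified string.
my @out = map { s/^a/A/; $_ } @in;

print "in:  @in\n";
print "out: @out\n";
```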

        Many people use map { s/// }, but that clobbers the input in the process. List::MoreUtils's apply and Algorithm::Loops's Filter solve this problem, and so does

        s/// for my @out = @in;
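A sketch of that copy-then-modify idiom, again with hypothetical @in and @out: the array assignment copies the elements first, and the foreach modifier then aliases $_ to the elements of the copy, so the original is left alone.

```perl
use strict;
use warnings;

# Hypothetical data, for illustration only.
my @in = ('apple', 'avocado');

# The assignment copies @in into @out; in list context it yields the
# elements of @out as lvalues, and s/// is applied to those copies.
s/^a/A/ for my @out = @in;

print "in:  @in\n";
print "out: @out\n";
```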

        but one thing is that ("a","b","c","d") is different than ('a'..'d').

        Indeed. A constant range in list context is flattened into an array.

        $ perl -MO=Concise,-exec -e'print 1..3'
        1  <0> enter
        2  <;> nextstate(main 1 -e:1) v
        3  <0> pushmark s
        4  <$> const[AV ] s     <-----
        5  <1> rv2av lKP/1
        6  <@> print vK
        7  <@> leave[1 ref] vKP/REFC
        -e syntax OK

        As previously mentioned, the problem is that the members of the array aren't read-only.