LanX has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I noticed a fundamental difference between literal scalars and hashes/arrays.

use strict;
use warnings;

sub pr {
    print \$_[0], "\t";    # print ref of para
}

check(qw/ Scalar 1 /);
check(qw/ Array [1,2] /);
check(qw/ Hash {1,2} /);

sub check {
    my ($_title_, $type) = @_;
    my $_call_ = "pr $type;";
    my $code = <<"__EOC";
for (1..3) {
    $_call_
    print "\\n\\t";    #UPDATE
    for (1..3) {
        $_call_
    }
    print "\\n";
}
__EOC
    print "\n--- $_title_ \n";
    print $code;
    eval $code;
}

output:

--- Scalar
for (1..3) {
    pr 1;
    print "\n\t";
    for (1..3) {
        pr 1;
    }
    print "\n";
}
SCALAR(0x81953b0)
        SCALAR(0x8195398)  SCALAR(0x8195374)  SCALAR(0x8195398)
SCALAR(0x8195374)
        SCALAR(0x819538c)  SCALAR(0x8195398)  SCALAR(0x819538c)
SCALAR(0x8195398)
        SCALAR(0x81953b0)  SCALAR(0x819538c)  SCALAR(0x81953b0)

--- Array
for (1..3) {
    pr [1,2];
    print "\n\t";
    for (1..3) {
        pr [1,2];
    }
    print "\n";
}
REF(0x819073c)
        REF(0x8190724)  REF(0x8190724)  REF(0x8190724)
REF(0x819073c)
        REF(0x8190724)  REF(0x8190724)  REF(0x8190724)
REF(0x819073c)
        REF(0x8190724)  REF(0x8190724)  REF(0x8190724)

--- Hash
for (1..3) {
    pr {1,2};
    print "\n\t";
    for (1..3) {
        pr {1,2};
    }
    print "\n";
}
REF(0x8195374)
        REF(0x8190748)  REF(0x8190748)  REF(0x8190748)
REF(0x81953c8)
        REF(0x8190748)  REF(0x8190748)  REF(0x8190748)
REF(0x8195374)
        REF(0x8190748)  REF(0x8190748)  REF(0x8190748)

As you can see, the refs of the arrays and hashes depend on the code position, while the refs of the scalars do not ...

My motivation is to find a way to distinguish different calls to the same sub by code position.

(please note that caller gives only the line number!)
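To illustrate the limitation mentioned above, here is a minimal sketch, assuming a hypothetical helper named where_called (not part of the original code). It shows that two calls on the same line report the same file and line through caller, so they cannot be told apart that way.

```perl
use strict;
use warnings;

# Hypothetical helper: report the file and line this sub was called from.
sub where_called {
    my ($package, $file, $line) = caller;    # caller in list context
    return "$file:$line";
}

my $first  = where_called();
my $second = where_called();

# Two calls in the same statement, on the same line.
my ($x, $y) = (where_called(), where_called());

print "first:  $first\n";
print "second: $second\n";
print $x eq $y
    ? "calls on the same line are indistinguishable via caller\n"
    : "calls on the same line were distinguished\n";
```

Calls on different lines ($first and $second) are reported differently, while the two calls sharing a line ($x and $y) are not.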

QUESTION: Is this defined behaviour, and why does the compiler not just send one stable ref to a scalar?

Cheers Rolf

UPDATE: changed indentation of inner loop and added question!

UPDATE: OK there is no need to discuss this further.

As an outcome, it is clear that perl may reallocate memory for data which is obviously constant at compile time. In other words, the reference to a constant or literal at a particular code position may change at any time during runtime and is not fixed at compile time.

Re: Reference of constants and literals
by moritz (Cardinal) on Nov 24, 2008 at 10:09 UTC
    I don't quite understand your question. When you have a number, and take a reference to it, it will appear as SCALAR(0x...) in the output. If you have an array reference (or reference to any data structure), and take a reference to it, it will appear as REF(0x...). I don't see any relation to "code position", whatever you mean by that.
    $ # also works for scalars this way, when you take a ref to a ref:
    $ perl -wle 'print \\4'
    REF(0x5043b0)

    (And if you use <code>...</code> tags to delimit your code, the square brackets will display correctly and won't turn into links.)

      I think it's the symptom of an idea I planted in LanX' head yesterday, the idea of determining that the code is getting called from the same location because it gets passed the same (addresses of) variables. For arrays/hashes, it seems the code actually gets the same addresses of variables even though these are lexicals.

        That might be a nice idea for an obfuscation contest, but certainly not for production code. It relies on undocumented behaviour, here an optimization that might very well change in the future.

        I think it's related to the optimization described in No garbage collection for my-variables (not sure though).

      Well, it's a question of a weird optimization: a constant scalar is passed in a certain snippet of code, so there is no need to switch the reference at runtime.

      This works with explicit refs like [1,2] and {1,2} but not with constants: they needlessly get a new reference at runtime each time the loop gets there. Just compare:

      check(qw/ Scalarref \1 /);

      OUTPUT

      --- Scalarref
      for (1..3) {
          pr \1;
          print "\n\t";    #UPDATE
          for (1..3) {
              pr \1;
          }
          print "\n";
      }
      REF(0x8190768)
              REF(0x8190744)  REF(0x8190750)  REF(0x8190744)
      REF(0x8190750)
              REF(0x8195394)  REF(0x8190744)  REF(0x8195394)
      REF(0x8190744)
              REF(0x8190768)  REF(0x8195394)  REF(0x8190768)

      Each time a new ref instead of one ref.

      Cheers Rolf

Re: Reference of constants and literals
by ikegami (Patriarch) on Nov 24, 2008 at 10:37 UTC

    Is this a defined behaviour

    At the very least, you're relying on the memory allocation system allocating the same block twice in a row. That sounds very fragile to me. I can easily see this failing for non-trivial pr or in a multi-threaded application.
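As a rough illustration of why the repeated address is an allocator artifact rather than an identity, here is a minimal sketch, assuming the core module Scalar::Util is available. Variable names are made up for the example. When the anonymous arrays are temporaries, perl is free to hand out the same address again (or not); when they are kept alive simultaneously, their addresses necessarily differ.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Temporaries: each [1, 2] can be freed before the next one is built,
# so the addresses may or may not repeat. Nothing guarantees either.
my @temp_addrs;
for (1 .. 3) {
    push @temp_addrs, refaddr([1, 2]);
}

# Kept alive: three arrays exist at the same time,
# so no two of them can share an address.
my @keep       = map { [1, 2] } 1 .. 3;
my @kept_addrs = map { refaddr($_) } @keep;

my %seen;
my $distinct = grep { !$seen{$_}++ } @kept_addrs;

print "temporaries: @temp_addrs\n";
print "kept alive:  @kept_addrs\n";
print "distinct addresses among kept arrays: $distinct\n";
```

The same source position produces three different addresses as soon as the values outlive the call, which is one way the "same address means same call site" idea can break.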

    why does the compiler not just send one stable ref to a scalar?

    The memory allocation needs of "creating an array, creating two scalars, assigning the scalars to the array, creating a reference to the array, returning the reference and passing it to a function" (pr [1,2]) are very different than the memory allocation needs of "passing a constant to a function" (pr 1).

Re: Reference of constants and literals
by ikegami (Patriarch) on Nov 24, 2008 at 10:18 UTC

    As you can see, the refs of the arrays and hashes depend on the code position, while the refs of the scalars do not ...

    The only thing you've passed to pr is scalars. You never pass an array or hash, just references to them. It's not even possible to pass arrays or hashes to subroutines. That means you're wrong about having printed refs to arrays (ARRAY(0x...)) and refs to hashes (HASH(0x...)). The only thing you've printed are references to scalars.
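The distinction can be checked with ref. This is a minimal sketch using a hypothetical helper named describe (not from the original post) that reports both what $_[0] holds and what a reference to $_[0] points at.

```perl
use strict;
use warnings;

# Hypothetical helper: first value describes the argument itself,
# second value describes a reference to the argument's slot in @_.
sub describe {
    return (ref($_[0]) || 'plain scalar', ref(\$_[0]));
}

my @scalar = describe(1);
my @array  = describe([1, 2]);
my @hash   = describe({ 1, 2 });

print "1     => @scalar\n";
print "[1,2] => @array\n";
print "{1,2} => @hash\n";
```

For [1,2] and {1,2} the argument is already a reference (ARRAY or HASH), so \$_[0] is a reference to a reference and stringifies as REF(0x...), which matches the output in the original post.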

    It should read:

    As you can see, the refs of the refs depend on the code position, while the refs of the constants do not ...

    Now, what's your question?

      Hi

      I just made the code and output clearer and added a question.

      > You never pass an array or hash, just references to them. It's not even possible to pass arrays or hashes to subroutines.

      That's a matter of interpretation; the behaviour of push @arr, "elem" can be reproduced with prototypes: sub name (\@@)

      Cheers Rolf

        First of all, push is not a subroutine. I'll concentrate on your second example, sub name (\@@).

        While it could be a matter of interpretation in general, it's unequivocal in this case because we're talking about the value of $_[0]. The \@ prototype causes a reference to the array to be passed to the sub, not an array. $_[0] contains a reference, not an array. Printing \$_[0] would print a reference to a reference to an array, not a reference to an array.

        That's a matter of interpretation; the behaviour of push @arr, "elem" can be reproduced with prototypes: sub name (\@@)

        The prototype is just syntactic sugar for taking a reference (plus extra behaviour, for example enforcing list context), so independently of what it looks like on the caller side, the callee always sees a reference, never the array itself.
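A minimal sketch of that point, assuming a hypothetical sub named my_push with the (\@@) prototype from the discussion. The call site looks like it passes an array, but the callee receives a reference.

```perl
use strict;
use warnings;

# Hypothetical sub using the (\@@) prototype.
# It is declared before the call so the prototype takes effect.
sub my_push (\@@) {
    my ($aref, @items) = @_;
    push @$aref, @items;
    return ref $aref;    # what the callee actually received
}

my @arr  = (1);
my $kind = my_push @arr, 2, 3;    # looks like passing an array

print "callee saw: $kind\n";
print "array now:  @arr\n";
```

Here $kind is ARRAY and @arr ends up holding 1 2 3: the prototype makes perl take \@arr implicitly, and the sub modifies the caller's array through that reference.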

      Now, what's your question?

      Well, why the heck do constants get a newly allocated place? It doesn't seem to me that the compiler is doing the optimisation right!

        ?? You are not supposed to care one way or the other.