in reply to Re: Difference between exists and defined
in thread Difference between exists and defined

Update: The ideas I've expressed in this post are apparently neither entirely correct nor entirely incorrect! Please see the replies by LanX, haukex and dsheroh below.

$array[3] is the only element you've assigned a value to ... so there's no reason for Perl to have allocated storage space for any other elements. ... Perl arrays are sparse data structures ... you can assign to $array[8675309] without consuming ... memory to store ... unused elements ...

I think these statements are incorrect regarding Perl positional (if that's the correct term) arrays. (Perl associative arrays are sparse.) If you use Windows Task Manager to graph memory usage in real time (Windoze gotta be good for something) while the following code runs, you can see that assigning to an array element causes a contiguous allocation of enough memory to "grow" the array to include the assigned element.

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my @ra;
     print 'array declared';  sleep 5;
     ;;
     $ra[ 100_000_000 ] = 42;
     print '1st array assignment';  sleep 5;
     ;;
     $ra[ 200_000_000 ] = 137;
     print '2nd array assignment';  sleep 5;
     ;;
     print 'byebye';
    "
    array declared
    1st array assignment
    2nd array assignment
    byebye

The same effect is seen when assigning to the array's last index ($#ra) rather than to any element:
    $#ra = 100_000_000;

It's a question of what to do with the allocated memory. Perl arrays are arrays of scalars, and a scalar is constructed by default in the very well-defined state of un-defined-ness; an "undefined" scalar is a completely specified C data structure. So how do you initialize the space for the 100,000,000 scalars allocated in the example above? The specific way this question is answered from one CPU/OS/Perl implementation to another is the basis of the ambiguity surrounding the use of exists on allocated but never-accessed array elements.
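For a concrete picture of the exists/defined distinction on an array that has grown past elements nobody assigned, here is a small sketch (index 3, index 1 and the variable names are arbitrary choices). As far as I know this is the behaviour perlfunc documents for exists on array elements, and the same entry strongly discourages calling exists on array elements at all.

```perl
use strict;
use warnings;

my @ra;
$ra[3] = 42;    # the array now has four slots; only index 3 was ever assigned

my @report;     # [exists, defined] flags for index 1: before and after
push @report, [ ( exists $ra[1] ? 1 : 0 ), ( defined $ra[1] ? 1 : 0 ) ];

$ra[1] = undef; # store undef explicitly: the element now exists, but is not defined
push @report, [ ( exists $ra[1] ? 1 : 0 ), ( defined $ra[1] ? 1 : 0 ) ];

printf "length=%d  before: exists=%d defined=%d  after: exists=%d defined=%d\n",
    scalar(@ra), @{ $report[0] }, @{ $report[1] };
```

Run as is, it should report a length of 4, with index 1 neither existing nor defined before the explicit undef assignment, and existing but still undefined afterwards.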

My fuzzy understanding of the Perl guts is that to save time (not space!), array elements in the situation described above are quickly created in a state of quasi-existence: the memory is not left as random garbage, but neither is it a sequence of fully-fledged, default-initialized scalars. Hence the advice regarding use of exists with array elements: Don't Do That!™
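The "quasi-existence" can be observed from Perl code without looking at the guts. My understanding (not checked against every perl version) is that unused slots simply hold null pointers, which exists reports as absent; the sketch below relies only on the documented behaviour of exists, $#array and ordinary reads, and the index values are arbitrary.

```perl
use strict;
use warnings;

my @ra;
$#ra = 9;    # extend to 10 slots without storing any element

# None of the slots holds an element yet.
my @existing = grep { exists $ra[$_] } 0 .. $#ra;

# A plain rvalue read yields undef but does not create the element.
my $peek = $ra[5];
my $absent_after_read = exists $ra[5] ? 0 : 1;

# Assigning to the slot does create it.
$ra[5] = 'x';
my $present_after_write = exists $ra[5] ? 1 : 0;

printf "slots=%d existing=%d absent_after_read=%d present_after_write=%d\n",
    scalar(@ra), scalar(@existing), $absent_after_read, $present_after_write;
```

The practical upshot matches the advice above: test with defined, which gives the same answer for a never-assigned slot and for a slot explicitly holding undef.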

Perhaps others more familiar with the details of this question can comment on specifics.


Give a man a fish:  <%-{-{-{-<

Re^3: Difference between exists and defined
by LanX (Saint) on Apr 17, 2019 at 23:35 UTC
    I don't remember where I read it (probably in the Panther book), but Perl arrays are designed to compete easily with linked lists while keeping the benefits of indexed access.

    That is, they can grow and shrink efficiently at both ends.

    An array keeps internal indices for its first and last elements, and allocates twice as much space as a reserve for push or unshift.

    Basically, only the range between the first and last existing elements needs to be stored, plus the reserve mentioned above.

    The array slots are essentially pointers to scalars, which are allocated separately.

    New space needs to be allocated only when the reserve is used up. Since this happens in exponential steps of doubling*, it is very efficient on average.

    Shrinking the array (with shift or pop, for instance) just adjusts the indices of the first and last elements.

    HTH

    Cheers Rolf
    (addicted to the Perl Programming Language :)

    Update: see Shift, Pop, Unshift and Push with Impunity!

    *) I'm no longer sure about the doubling; I may be confusing that part with hashes.
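To make the bookkeeping described in this reply concrete, here is a toy model in Perl. It is only a sketch under stated assumptions: the package ToyArray, its methods push_elem and shift_elem, the starting capacity of 4 and the doubling growth rule are all hypothetical choices for illustration (the footnote already questions the doubling), and perl's real arrays are implemented in C, not like this. What it shows is the idea itself: shift only moves a start offset, and growth copies the live elements into a larger buffer.

```perl
use strict;
use warnings;

# Hypothetical toy model, NOT perl's real array code: a slot buffer, an offset
# to the first live element, a fill count, and (assumed) doubling on growth.
package ToyArray;

sub new {
    my ($class) = @_;
    return bless { buf => [ (undef) x 4 ], start => 0, fill => 0 }, $class;
}

# Append at the end; grow only when the slots past the last element run out.
sub push_elem {
    my ( $self, $value ) = @_;
    $self->_grow if $self->{start} + $self->{fill} >= @{ $self->{buf} };
    $self->{buf}[ $self->{start} + $self->{fill} ] = $value;
    $self->{fill}++;
    return;
}

# Remove from the front by moving the offset; nothing is copied.
sub shift_elem {
    my ($self) = @_;
    return undef unless $self->{fill};
    my $value = $self->{buf}[ $self->{start} ];
    $self->{buf}[ $self->{start} ] = undef;    # release the stored value
    $self->{start}++;
    $self->{fill}--;
    return $value;
}

sub size     { return $_[0]{fill} }
sub capacity { return scalar @{ $_[0]{buf} } }

# Copy the live elements to the front of a buffer twice as large.
sub _grow {
    my ($self) = @_;
    my ( $s, $n ) = @{$self}{qw(start fill)};
    my @live = @{ $self->{buf} }[ $s .. $s + $n - 1 ];
    my $cap  = 2 * @{ $self->{buf} };
    $self->{buf}   = [ @live, (undef) x ( $cap - @live ) ];
    $self->{start} = 0;
    return;
}

package main;

my $arr = ToyArray->new;
$arr->push_elem($_) for 1 .. 10;
my $first = $arr->shift_elem;
printf "first=%d size=%d capacity=%d\n", $first, $arr->size, $arr->capacity;
```

After ten pushes (two growth steps, 4 to 8 to 16 slots) and one shift, this prints first=1 size=9 capacity=16; the shifted slot is simply skipped by advancing the offset.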

Re^3: Difference between exists and defined
by haukex (Archbishop) on Apr 18, 2019 at 07:23 UTC
    I think these statements are incorrect regarding Perl positional (if that's the correct term) arrays. (Perl associative arrays are sparse.)

    Just wanted to confirm that you're correct that Perl's arrays are not sparse. I haven't yet found a reference in the official docs that says so explicitly, but I'm sure it's somewhere.

        use Devel::Size 'total_size';

        my @foo;
        print total_size(\@foo), "\n";   # prints 64
        $foo[100_000_000] = 'x';
        print total_size(\@foo), "\n";   # prints 800000114
        $foo[200_000_000] = 'x';
        print total_size(\@foo), "\n";   # prints 1760000156
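The three figures haukex printed can be reproduced with plain arithmetic, under assumptions that are mine and not from the post: a 64-bit perl with one 8-byte pointer slot per index, the 64-byte empty array from the first print, and a 42-byte scalar holding 'x' (that last number is back-solved from the second print, so only the third figure is a genuine check). The growth rule for the second allocation, new maximum index = requested index + old maximum index / 5, is my reading of av_extend in perl's av.c and should be treated as an assumption; if it holds, perl's array growth is not plain doubling.

```perl
use strict;
use warnings;

# Assumptions, inferred from the printed figures rather than documented:
my $empty_av  = 64;   # total_size of the empty array (first print)
my $ptr_bytes = 8;    # one SV* slot per index on a 64-bit perl
my $sv_x      = 42;   # one scalar holding 'x', back-solved from the second print

# Size of an array whose highest allocated index is $max_index,
# holding $n_scalars scalars that each contain 'x'.
sub expected_size {
    my ( $max_index, $n_scalars ) = @_;
    return $empty_av + $ptr_bytes * ( $max_index + 1 ) + $sv_x * $n_scalars;
}

# First growth from empty: allocate exactly up to the requested index.
my $max1 = 100_000_000;
# Second growth (assumed av_extend rule): requested index + old maximum / 5.
my $max2 = 200_000_000 + int( $max1 / 5 );

print expected_size( $max1, 1 ), "\n";    # 800000114
print expected_size( $max2, 2 ), "\n";    # 1760000156
```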
Re^3: Difference between exists and defined
by dsheroh (Monsignor) on Apr 18, 2019 at 08:06 UTC
    I stand corrected. At some point I probably read the linked-list thing that LanX mentioned, and I have since misremembered it.

    To convince myself, I threw together:

        #!/usr/bin/env perl

        use strict;
        use warnings;
        use 5.010;

        use Memory::Usage;

        my @array1;
        my @array2;

        my $mu = Memory::Usage->new();
        $mu->record('ready to go');

        $array1[5268] = 1;
        $mu->record('array1 has an element');

        $array2[8675309] = 1;
        $mu->record('array2 has an element');

        $mu->dump();
    Running this on a Debian 8.11 machine with perl 5.20.2, I get the result:
        time    vsz (  diff)    rss (  diff) shared (  diff)   code (  diff)   data (  diff)
           0  20824 ( 20824)   2568 (  2568)   1916 (  1916)      8 (     8)    920 (   920) ready to go
           0  20824 (     0)   2568 (     0)   1916 (     0)      8 (     0)    920 (     0) array1 has an element
           0  88600 ( 67776)  70416 ( 67848)   2048 (   132)      8 (     0)  68696 ( 67776) array2 has an element
    The array index 5268 that I used for array1 is a magic number, apparently corresponding to the minimum size that my perl allocates for an array when it's initially declared. If I increase the index to 5269, it shows an additional 132k (all the numbers are in kilobytes) allocated when array1 is assigned to.
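A back-of-the-envelope check of the array2 line, under assumptions that are not from the post: a 64-bit perl storing one 8-byte pointer per slot, exactly index + 1 slots for a freshly grown empty array, 4 KiB pages, and a block this large being mapped in whole pages. Under those assumptions the computed size rounds up to exactly the 67776 KB by which the vsz and data columns grew.

```perl
use strict;
use warnings;

# Assumptions (not from the post): 64-bit perl, one 8-byte SV* slot per index,
# exactly index + 1 slots for a freshly grown empty array, and 4 KiB pages.
my $index     = 8_675_309;    # highest index assigned to @array2 above
my $ptr_bytes = 8;
my $page      = 4096;

my $bytes = ( $index + 1 ) * $ptr_bytes;
my $kib   = $bytes / 1024;

# Assume a block this large is mapped in whole pages, so round up.
my $pages    = int( ( $bytes + $page - 1 ) / $page );
my $page_kib = $pages * $page / 1024;

printf "%d slots x %d bytes = %d bytes = %.1f KiB\n",
    $index + 1, $ptr_bytes, $bytes, $kib;
printf "rounded up to %d pages of 4 KiB = %d KiB\n", $pages, $page_kib;
```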