Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

When running on a Sun (Unix) workstation, the following code takes about 500MB of RAM!
foreach $x (0..4000) { foreach $y (0..4000) { $matrix[$x][$y] = 1.0; } }
This seems like an insanely large amount. :)

Here's why I think that ... A 4001x4001 array has approximately 16,000,000 elements. If each element is a float (which is the case here) then if one assumes a 4 byte float (which is the case on my machine), you would need 64,000,000 bytes to hold them in contiguous memory. That's 64MB (right? I'm not making some silly math mistake am I??). So even if somehow each float takes 8 bytes, that's still just 128MB.

Perl uses 500MB for this. Now, I can accept the idea that there is some overhead involved, but not hundreds of MB.

Why does this take up so much memory? Equally important, how can I do this and use less memory?

Thanks!

Replies are listed 'Best First'.
Re: why does this array take up so much memory?
by Corion (Patriarch) on Jan 26, 2003 at 21:57 UTC

    If you look at it from the other direction, each value stored in the array only takes up about 32 bytes (even less, because the array itself has some overhead as well). I don't know much about perls internal memory structure, but every scalar has at least three slots, SV, PV and NV, for the string value, float value and integer value of said scalar. That's already 12 bytes taken up, plus another 4 bytes for the pointer from the array to that scalar (now at 16 bytes). The rest could be other overhead or simply more overhead from the scalars, which I didn't mention/know.

    A matrix this huge might be better handled via C code as ints respective doubles, or a database, depending on how large your data will actually be.

    If you really need a matrix this large, you should also consider what operations you want to perform on it, and whether a sequential storage in rows makes more sense - or whether floats hold enough precision to allow more complex operations such as multiplications/divisions without too much loss of precision...

    perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider ($c = $d->accept())->get_request(); $c->send_response( new #in the HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
Re: why does this array take up so much memory?
by adrianh (Chancellor) on Jan 26, 2003 at 21:59 UTC

    Perl is not the most memory-efficient of languages. Quite the opposite. Many design decisions were made that trade memory for speed. For example, arrays are padded so you don't have to reallocate memory when they grow.

    Playing around with Devel::Size can be informative.

    You might find the PDL modules of use - they're good at efficient manipulation of N-dimensional arrays.

      Devel::Size is very informative:
      use strict; use Devel::Size qw(size total_size); my $double; my @array; my @matrix; $double = 1.0; for my $x (0..9) { $array[$x] = 1.0; for my $y (0..9) { $matrix[$x][$y] = 1.0; } } print "Double: ", size(\$double), ", ", total_size(\$double), "\n"; print "Array: ", size(\@array), ", ", total_size(\@array), "\n"; print "Matrix: ", size(\@matrix), ", ", total_size(\@matrix), "\n";
      gives these results running on my Alpha with Tru64:
      Double: 24, 24 Array: 168, 408 Matrix: 168, 4488
      So a matrix of 10*10 elements cost 4.5 K, giving 45 bytes pr element - not too bad.

      Try other sizes for yourself, and go either the PDL or the Inline::C way, as others have recommended.

Re: why does this array take up so much memory?
by tall_man (Parson) on Jan 27, 2003 at 01:02 UTC
    No one has mentioned yet that your two-dimensional array is really an array of 4001 references to one-dimensional arrays of size 4001 each. You could save space with a flattened representation like this:
    $matrix[4000 + 4000*4000] = 1.0; foreach $x (0..4000) { foreach $y (0..4000) { $matrix[$x + 4000*$y] = 1.0; } }
    The other thing I did was to pre-set the highest element first. I think you should do that anyway, even if you stick to the two-dimensional style. That prevents perl from doing lots of re-allocations behind the scenes, and pumping up the padding each time.
Re: why does this array take up so much memory?
by BrowserUk (Patriarch) on Jan 27, 2003 at 08:13 UTC

    Equally important, how can I do this and use less memory?

    As always (it seems to me), Perl provides the tools to do the job.

    Heres a crude, but working implementation of a float array class that allows me to allocate your 16,000,000 float array using 125,278k of ram. That's an overhead of just 0.01764 bytes per float (276k total).

    It's not to shabby on the performance stakes either coming at 0.000238 seconds per write/read access, and allocation of the 16 million taking under 4.5 seconds. That includes initialising them (to the same value. 0.0 by default).

    Of course, what you loose is flexibility. It will only store floats (doubles are an easy mod) and you can't shrink or grow the array.

    #! perl -slw use strict; package SEIFA; #! Space Efficient, Inflexible Float Array my %instances; use constant ARYREF => 0; use constant X_SIZE => 1; use constant Y_SIZE => 2; sub new { my ($class, $x, $y, $init ) = @_; my $size = $x * $y; my $mem = pack('f', $init||'0.0') x $size; my $data = [ \$mem, $x, $y ]; my $self = bless $data, $class; $instances{$self} = $data; return $self; } use constant X => 1; use constant Y => 2; sub seifa{ my ($data, $x, $y) = ($instances{+shift}, shift, shift); my $pos = ($data->[X_SIZE] * $y * 4) + ($x * 4); return unpack 'f*', substr( ${$data->[ARYREF]}, $pos, 4) unless @_ +; return unpack 'f*', substr( ${$data->[ARYREF]}, $pos, @_ * 4) = pa +ck 'f*', @_; } package main; use vars qw[$X $Y]; use Benchmark::Timer; my $timer = Benchmark::Timer->new(); $X ||= 20; $Y ||= 20; print 'Before'; <>; $timer->start("allocate ${\($X * $Y)}"); my $matrix = SEIFA->new( $X, $Y, 1.0); $timer->stop("allocate ${\($X * $Y)}"); print 'After'; <>; $matrix->seifa( $X-1, 0, 3.1415926 ); $matrix->seifa( $X-1, $Y-1, 3.1415926 ); local $\; for my $y (0 .. 5, '...', $Y-6 .. $Y-1) { print( " ... " x 8, $/), next if $y eq '...'; for my $x (0 .. 3, '...', $X-3 .. $X-1) { print( ' ... '), next if $x eq '...'; $timer->start('write-read'); my $value = $matrix->seifa($x, $y, "$y.$x"); $timer->stop('write-read'); printf '%9.4f, ', $value; } print $/, } print $timer->report(); __END__ c:\test>230052 -X=4000 -Y=4000 Before After 0.0000, 0.1000, 0.2000, 0.3000, ... 0.3997, 0.3 +998, 0.3999, 1.0000, 1.1000, 1.2000, 1.3000, ... 1.3997, 1.3 +998, 1.3999, 2.0000, 2.1000, 2.2000, 2.3000, ... 2.3997, 2.3 +998, 2.3999, 3.0000, 3.1000, 3.2000, 3.3000, ... 3.3997, 3.3 +998, 3.3999, 4.0000, 4.1000, 4.2000, 4.3000, ... 4.3997, 4.3 +998, 4.3999, 5.0000, 5.1000, 5.2000, 5.3000, ... 5.3997, 5.3 +998, 5.3999, ... ... ... ... ... ... +... ... 3994.0000, 3994.1001, 3994.2000, 3994.3000, ... 3994.3997, 3994.3 +999, 3994.3999, 3995.0000, 3995.1001, 3995.2000, 3995.3000, ... 3995.3997, 3995.3 +999, 3995.3999, 3996.0000, 3996.1001, 3996.2000, 3996.3000, ... 3996.3997, 3996.3 +999, 3996.3999, 3997.0000, 3997.1001, 3997.2000, 3997.3000, ... 3997.3997, 3997.3 +999, 3997.3999, 3998.0000, 3998.1001, 3998.2000, 3998.3000, ... 3998.3997, 3998.3 +999, 3998.3999, 3999.0000, 3999.1001, 3999.2000, 3999.3000, ... 3999.3997, 3999.3 +999, 3999.3999, 1 trial of allocate 16000000 (4.467s total) 84 trials of write-read (20ms total), 238us/trial c:\test>

    Of course, you could get fancy and use tie and/or overload--shame you can't overload [..] (can you?)--but the performance hit may not be worth hit.

    I also played with a version that used a standard array for the first dimension and pack'd strings of doubles for the second. gives back some of the flexibility in that you can shrink and grow the main array and the sub-arrays (independantly) including autovivfying on assignment, but the penalty is x16 memory consumption and 1/4 of the speed.


    Examine what is said, not who speaks.

    The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

      You can overload array deref. So you could write a class that overloads @{} to return an object of another class also overloading the operator, which finally returns an array constaining the desired element(s), enabling syntax like
      my $element = $matrix->[$y]->[$x];
      I'll come back later today to add some proof of concept code. Of course you do take a speed hit and the resize restricition doesn't go away, but this would seem to be a case of trading speed for memory anyway.

      Makeshifts last the longest.

Re: why does this array take up so much memory?
by Courage (Parson) on Jan 26, 2003 at 22:46 UTC
    There are several ways to force Perl use less memory.
    1. using Inline::C create C's subroutine that implements your two-dimensional array
    2. use one single scalar value and simulate your array by taking proper substr
    3. If your array is really all 1-s with few exceptions, just store such exceptions in another array and write a simple sub that simulates your 2D array.
    May be some more.

    Courage, the Cowardly Dog

Re: why does this array take up so much memory?
by Elian (Parson) on Jan 27, 2003 at 00:37 UTC
    Variables in perl take up at least pointer and two native ints (for just an undef) plus at least the size of the data being stored for ints and double-precision floats. So in this case you've got a pointer, two ints, and a double, which will snag 20 bytes per array element, not counting the array overhead itself. So you've 16M elements, times 20, plus array overhead.

    Perl, as has been pointed out, is not stingy with memory. There are ways around it, but none used yet. (Perl 6 has provisions to be much stingier, but it's not done yet)

    This seems to be a common thing. I may well throw together a Stingy::Array module or something to get better packing.

      I may well throw together a Stingy::Array module or something to get better packing
      I, for one, would be very grateful for such a module.

      rdfield

        I went looking after I posted, and there is Tie::IntegerArray, but it's only a 0.1 version and works through Bit::Vector, so I'm not sure how fast it is. I'm going to poke at it in a day or so, and if it's not reasonably snappy I will put a module together myself. (Might anyway, as something to do a Parrot benchmark against, but that's a separate issue... :)
Re: why does this array take up so much memory?
by FamousLongAgo (Friar) on Jan 27, 2003 at 00:44 UTC

    If you are doing matrix operations in Perl (which your code and variable name suggest), consider using the PDL module (it stands for Perl Data Lanuguage). This is a compiled C matrix library highly optimized for speed, and the data structures take up much less memory than native Perl arrays.

    PDL is very full-featured, and overloads many common operators, so you can do matrix operations like addition and multiplication in a very DWIMmish way. The only downside is having to read through a small mountain of documentation.

Re: why does this array take up so much memory?
by bronto (Priest) on Jan 27, 2003 at 09:40 UTC

    Uhm... Which version of Perl are you using?

    I know almost nothing (or less) about Perl internals, but I remember that in old releases the construct (0..4000) wasn't optimized and effectively created a 4001-element list. Could it be your case?

    Ciao!
    --bronto

    # Another Perl edition of a song:
    # The End, by The Beatles
    END {
      $you->take($love) eq $you->make($love) ;
    }

Re: why does this array take up so much memory?
by bart (Canon) on Jan 28, 2003 at 12:07 UTC
    My rule of thumb is that each scalar takes roughly 30 bytes, minimum. This seems to agree well with your observations.

    In case you were to try storing integers in a 2D array, may I tell you to look into vec(), as a way to store a one-dimensional array of integers in one scalar? The memory overhead would then become pretty much as you calculated, wasting only 30 bytes per row, while you still have handy single cell access. In case you're new to vec(): it is very much like what substr() is for character strings, except that now it's for arrays of integers, and you only have access to one individual cell at a time. You do have to specify the cell size with every vec() call.

    Just to make it perfectly obvious: for a 2D array, you make a 1D (perl) array of such vec scalars.

    But, since you really seem to be into floating point numbers, I won't. Take the other people's advice, and first of all, look into PDL, the number crunching suite for Perl.

    Well... you could still use pack()/unpack() to store a 1D array of floating point numbers in a string, but it doesn't look very handy to me. You might as well write the stuff in C. ;-)

Re: why does this array take up so much memory?
by Elian (Parson) on Feb 07, 2003 at 02:14 UTC
    As a really late followup, there's now Packed::Array if you want to have a dense, integer-only array. Takes up (size * (sizeof(IV))) + 20 bytes for an integer array, give or take a native word or two.