sunmaz has asked for the wisdom of the Perl Monks concerning the following question:

I have an array in my program that stores a large amount of data. Devel::Size's total_size reports the array's size as 15.4 GB. However, when I traverse the array in a foreach loop, writing each element to a file followed by a line break, the resulting file is only 2.7 GB. I do not believe this is a problem with Devel::Size, because I do run out of memory when running my program, even after allocating well over 4 GB to it. How is this possible, and how can it be resolved?

Thanks in advance.

Re: Bizarre Array Size Disparity
by BrowserUk (Patriarch) on May 17, 2012 at 17:19 UTC

    A 5.7-to-1 memory-to-disk ratio is not unusual.

    E.g.: a file that takes 12 MB on disk requires 72 MB in RAM:

    C:\test>perl -le"printf qq[%010u\n], $_ for 1 .. 1e6" >1e6x10.txt

    C:\test>dir 1e6x10.txt
    17/05/2012  18:13        12,000,000 1e6x10.txt

    C:\test>perl -MDevel::Size=total_size -e"@a=<>; print total_size \@a" 1e6x10.txt
    72000176

    The power and flexibility of Perl's arrays come from their complex and flexible internal structure. The cost is memory usage.
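    For a rough sense of where that memory goes, Devel::Size can also report the footprint of a single scalar. A minimal sketch (the exact numbers depend on platform and Perl build):

    use Devel::Size qw(total_size);

    my $line = "0000000001\n";       # one 11-byte line from the file above
    print total_size($line), "\n";   # far more than 11 bytes: the SV head,
                                     # flags and string buffer all count

    my @a = ($line) x 3;
    print total_size(\@a), "\n";     # the array adds its own structure
                                     # plus a full SV per element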

    There are often ways of storing data in Perl that substantially reduce that internal overhead whilst still giving you the access you need. You'd need to show us what the data consists of, and how you need to access it.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      I see. Thanks for the expeditious reply. The array consists entirely of floating point numbers (including many "NaN"). I only need to access it in a foreach construct. I desperately need to reduce the memory it is using.
        I desperately need to reduce the memory it is using. ... I cannot easily change how it is constructed.

        If you cannot change how it is constructed, you cannot reduce the memory that will be used as it is constructed.

        You could reduce it after the fact by packing the doubles into a string:

        perl -MDevel::Size=total_size -le"@a= map rand(), 1..1e6; print total_size \@a; $b=pack'd*', @a; print total_size( $b )"
        40000176
        8000056

        That's a 5-to-1 reduction. It will free up memory for other things if you then undef the array, but Perl usually won't release that memory back to the OS.

        And the individual elements can still be accessed and modified using substr in conjunction with pack and unpack. The penalty is that loops run more slowly; PDL would be much quicker.
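        A minimal sketch of that substr/pack/unpack access pattern, assuming nothing about the data beyond "an array of doubles" (the names @floats and $packed are purely illustrative):

        use strict;
        use warnings;

        # A packed string of doubles, standing in for the real data.
        my @floats = map rand(), 1 .. 1_000;
        my $packed = pack 'd*', @floats;
        @floats = ();                        # release the per-element SVs

        my $dsize = length pack 'd', 0;      # bytes per packed double (usually 8)
        my $count = length($packed) / $dsize;

        # Read element $i.
        my $i   = 42;
        my $val = unpack 'd', substr( $packed, $i * $dsize, $dsize );
        print "element $i: $val\n";

        # Modify element $i in place (same-length replacement).
        substr( $packed, $i * $dsize, $dsize ) = pack 'd', 3.14;

        # foreach-style traversal by index, one element at a time.
        for my $j ( 0 .. $count - 1 ) {
            my $x = unpack 'd', substr( $packed, $j * $dsize, $dsize );
            # ... process $x here ...
        }

        Iterating by index like this avoids unpacking the whole string back into a Perl list, which would recreate the per-element overhead you are trying to avoid.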

        It would be much better if you could avoid creating the array in the first place.



      I should mention that I would prefer not to use PDL, and that the array is constructed beforehand; I cannot easily change how it is constructed.