in reply to Out of Memory selecting from MySQL
Because structured data carries a substantial amount of overhead.
    use Devel::Size;
    my @a;
    push @a, [ (undef) x 9 ] for 1 .. 7022790;
    print Devel::Size::size( \@a );    # prints 33554472
That shows that an AoA containing 7022790 arrays, each with 9 undef elements, already consumes 33 MB of RAM. Actually rather more, but when I try to use Devel::Size::total_size() on it, it blows my memory limits. That's because total_size() itself consumes a large amount of memory in which to build a table of references, so as to ensure that it doesn't count the same memory twice.
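A cheaper way to get the same sort of estimate is to measure one representative row with total_size() and multiply; that avoids the reference-tracking table entirely. A sketch only, since it ignores the outer container and assumes all rows are alike:

    use Devel::Size qw( total_size );

    my $row     = [ (undef) x 9 ];       # one representative row
    my $per_row = total_size( $row );    # the row array plus its 9 scalars
    printf "~%d bytes for 7022790 such rows\n", $per_row * 7022790;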
In addition to that 33 MB, you have 7022790 x 9 = 63205110 scalars, each with (on average) 5 bytes of data plus (at least) 12 bytes of overhead. That's 63205110 x 17 = 1074486870 bytes, roughly 1 GB.
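A quick back-of-envelope check of that arithmetic (the 5-byte and 12-byte figures are the assumed averages above, not measured values):

    my $rows   = 7022790;
    my $cols   = 9;
    my $per_sv = 5 + 12;    # assumed avg data bytes + minimum scalar overhead
    printf "%d bytes (~%.2f GB)\n",
        $rows * $cols * $per_sv,
        $rows * $cols * $per_sv / 2**30;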
But, as the array is being built, a certain amount of space is allocated to it to start with. When that space is filled, the array must be extended: a new lump of space (I think) twice the size of the existing array is allocated, the existing data is copied over, and the old lump is freed. The new array then continues to be filled until it also reaches its limits, at which point the process is repeated.
The problem is that the first lump of space is not large enough to accommodate the third-generation request (which now requires four times the original), so still more space must be requested from the OS. When that fills, the process repeats again. And again, none of the existing, freed spaces is large enough to accommodate the new request, so it is back to the OS.
Each time, the previous iteration's space is not large enough to accommodate the next, so new space must be allocated. Whilst some of that freed space will be re-used for the small arrays and the scalars they contain, it is fairly easy to see that, as the process goes on, less and less of each previous iteration's space is needed to fulfill these smaller allocations, and so you end up with large lumps of unused free space until you finally reach the 4GB limit.
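If you want to watch that geometric growth happen, something like this prints a line each time perl reallocates the array's pointer table. The exact growth pattern is an internal detail that varies by perl version, so treat this as a sketch, not proof of the doubling factor:

    use Devel::Size qw( size );

    my @a;
    my $last = 0;
    for my $i ( 1 .. 200_000 ) {
        push @a, undef;
        my $now = size( \@a );    # AV struct + pointer table, not the elements
        if ( $now != $last ) {
            printf "%8d elements -> %9d bytes\n", $i, $now;
            $last = $now;
        }
    }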
The solution may be fairly simple: preallocate the final size of the large array up front. Issue the same query but with COUNT(*) in place of the select list to find out how many rows you are going to get, then use that count to preallocate the array:
    my @my_arr;
    $#my_arr = 7022790;
    ...
That should avoid the doubling and redoubling of the array allocation and so may allow you to load the entire dataset.
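Put together, a DBI version might look something like this. A sketch only: the connect string, table and column names are stand-ins for whatever your real query uses, and the copy in the fetch loop is needed because fetchrow_arrayref reuses the same array on every call:

    use strict;
    use warnings;
    use DBI;

    # Stand-in connection details and query -- substitute your own.
    my $dbh = DBI->connect( 'dbi:mysql:mydb', 'user', 'pass',
                            { RaiseError => 1 } );

    # Same query, but counting rows instead of fetching them.
    my ( $count ) = $dbh->selectrow_array(
        'SELECT COUNT(*) FROM my_table'
    );

    my @rows;
    $#rows = $count - 1;                 # preallocate every slot up front

    my $sth = $dbh->prepare(
        'SELECT c1, c2, c3, c4, c5, c6, c7, c8, c9 FROM my_table'
    );
    $sth->execute;

    my $i = 0;
    while ( my $row = $sth->fetchrow_arrayref ) {
        $rows[ $i++ ] = [ @$row ];       # copy: DBI reuses $row's buffer
    }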