Hi Monks,
I have input which is 500M pairs of 32bit integers, which don't occur more than 256 times each.
I want to load them into memory, and also have an array of how many times each int is in a pair.
Now in C, to do this I only need 5GB of RAM. For the pairs: 500M * 4 bytes (32-bit int) * 2 (per pair) = 4GB. For the occurrences: 1G * 1 byte (8-bit count) = 1GB.
However, when I do the same thing in Perl, the RAM usage is more like 256 bytes per item:
    my @d;
    my @p;
    for (my $i = 0; $i < 500_000_000; $i++) {
        my ($p1, $p2) = getPair();
        $d[$p1]++;
        $d[$p2]++;
        push @p, [ $p1, $p2 ];
    }
I am seeing about 1GB RAM usage per 4M input pairs, so I would need 125G of RAM!
Is there any way to tell Perl that a scalar is to be an int of a certain size only?
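You can't constrain an individual scalar's width, but one way to get fixed-width counters is vec(), which treats a plain string as a flat array of fixed-width unsigned ints at one byte (or less) per element. A minimal sketch, sized to a small value range for the demo (for the full 32-bit range you would allocate 2**32 bytes):

```perl
use strict;
use warnings;

# One byte string holds all the counters: one 8-bit slot per possible
# value. Sized here for values 0 .. 2**20-1 to keep the demo small.
my $counts = "\0" x 2**20;

sub bump {
    my ($v) = @_;
    # vec() reads/writes the string as fixed-width unsigned ints;
    # width 8 gives one byte per element, and it is a valid lvalue.
    vec($counts, $v, 8)++;
}

bump(42) for 1 .. 3;
print vec($counts, 42, 8), "\n";   # 3
```

This keeps the count array at exactly one byte per slot, with none of the per-scalar overhead an ordinary Perl array element carries. Note that at 8 bits the counter silently wraps past 255, which matches your "no more than 256 occurrences" constraint only if that bound really holds.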
The other idea I have is that I should not be using an array, but rather a gigantic scalar, and pack/unpack the values in and out of that scalar?
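The giant-scalar idea could look like the sketch below: each pair is packed as two 32-bit unsigned ints (8 bytes) and appended to one string, so 500M pairs cost about 4GB plus a single string's overhead. The store_pair/get_pair helper names are just for illustration:

```perl
use strict;
use warnings;

# All pairs packed end-to-end in one scalar, 8 bytes per pair.
my $pairs = '';

sub store_pair {
    my ($p1, $p2) = @_;
    $pairs .= pack 'L2', $p1, $p2;   # 'L' = 32-bit unsigned int
}

sub get_pair {
    my ($i) = @_;
    # Pair $i lives at byte offset $i * 8.
    return unpack 'L2', substr($pairs, $i * 8, 8);
}

store_pair(7, 99);
store_pair(123456, 654321);
my ($a, $b) = get_pair(1);
print "$a $b\n";   # 123456 654321
```

If append performance matters, repeatedly growing the string with .= causes reallocations; you could instead preallocate once with $pairs = "\0" x (500_000_000 * 8) and write each pair in place with four-argument substr.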
What would be the best way to approach this in Perl?
Thanks!
In reply to Memory efficient way to deal with really large arrays? by sectokia