Juba has asked for the wisdom of the Perl Monks concerning the following question:

I wrote some code to rearrange a two-column file into a four-column file. When I use a large input, the program crashes with the following message:

    *** malloc: vm_allocate(size=262144) failed with 3
    *** malloc[591]: error: Can't allocate region
    Out of memory!

The goal is to shuffle the elements within each list of a huge list of lists and then pair up the elements within each list, two at a time.

My guess is that the problem is in the last loop of the following code. Could somebody please help me solve this problem? I'm running Perl on Mac OS X 10.2.8 on a G4 with 768 MB of RAM.

Thanks

    @total = (); #this is an array of arrays
    @haplotypes=();
    $currentPop = 0;
    $sampleSize={}; # This will hold the sample size for each population
    while (<IN>) {
        chomp;
        if (/^(\-{0,1}\d+\t-{0,1}\d+)/) {
            $sampleSize{$currentPop}++;
            @temp = split;
            push @{$haplotypes{$currentPop}}, [@temp];
        }
        elsif (/segsites: (\d+)/) {
            $currentPop++;
            push @total, $currentPop;
        }
    }
    foreach $_ (@total) {
        for ($i = 0; $i < $sampleSize{$currentPop}; $i++) {
            $Rand = rand(int(@{$haplotypes{$_}}));
            $temp = splice(@{$haplotypes{$_}}, $Rand, 1);
            push @{$RandHapl{$_}}, $temp;
        }
    }

20040804 Janitored by Corion: Put code tags around code, reduced indentation

20040804 Janitored by davido: Put code tags around error message (necessary due to square brackets).

Re: memory leak
by BrowserUk (Patriarch) on Aug 04, 2004 at 15:45 UTC

    Besides all the other typos and anomalies in your code, the main reason you're blowing your memory is this line:

    for($i = 0; $i < $sampleSize; $i++){

    At the top of the program, you make this a reference to an anonymous hash:

    $sampleSize={}; # This will hold the sample size for each population

    But in the for loop, you use the numeric value of that hashref (its memory address) as the upper bound of your loop.

    Since that numeric value is likely to be something like 26596960, your loop will try to create an array of 26 million elements, occupying at least 600 MB, for every element in @total.

    Your program code makes no sense.
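    The two behaviours described above can be reproduced in isolation. This is a minimal standalone sketch that reuses the names from the original post and assumes nothing else: a hash reference in numeric context evaluates to its memory address, and $sampleSize{...} reads and writes the hash %sampleSize, which is a separate variable from the scalar $sampleSize.

```perl
use strict;
use warnings;

my $sampleSize = {};    # scalar holding a reference to an anonymous hash
my %sampleSize;         # a separate hash variable that happens to share the name

$sampleSize{1}++;       # increments an element of %sampleSize, not of %$sampleSize

# In numeric context a reference evaluates to its memory address, which is
# typically a large number, so a loop such as
#   for (my $i = 0; $i < $sampleSize; $i++) { ... }
# would run that many times.
my $bound = 0 + $sampleSize;
printf "numeric value of the hashref: %d\n", $bound;
printf "keys in %%\$sampleSize: %d, keys in %%sampleSize: %d\n",
    scalar(keys %$sampleSize), scalar(keys %sampleSize);
```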


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
      Thanks. This is actually just a slice of the code, but it works anyway. You were right: I just needed to use $sampleSize{$currentPop} instead of $sampleSize, and it worked. Thanks again.

        I wish I could say "if you had been using use strict; use warnings;, they would have picked this up", but I can't, because they wouldn't have.

        Nonetheless, using them would be a good idea :)


Re: memory leak
by rsteinke (Scribe) on Aug 04, 2004 at 15:27 UTC

    You can avoid the memory allocation in the second loop by printing each line just before the end of the outer loop, instead of adding the elements to the ever-growing %RandHapl hash:

        foreach $_ (@total) {
            my @line_elems;
            # per-population count, taken from %sampleSize
            for ($i = 0; $i < $sampleSize{$_}; $i++) {
                ...
                push @line_elems, $temp;
            }
            # print (...) . "\n" would print the list and discard the "\n",
            # so pass the newline as part of print's argument list instead
            print join("\t", @line_elems), "\n";
        }

    The first loop, however, still allocates enough memory to hold all the tokens in the file in a hash. This grows with the file size, leading to out-of-memory errors. You could fix this by moving the contents of the second loop inside the 'elsif' block, since both loops iterate on $currentPop, and by storing the tokens in a temporary array instead of in %haplotypes.
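    A minimal sketch of that restructuring, under a few assumptions: read_populations and its callback interface are hypothetical names, not code from this thread, and the callback is where the per-population shuffling and printing would go. One detail the move into the 'elsif' block implies: a header closes the previous population, so the last population must be flushed separately after the read loop ends.

```perl
use strict;
use warnings;

# Streaming sketch: each population's rows are handed to a callback as soon
# as the next "segsites:" header (or end of file) is reached, so only one
# population is held in memory at a time.
# read_populations and the callback interface are hypothetical.
sub read_populations {
    my ($in, $handle_population) = @_;
    my $currentPop = 0;
    my $rows = [];    # rows of the current population only

    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^-?\d+\t-?\d+/) {
            push @$rows, [ split ' ', $line ];    # same whitespace split as the original
        }
        elsif ($line =~ /segsites: (\d+)/) {
            # A new header closes the previous population. Rows seen before the
            # first header (population 0) are never processed, as in the original.
            $handle_population->($currentPop, $rows) if $currentPop;
            $currentPop++;
            $rows = [];    # start a fresh array; the old one can be freed
        }
    }

    # The last population has no header after it, so flush it at end of file.
    $handle_population->($currentPop, $rows) if $currentPop;
    return $currentPop;    # number of populations seen
}
```

    With the original filehandle this would be called as read_populations(\*IN, sub { my ($pop, $rows) = @_; ... }), and memory use is then bounded by the largest single population rather than by the whole file.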

    BTW, I notice that you declare the @haplotypes array at the top, but actually use the %haplotypes hash. Please consider using 'use strict;' and 'my' declarations, as they would prevent this kind of error.
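    The claim about strict can be checked without editing the real script. This is a small sketch that compiles two fragments at run time with a string eval: one reproduces the @haplotypes/%haplotypes mix-up, the other declares the hash that is actually used. The fragments are illustrative, not the original program.

```perl
use strict;
use warnings;

# Compile two small fragments at run time with a string eval to show how
# strict treats the @haplotypes/%haplotypes mix-up from the original post.
my $mixed_up = q{
    use strict;
    my @haplotypes = ();                  # declares the array...
    push @{ $haplotypes{1} }, [1, 2];     # ...but this uses the hash %haplotypes
    1;
};
my $fixed = q{
    use strict;
    my %haplotypes;                       # declare the variable actually used
    push @{ $haplotypes{1} }, [1, 2];
    1;
};

my $mixed_ok  = eval $mixed_up;
my $mixed_err = $@;                       # save the error before the next eval clears it
my $fixed_ok  = eval $fixed;

print $mixed_ok ? "mixed-up version compiled\n" : "mixed-up version rejected: $mixed_err";
print $fixed_ok ? "fixed version compiled\n"    : "fixed version rejected: $@";
```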

    Ron Steinke rsteinke@w-link.net
Re: memory leak
by VSarkiss (Monsignor) on Aug 04, 2004 at 15:24 UTC

    It's not clear what you're trying to accomplish, and your data structures are connected in a strange way that makes it hard to tell. You wrote at the top that you're trying to arrange a two-column file into four columns, but the sample code seems to have no relation to that task.

    The code itself has little to guide the reader, and of the two comments, this one is incorrect:

        @total = (); #this is an array of arrays

    The way you use it, it's an array of scalars.

    As for the memory problem itself, this line is likely to be the culprit:

        push @{$RandHapl{$_}}, $temp;

    This is the first time %RandHapl appears in your code, and you're pushing random elements out of other arrays into arrays hanging from it.

    I say "likely to be" because I'm not sure. I also don't know how to tell you to fix it, because I don't know what you're trying to do. If you can add a reply to your original note with some details about the task at hand, you're much more likely to get a good answer.

    Update
    Added some clarification.

Re: memory leak
by Juba (Initiate) on Aug 04, 2004 at 15:56 UTC
    More details: the input file is actually a list of lists, where each sublist has a header followed by lines of two tab-separated numbers. The first loop reads the input file and builds one small array per sublist, indexed by what I called $currentPop. The second loop goes through each entry of @total, splices out one randomly chosen element at a time, and pushes it onto @{$RandHapl{$_}}. I then use the following code to print a new file:

        foreach $Population (@total) {
            print "Population $Population\n";
            print OUT "Population $Population\n";
            print "$sampleSize{$Population}\n";
            for ($i = 0; $i < $sampleSize{$Population}/2; $i++) {
                @pair = splice @{$RandHapl{$Population}}, 0, 2;
                #print "[@pair]\n";
                print "$pair[0][0]\t$pair[1][0]\t$pair[0][1]\t$pair[1][1]\t\n";
                print OUT "$pair[0][0]\t$pair[1][0]\t$pair[0][1]\t$pair[1][1]\t\n";
            }
        }

    Janitored by davido: Added code tags. Removed br tags from code.
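    The description above amounts to two steps per population: put the rows in random order (the repeated splice at a random index), then take them two at a time and print the columns in the order $pair[0][0], $pair[1][0], $pair[0][1], $pair[1][1]. Below is a minimal sketch of those two steps for one population. It swaps in shuffle from List::Util (bundled with Perl since 5.8; an older perl may need it installed) for the repeated random splice, which shifts the remaining elements on every call. pair_rows is a hypothetical name, and the trailing tab of the original output is left off. It also handles odd sample sizes differently: a bound of $sampleSize{$Population}/2 runs one extra pass when the count is odd, leaving $pair[1] undefined, whereas this sketch warns and leaves the last row unpaired.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: shuffle one population's rows (each row is [a, b])
# and join them two at a time into four-column lines laid out as
# first->[0], second->[0], first->[1], second->[1].
sub pair_rows {
    my @rows = shuffle @_;    # random permutation in a single pass
    my @lines;
    while (@rows >= 2) {
        my ($first, $second) = splice @rows, 0, 2;
        push @lines, join("\t", $first->[0], $second->[0], $first->[1], $second->[1]);
    }
    # With an odd number of rows, one row is left without a partner.
    warn "odd sample size: one row left unpaired\n" if @rows;
    return @lines;
}

my @lines = pair_rows([1, 2], [3, 4], [5, 6], [7, 8]);
print "$_\n" for @lines;
```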