I have adopted some of the suggestions given earlier, but my code still runs out of memory when I use a very large infile. Can somebody please help me optimize it? Thanks a lot.
#!/usr/bin/perl -w
#This code joins randomly haplotypes two by two from the output given
#by Eli's program with an arbitrary number of populations. It needs a
#2 allele output. It is the infile for R calculation.

if ( @ARGV != 1 ) {
    print "incorrect usage ---- TYPE IN COMMAND LINE: perl eli+.pl infile\n";
    exit();
}

print "What's the OUTFILE?\n";
$OUTFILE = <STDIN>;
open (OUT, ">$OUTFILE") or die "could not create $OUTFILE\n";
open (IN, $ARGV[0]); #open the first argument

#transform the input list in an array of arrays
@total = ();
@haplotypes=();
$currentPop = 0;
$sampleSize={}; # This will hold the sample size for each population

while(<IN>) {
    chomp;
    if (/^(\-{0,1}\d+\t-{0,1}\d+)/) {
        $sampleSize{$currentPop}++;
        @temp = split;
        push @{$haplotypes{$currentPop}}, [@temp];
    }elsif (/segsites: (\d+)/) {
        @{$TempList{$currentPop}}=();
        while(@{$haplotypes{$currentPop}}) {
            push(@{$TempList{$currentPop}}, splice(@{$haplotypes{$currentPop}}, rand(@{$haplotypes{$currentPop}}), 1))
        }
        @{$haplotypes{$currentPop}} = @{$TempList{$currentPop}};
        push @total, $currentPop;
        $currentPop++;
    }
}

foreach $_ (@total) {
    print "Population $_\n";
    print OUT "Population $_\n";
    print "$sampleSize{$_}\n";
    for($i = 0; $i < $sampleSize{$_}/2; $i++) {
        @pair =();
        @pair = splice @{$haplotypes{$_}}, 0, 2;
        print "$pair[0][0]\t$pair[1][0]\t$pair[0][1]\t$pair[1][1]\t\n";
        print OUT "$pair[0][0]\t$pair[1][0]\t$pair[0][1]\t$pair[1][1]\t\n";
    }
}
print "$currentPop\n";
The infile looks like this, with hundreds of thousands of repeats:

segsites: 2 tMRCA: 0.6398829 Fsense 1 founders: 2
-4	0
-4	0
-1	1
-4	0
-segsites: 2 tMRCA: 0.3395337 Fsense 1 founders: 2
0	0
0	0
-3	-1
0	0
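
One possible lower-memory direction, as a minimal sketch rather than a tested fix: since each population is shuffled and paired independently, a block can be written out as soon as the next "segsites" line (or end of file) is reached, so only one population is held in memory at a time. The names pair_haplotypes and flush_population are hypothetical, and the sketch makes some assumptions that differ from the script above: it takes the outfile as a second command-line argument, it numbers populations from 0 starting with the first block that contains data, it also writes the final block, and it drops an unpaired leftover haplotype when a block has an odd count.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Shuffle one population's haplotypes and print them two by two.
# Nothing is kept afterwards, so memory is bounded by one block.
sub flush_population {
    my ($out, $pop, $haps) = @_;
    print {$out} "Population $pop\n";
    my @shuffled = shuffle @$haps;
    # Assumption: an unpaired leftover haplotype is silently dropped.
    while (@shuffled >= 2) {
        my ($first, $second) = splice @shuffled, 0, 2;
        print {$out} "$first->[0]\t$second->[0]\t$first->[1]\t$second->[1]\t\n";
    }
}

# Read the infile one line at a time; returns the number of
# populations written.
sub pair_haplotypes {
    my ($in, $out) = @_;
    my $pop = 0;
    my @haps;    # haplotypes of the current population only
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^-?\d+\t-?\d+/) {
            push @haps, [split ' ', $line];
        }
        elsif ($line =~ /segsites: \d+/) {
            # A header closes the previous block, if there was one.
            if (@haps) {
                flush_population($out, $pop, \@haps);
                $pop++;
            }
            @haps = ();
        }
    }
    # The last block has no header after it, so flush it here.
    if (@haps) {
        flush_population($out, $pop, \@haps);
        $pop++;
    }
    return $pop;
}

# Driver (assumed usage): perl script.pl infile outfile
if (@ARGV == 2) {
    my ($infile, $outfile) = @ARGV;
    open my $in,  '<', $infile  or die "could not open $infile: $!\n";
    open my $out, '>', $outfile or die "could not create $outfile: $!\n";
    my $count = pair_haplotypes($in, $out);
    close $out or die "could not close $outfile: $!\n";
    print "$count\n";
}
```

With this layout, peak memory depends on the largest single population rather than on the size of the whole infile, which is the point of the sketch; whether that is enough depends on how large individual blocks are.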

In reply to still memory leak by Juba
