asqwerty has asked for the wisdom of the Perl Monks concerning the following question:
Hello Monks
I'm trying to do a simple meta-analysis across a few databases. The format of the DBs is this:
```text
CHR1 CHR2 SNP1      SNP2      OR_INT STAT     P
17   18   rs9912311 rs9965425 0.9307 0.06328  0.8014
17   18   rs9912311 rs9963148 0.9307 0.06328  0.8014
17   18   rs9912311 rs9959874 0.9668 0.01788  0.8936
17   18   rs9912311 rs1893506 1.091  0.07564  0.7833
17   18   rs9912101 rs9965425 0.9003 0.1249   0.7238
17   18   rs9912101 rs9963148 0.9003 0.1249   0.7238
17   18   rs9912101 rs9959874 0.9507 0.0376   0.8462
17   18   rs9912101 rs1893506 1.029  0.007849 0.9294
17   18   rs9905581 rs9965425 0.9003 0.1249   0.7238
```
I have 5 DBs with around 30k lines each, so I wrote these lines:
```perl
use strict;
use warnings;
use File::Slurp qw(read_file);
use Math::CDF qw(qnorm pnorm);
use List::MoreUtils qw(uniq);

my $ofile  = "meta1.txt";
my @ifiles = @ARGV;
my %ipairs;
my @lpairs;

# Load every database into one big hash keyed by file-key and SNP pair.
foreach my $ifile (@ifiles) {
    (my $fk) = $ifile =~ /^(.*)_sets.*/;
    my %ldata = reverse
                map  { /^(.*(rs\d{1,20}\s+rs\d{1,20}).*)$/ }
                grep { /.*rs\d{1,20}\s+rs\d{1,20}.*/ }
                read_file $ifile;
    foreach my $dline (sort keys %ldata) {
        push @lpairs, $dline;
        ( $ipairs{$fk}{$dline}{'head'},
          $ipairs{$fk}{$dline}{'effect'},
          $ipairs{$fk}{$dline}{'pvalue'} )
            = $ldata{$dline} =~ /^(.*)\s+(\d\.\d+)\s+\d\.\d+\s+(\d\.\d+)$/;
    }
}
@lpairs = uniq @lpairs;

open OF, ">$ofile";
my $head = "CHR1 CHR2 SNP1 SNP2 P N";
print OF "$head\n";

# Combine each pair's p-values across the databases and write the result.
foreach my $pair (@lpairs) {
    my $n = 0;
    my $z = 0;
    my $hl;
    my $pvalue = 0;
    my $fk;
    foreach $fk (%ipairs) {
        if ($ipairs{$fk}{$pair}{'pvalue'}) {
            unless ($hl) { $hl = $ipairs{$fk}{$pair}{'head'}; }
            $n++;
            $z += qnorm($ipairs{$fk}{$pair}{'pvalue'});
        }
    }
    if ($n > 2) {
        $z      = $z / sqrt($n);
        $pvalue = pnorm($z);
    }
    if ($pvalue) {
        #printf "$pair -> %.4f\n", $pvalue;
        printf OF "$hl %.4f $n\n", $pvalue;
    }
}
close OF;
```
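For reference, the final loop combines each pair's p-values across databases in a Stouffer-like way: it sums qnorm(p) over the databases that contain the pair and rescales by sqrt(n). A minimal standalone sketch of just that step, using made-up p-values:

```perl
use strict;
use warnings;
use Math::CDF qw(qnorm pnorm);

# Made-up p-values for one SNP pair seen in three databases.
my @pvalues = (0.8014, 0.7238, 0.8936);

# Same combination as in the script above:
# Z = sum( qnorm(p_i) ) / sqrt(n), combined p = pnorm(Z).
my $n = @pvalues;
my $z = 0;
$z += qnorm($_) for @pvalues;
my $combined = pnorm($z / sqrt($n));

printf "combined p = %.4f (n = %d)\n", $combined, $n;
```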
The program actually works fine. However, my problem is that it incrementally consumes memory until it reaches the 32 MB limit; the system then kills the job on its own, so my program never finishes.
So, I have two questions.
Why is this happening? The heavy memory consumption only begins after all the info has already been loaded into the hash, in other words, in the loop where the calculations take place and the results are written to disk.
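One thing that may be relevant here (an assumption, not something the output above proves): looking up a multi-level hash entry in Perl autovivifies the intermediate levels, and `foreach $fk (%ipairs)` iterates over the hash's values (stringified hashrefs) as well as its keys, so each `$ipairs{$fk}{$pair}{'pvalue'}` test can quietly create new entries. A minimal sketch of the effect:

```perl
use strict;
use warnings;

# Hypothetical data shaped like %ipairs: file-key => pair => fields.
my %h = ( db1 => { 'rs1 rs2' => { pvalue => 0.5 } } );

print scalar(keys %h), " top-level key(s) before\n";   # 1

# In list context %h flattens to key/value pairs, so $k is also
# each value (a hashref that stringifies to "HASH(0x...)").
for my $k (%h) {
    # A "read-only" test still autovivifies $h{$k} and $h{$k}{'rs99 rs100'}.
    my $seen = $h{$k}{'rs99 rs100'}{pvalue};
}

print scalar(keys %h), " top-level key(s) after\n";    # now more than 1
```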
Is there any workaround to sort out this problem? I was thinking of writing intermediate results to disk, but I'm not yet sure how to do it.
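A sketch of one possible low-memory layout (under the assumption that each pair appears at most once per file and that only the combined value is needed): keep a single running count and sum of qnorm(p) per pair while reading each file, instead of holding all five databases in a hash of hashes; nothing else has to stay in memory.

```perl
use strict;
use warnings;
use Math::CDF qw(qnorm pnorm);

# Sketch only: assumes the whitespace-separated layout shown above
# (CHR1 CHR2 SNP1 SNP2 OR_INT STAT P) and one line per pair per file.
my %acc;    # "SNP1 SNP2" => { head => ..., n => ..., zsum => ... }

for my $ifile (@ARGV) {
    open my $in, '<', $ifile or die "Cannot open $ifile: $!";
    while (my $line = <$in>) {
        my @f = split ' ', $line;
        next unless @f == 7 && $f[2] =~ /^rs\d+$/ && $f[3] =~ /^rs\d+$/;
        my $pair = "$f[2] $f[3]";
        $acc{$pair}{head} //= "@f[0..3]";       # CHR1 CHR2 SNP1 SNP2
        $acc{$pair}{n}++;
        $acc{$pair}{zsum} += qnorm($f[6]);      # running sum of qnorm(p)
    }
    close $in;
}

open my $out, '>', 'meta1.txt' or die "Cannot write meta1.txt: $!";
print {$out} "CHR1 CHR2 SNP1 SNP2 P N\n";
for my $pair (sort keys %acc) {
    my ($head, $n, $zsum) = @{ $acc{$pair} }{qw(head n zsum)};
    next unless $n > 2;                          # same cutoff as above
    printf {$out} "%s %.4f %d\n", $head, pnorm($zsum / sqrt($n)), $n;
}
close $out;
```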
Replies are listed 'Best First'.
- Re: memory issues by BrowserUk (Patriarch) on Jan 28, 2013 at 09:13 UTC
  - by asqwerty (Acolyte) on Jan 28, 2013 at 09:24 UTC
  - by Anonymous Monk on Jan 28, 2013 at 17:01 UTC
  - by asqwerty (Acolyte) on Jan 28, 2013 at 09:25 UTC
- Re: memory issues by Anonymous Monk on Jan 28, 2013 at 09:08 UTC