in reply to Memory issue with large cancer gene data structure
Not sure if I fully understand, but I'd like to venture a guess at what you want:
use strict;
use warnings;

my %site_length_catch;
my $max = 0;

foreach (@file) {
    chomp;
    my @r = split /\t/;

    # cleaning from your second loop
    $r[13] =~ s/\D\.\D([0-9]+)\D/$1/;
    $r[13] =~ s/(\*|\?|s\d+)//;

    $site_length_catch{$r[0]}{$r[13]}++;
    $max = $r[13] > $max ? $r[13] : $max;
}

foreach my $gene (keys %site_length_catch) {
    print $site_length_catch{$gene}{$_} // 0, "\t" for 1..$max;
    print "\n";
}
The hash %site_length_catch is a sparse matrix with the gene name as the first dimension and the mutation site as the second. Each cell holds the number of mutations at that site for that gene; sites with no mutations are simply not stored.
When printing, the empty cells are filled with zeros (this is what // 0 does). I have added the regexes from your second loop, as they seem to be applied to the "Mutation site". Just remove them if I have guessed wrongly.
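To make the sparse storage and the zero filling easier to check without your real file, here is a small self-contained sketch of the same idea. The sample rows, the gene names, and the helper names count_sites and dense_row are made up for illustration, and unlike the code above it also prints the gene name at the start of each line. It assumes Perl 5.10 or later for the // operator.

```perl
use strict;
use warnings;

# Hypothetical sample data: [gene, mutation site] pairs, not real input.
my @rows = (
    [ 'TP53', 2 ],
    [ 'TP53', 2 ],
    [ 'TP53', 4 ],
    [ 'KRAS', 1 ],
);

# Count mutations per gene and site. Only sites that actually occur
# get a hash entry, which is what keeps the structure sparse.
sub count_sites {
    my @rows = @_;
    my %count;
    my $max = 0;
    for my $row (@rows) {
        my ( $gene, $site ) = @$row;
        $count{$gene}{$site}++;
        $max = $site if $site > $max;
    }
    return ( \%count, $max );
}

# Build one dense, tab-separated line for a gene; sites without an
# entry are reported as 0 via the defined-or operator.
sub dense_row {
    my ( $count, $gene, $max ) = @_;
    return join "\t", map { $count->{$gene}{$_} // 0 } 1 .. $max;
}

my ( $count, $max ) = count_sites(@rows);
print "$_\t", dense_row( $count, $_, $max ), "\n" for sort keys %$count;
```

Reading a cell with // 0 does not create an entry for the missing site, so the hash stays sparse even after the dense rows have been printed.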
Feedback would be appreciated, along with a few lines of your input, if possible.