G'day ZWcarp,
Firstly, you're doing yourself no favours whatsoever by littering your code with meaningless variable names. You may remember what $r[3] or $key2 refer to while you're writing the code and it's fresh in your mind, but you won't next month when you have to come back to it to make a modification.
From the code and text you've provided: the keys of %AA are gene names (if I've got that right, $gene_name would be a meaningful name); the keys of %{$site{$gene_name}} are mutation sites (again $mutation_site would be meaningful); and so on throughout your code.
I don't see any purpose to any of the sorting you're doing in your "problem section" (it's wasted processing and chews up even more memory) and I agree with AM about the [$key4].
Putting all that together, I think your "problem section" could be written as (untested):
    for my $gene_name (keys %AA) {
        for (keys %{$site{$gene_name}}) {
            $site_length_catch{$gene_name}{$_} = @{$site{$gene_name}{$_}};
        }
    }
And later you can access that data as:
my $mutation_count = $site_length_catch{$gene_name}{$mutation_site};
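To see the counting idiom in isolation, here's a minimal, self-contained sketch. The sample data is entirely made up (I don't know what your real %AA and %site contain beyond what you've described), and I've assumed each $site{$gene_name}{$mutation_site} holds an array reference. The count comes from assigning the dereferenced array in scalar context, which yields the number of elements.

```perl
use strict;
use warnings;

# Hypothetical sample data: gene name => mutation site => arrayref of records.
# The gene names, sites and record values are invented for illustration.
my %site = (
    TP53 => { 175 => ['s1', 's2', 's3'], 248 => ['s4'] },
    KRAS => { 12  => ['s5', 's6'] },
);

# Only the keys of %AA (gene names) matter here; the values are placeholders.
my %AA = (TP53 => 1, KRAS => 1);

my %site_length_catch;

for my $gene_name (keys %AA) {
    for my $mutation_site (keys %{ $site{$gene_name} }) {
        # An array assigned to a scalar slot gives its element count.
        $site_length_catch{$gene_name}{$mutation_site}
            = @{ $site{$gene_name}{$mutation_site} };
    }
}

my $mutation_count = $site_length_catch{TP53}{175};
print "TP53 site 175: $mutation_count\n";
```

Note there's no sort anywhere: the counts are stored by key, so the order in which keys() returns them doesn't matter.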
There are other parts of your code that seem dubious (e.g. the my $key4=$key3; assignment), which will perhaps become obvious to you when you apply better names. You're working with very large amounts of data and loops nested three deep, so you need to keep all the code (but especially the innermost loops) as efficient as possible. Go through your code and remove unnecessary assignments, sorting and so on.
-- Ken
In reply to Re: Memory issue with large cancer gene data structure
by kcott
in thread Memory issue with large cancer gene data structure
by ZWcarp