
G'day ZWcarp,

Firstly, you're doing yourself no favours by littering your code with meaningless variable names. You may remember what $r[3] or $key2 refer to while you're writing the code and it's fresh in your mind; you won't next month when you have to come back to it to make a modification.

From the code and text you've provided: the keys of %AA are gene names (if I've got that right, $gene_name would be a meaningful name); the keys of %{$site{$gene_name}} are mutation sites (again $mutation_site would be meaningful); and so on throughout your code.

I don't see any purpose to any of the sorting you're doing in your "problem section" (it's wasted processing and chews up even more memory) and I agree with AM about the [$key4].

Putting all that together, I think your "problem section" could be written as (untested):

for my $gene_name (keys %AA) {
    for my $mutation_site (keys %{$site{$gene_name}}) {
        # An array in scalar context gives its element count.
        $site_length_catch{$gene_name}{$mutation_site}
            = @{$site{$gene_name}{$mutation_site}};
    }
}

And later you can access that data as:

my $mutation_count = $site_length_catch{$gene_name}{$mutation_site};
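If you want to try the idea in isolation, here's a minimal, self-contained sketch. The sample data is made up: I'm assuming %site maps gene name to mutation site to an array ref of records, which is what your code suggests, and I've looped over keys %site instead of %AA only so the snippet stands alone.

```perl
use strict;
use warnings;

# Hypothetical sample data: gene name => mutation site => array ref of records.
my %site = (
    GENE_A => { 175 => [qw{rec1 rec2 rec3}], 248 => [qw{rec4}] },
    GENE_B => { 12  => [qw{rec5 rec6}] },
);

my %site_length_catch;

for my $gene_name (keys %site) {
    for my $mutation_site (keys %{ $site{$gene_name} }) {
        # scalar() makes the "count the elements" intent explicit.
        $site_length_catch{$gene_name}{$mutation_site}
            = scalar @{ $site{$gene_name}{$mutation_site} };
    }
}

# Look up a count later by gene name and mutation site.
my $mutation_count = $site_length_catch{GENE_A}{175};
print "GENE_A site 175: $mutation_count\n";
```

Only the counts are kept in %site_length_catch, so it stays small compared with %site itself.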

There are other parts of your code that seem dubious (e.g. the my $key4=$key3; assignment); these will perhaps become obvious to you when you apply better names. You're working with very large amounts of data and loops nested three deep, so you need to keep all the code (but especially the innermost loops) as efficient as possible. Go through your code and remove unnecessary assignments, sorting and so on.
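To illustrate why the sorting and the extra copy buy you nothing here, this rough sketch (made-up data, hypothetical variable names) fills one hash with a sorted traversal plus a redundant copy of the loop variable, and another with a plain traversal. Hash contents don't depend on the order in which you insert the keys, so both end up holding the same counts.

```perl
use strict;
use warnings;

# Hypothetical sample data: gene name => mutation site => array ref of records.
my %site = (
    GENE_A => { 100 => [1, 2], 200 => [3] },
    GENE_B => { 300 => [4, 5, 6] },
);

my (%with_sort, %without_sort);

# Sorted traversal with a redundant copy of the loop variable.
for my $gene_name (sort keys %site) {
    for my $mutation_site (sort keys %{ $site{$gene_name} }) {
        my $copy = $mutation_site;    # unnecessary assignment
        $with_sort{$gene_name}{$copy} = @{ $site{$gene_name}{$copy} };
    }
}

# Same work without the sort or the copy.
for my $gene_name (keys %site) {
    for my $mutation_site (keys %{ $site{$gene_name} }) {
        $without_sort{$gene_name}{$mutation_site}
            = @{ $site{$gene_name}{$mutation_site} };
    }
}

# Compare the two results entry by entry.
my $same = 1;
for my $gene_name (keys %with_sort) {
    for my $mutation_site (keys %{ $with_sort{$gene_name} }) {
        $same = 0
            if $with_sort{$gene_name}{$mutation_site}
            != $without_sort{$gene_name}{$mutation_site};
    }
}
print $same ? "identical counts\n" : "counts differ\n";
```

Sort only at the point where you actually need ordered output, such as when printing a report.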

-- Ken

In reply to Re: Memory issue with large cancer gene data structure by kcott
in thread Memory issue with large cancer gene data structure by ZWcarp
