
$key4 = the amino acid position. It is derived from the first file and edited to remove non-digit characters, so that it's just a numerical position.
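A minimal sketch of that "remove non-digits" step, assuming the first file stores protein changes as strings like `p.R175H`. The sample strings and the `%pos_of` hash are hypothetical and only for illustration:

```perl
use strict;
use warnings;

# Hypothetical protein-change strings as they might appear in the first file
my @raw = ('p.R175H', 'p.V600E', 'R248Q');

my %pos_of;
for my $change (@raw) {
    # Copy the string, then delete every non-digit character (\D)
    (my $pos = $change) =~ s/\D//g;
    $pos_of{$change} = $pos;
    print "$change -> $pos\n";
}
```

One thing to watch: a change that spans two positions, such as an in-frame deletion written `p.E746_A750del`, would become `746750` under this substitution, so entries like that may need separate handling.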

Cell $key4 is initially 0, and at this step it needs to be set to the number of mutations at this site. (The array's length equals the length of gene $key1, so each cell corresponds to one amino acid position in the gene.) It is set to scalar( @{$site{$key1}{$key4}} ), because that gives the number of values stored for the mutation site, so it effectively works as a count of how often a position recurs across different samples. Most of these counts are between 0 and 10, but there are a few large ones, including one of about 35K and a few others somewhat smaller.
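A rough sketch of this counting step, assuming `%site` maps gene => position => array ref of sample IDs, as in `@{$site{$key1}{$key4}}`. The `%gene_length` and `%counts` hashes, the example data, and the choice of index `$pos - 1` for a 1-based amino acid position are all hypothetical:

```perl
use strict;
use warnings;

# Hypothetical data: gene => amino acid position => samples mutated there
my %site = (
    TP53 => { 175 => [qw(s1 s2 s3)], 248 => [qw(s1 s4)] },
);
# Hypothetical gene lengths in amino acids
my %gene_length = ( TP53 => 393 );

my %counts;
for my $gene (keys %site) {
    my $len = $gene_length{$gene};
    # One cell per amino acid position, all initialized to 0
    my @cells = (0) x $len;
    for my $pos (keys %{ $site{$gene} }) {
        next if $pos < 1 || $pos > $len;    # skip positions outside the gene
        # In scalar context an array gives its element count, i.e. the
        # number of samples with a mutation at this position
        $cells[$pos - 1] = scalar @{ $site{$gene}{$pos} };
    }
    $counts{$gene} = \@cells;
}

print "TP53 position 175: $counts{TP53}[174]\n";   # prints: TP53 position 175: 3
```

Since most cells stay 0, storing only the non-zero counts in a hash (for example `$counts{$gene}{$pos}`) may use less memory than a dense array per gene; whether that matters depends on how many genes are loaded at once.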