Re: Memory Efficient Alternatives to Hash of Array

Hm. Presumably you've only used <DATA> by way of example, as Perl would die just trying to load the script if it were 4GB+ in size.

Next. Why are you using a HoA? On the basis of what you've posted, you have one key and one value per key, so wrapping that one value in an array just uses ~50% more memory than needed!

That is, changing: `push @{ $hold{$elem[0]} }, $elem[1];`

to: `$hold{ $elem[0] } = $elem[1];` would contain the same information without the overhead of one array per key.
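
A minimal sketch of the two shapes side by side. The sample rows and the names `%hoa` and `%flat` are made up for illustration and are not the OP's data:

```perl
use strict;
use warnings;

# Hypothetical tag/value rows; each tag occurs exactly once here.
my @rows = ( [ 'AAT', 'INI' ], [ 'TGA', 'LIL' ], [ 'TTC', 'FII' ] );

my ( %hoa, %flat );
for my $elem (@rows) {
    my ( $tag, $value ) = @$elem;
    push @{ $hoa{$tag} }, $value;    # hash of arrays: one anonymous array per key
    $flat{$tag} = $value;            # plain hash: one scalar per key
}

# With unique tags, both structures hold the same information.
print "$_ => $hoa{$_}[0] / $flat{$_}\n" for sort keys %hoa;
```

The equivalence only holds while every tag is unique: if a tag repeated, the plain assignment would overwrite the earlier value, whereas the push would keep both.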

But either way, you've still got too much data to hold in memory on a 32-bit machine. And since, on the basis of your script(s) to date, the only reason for loading it is to sort it, you'd be far better off sorting the input file externally and processing it line by line.
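
A rough sketch of the line-by-line approach, assuming the file has already been sorted on its first column by an external tool (for example the system `sort` utility) and that the fields are whitespace-separated. The in-memory filehandle, the sample lines, and the `flush_group` helper are stand-ins for illustration only:

```perl
use strict;
use warnings;

# Stand-in for the externally sorted file; in practice this would be
# something like: open my $in, '<', 'sorted.txt' or die $!;
my $sorted = "AAT INI\nTGA LIL\nTGA QNQ\nTTC FII\n";
open my $in, '<', \$sorted or die "Cannot open in-memory file: $!";

my @groups;    # collected only so the sketch has something to show

# Hypothetical helper: handle one complete group of values for a tag.
sub flush_group {
    my ( $tag, $values ) = @_;
    push @groups, "$tag: @$values";
}

my ( $current, @values );
while ( my $line = <$in> ) {
    chomp $line;
    my ( $tag, $value ) = split ' ', $line;
    next unless defined $value;    # skip blank or malformed lines
    if ( defined $current && $tag ne $current ) {
        flush_group( $current, \@values );    # tag changed: group is complete
        @values = ();
    }
    $current = $tag;
    push @values, $value;
}
flush_group( $current, \@values ) if defined $current;    # final group
close $in;

print "$_\n" for @groups;
```

Because equal tags are adjacent in sorted input, only the values for the current tag need to be in memory at any one time. A real script would process each group inside `flush_group` rather than accumulate results in `@groups`.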

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."


Re^2: Memory Efficient Alternatives to Hash of Array by tilly (Archbishop) on Dec 27, 2008 at 14:56 UTC
FYI Perl stops processing when it sees __DATA__ so there would be no problem loading a script that is over 4 GB in size. As for the use of a hash of arrays, reading the post I would assume a badly chosen data sample rather than a misunderstanding.

Update: Good catch, eye. The sample was well chosen.
Re^3: Memory Efficient Alternatives to Hash of Array by eye (Chaplain) on Dec 27, 2008 at 20:19 UTC
"...I would assume a badly chosen data sample..."

Actually, the OP's example has three sets of duplicate tags:

`Lines 6 - 9: TGATACGGCGACCACCGAGATCTACACTCTTTCC`
`Lines 15 - 17: TGCTCCGGCGACCACCGAGATCTACACTCTTTCC`
`Lines 19 - 20: TTCTCCTTCGACCACCGAGATCTACACTCTTTCC`
Re^3: Memory Efficient Alternatives to Hash of Array by BrowserUk (Patriarch) on Dec 27, 2008 at 20:43 UTC
"As for the use of a hash of arrays, reading the post I would assume a badly chosen data sample rather than a misunderstanding."

Given the OP's description of the code: "My code below, tries to group the 'error_rate' (second column of data) based on its corresponding tag (first column of data).", together with the fact that the second column appears to be a byte-wise mask for the first:

`AATACGGCCACCCCCCCCCCCCCCGCCCCTCCCC`
`INILILFIIIIQNQQNQNLLKFKNCDHA?DAHHH`

I don't think it is just badly chosen sample data. Maybe the OP will tell us which is correct?
Re^4: Memory Efficient Alternatives to Hash of Array by tilly (Archbishop) on Dec 27, 2008 at 21:01 UTC
From the OP's text and code I thought that the OP wanted to know all of the possible values for the second field for each possible value of the first field. Given that the first field repeats, this requires an array.
Re^5: Memory Efficient Alternatives to Hash of Array by neversaint (Deacon) on Dec 28, 2008 at 00:34 UTC
Re^6: Memory Efficient Alternatives to Hash of Array by BrowserUk (Patriarch) on Dec 28, 2008 at 01:05 UTC