
As for the use of a hash of arrays, reading the post I would assume a badly chosen data sample rather than a misunderstanding.

Given the OP's description of the code, "My code below, tries to group the 'error_rate' (second column of data) based on its corresponding tag (first column of data).", together with the fact that the second column appears to be a byte-wise mask for the first:

AATACGGCCACCCCCCCCCCCCCCGCCCCTCCCC INILILFIIIIQNQQNQNLLKFKNCDHA?DAHHH

I don't think it is just badly chosen sample data. Maybe the OP will tell us which is correct?
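If the second column really is a per-character mask of the first (the reading argued above, which the OP has not confirmed), the two fields of the sample line should be the same length and can be paired position by position. A minimal sketch using only the quoted sample line; `@pairs` is an illustrative name, not anything from the OP's code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The sample line quoted above: tag <TAB> error_rate
my $line = "AATACGGCCACCCCCCCCCCCCCCGCCCCTCCCC\tINILILFIIIIQNQQNQNLLKFKNCDHA?DAHHH";
my( $tag, $rate ) = split "\t", $line;

# A per-character mask must be exactly as long as the field it masks.
die "lengths differ\n" unless length $tag == length $rate;

# Pair each character of the tag with the mask character at the same position.
my @pairs;
for my $i ( 0 .. length( $tag ) - 1 ) {
    push @pairs, [ substr( $tag, $i, 1 ), substr( $rate, $i, 1 ) ];
}

printf "%d positions\n", scalar @pairs;
print "$_->[0]:$_->[1] " for @pairs[ 0 .. 4 ];
print "\n";
```

Equal lengths alone don't prove the mask interpretation, but unequal lengths would rule it out.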


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^4: Memory Efficient Alternatives to Hash of Array
by tilly (Archbishop) on Dec 27, 2008 at 21:01 UTC
    From the OP's text and code I thought that the OP wanted to know all of the possible values for the second field for each possible value of the first field. Given that the first field repeats, this requires an array.
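    For comparison, a minimal sketch of the in-memory hash-of-arrays approach described here, keyed on the first field with every second-field value pushed onto that key's array. The sample records and the `%values_for` name are hypothetical. Every value is held until the end, so memory grows with the whole dataset, which is what the sort-based loop further down avoids:

```perl
use strict;
use warnings;

# Hypothetical sample records: key <TAB> value, with keys repeating
my @lines = ( "k1\ta", "k2\tb", "k1\tc" );

my %values_for;    # key => reference to an array of all values seen for it
for my $line ( @lines ) {
    my( $key, $value ) = split "\t", $line;
    push @{ $values_for{ $key } }, $value;   # autovivifies the array ref
}

print "$_: @{ $values_for{$_} }\n" for sort keys %values_for;
```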
      You are exactly right, tilly.

      ---
      neversaint and everlastingly indebted.......

        I stand corrected.

        However, you're still better off using an external sort, as it allows you to gather the multiple values for each key together without loading the entire dataset into memory. You can then process the sorted data with a fairly simple loop like this:

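        #! perl
        use strict;
        use warnings;

        ## Prime the loop with the first record (chomped, like the rest)
        chomp( my $first = <> );
        my( $key, @array ) = split "\t", $first;

        while( <> ) {
            chomp;
            my( $newKey, $value ) = split "\t";
            if( $newKey eq $key ) {
                push @array, $value;
                next;
            }
            else {
                # Process @array for $key
                #...
                ## Remember the new key
                $key = $newKey;
                ## And reset the array
                @array = ( $value );
            }
        }
        ## Don't forget the final group
        # Process @array for $key
        #...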

        And a command line like:

        sort < unsortedFile | perl theScriptAbove

        Or just sort the file and then feed it to the script as separate steps:

        sort < unsortedFile > sortedFile
        perl theScriptAbove sortedFile
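        The loop above is a control-break pattern: once sorting has brought all records with the same key together, only the current key's values need to be held in memory. Below is a hedged sketch of the same logic wrapped in a hypothetical `group_sorted()` sub that takes a callback, so it can be exercised on a small in-memory sample rather than a file; the sub name and the sample records are illustrative assumptions:

```perl
use strict;
use warnings;

# Control-break grouping over records already sorted by key:
# calls $process->( $key, \@values ) once per run of identical keys.
# group_sorted() is a hypothetical helper, for illustration only.
sub group_sorted {
    my( $process, @records ) = @_;
    return unless @records;
    my( $key, @values ) = split "\t", shift @records;
    for my $record ( @records ) {
        my( $newKey, $value ) = split "\t", $record;
        if( $newKey eq $key ) {
            push @values, $value;
            next;
        }
        $process->( $key, [ @values ] );    # flush the finished group
        ( $key, @values ) = ( $newKey, $value );
    }
    $process->( $key, [ @values ] );        # flush the last group
}

my @unsorted = ( "k2\tb", "k1\ta", "k1\tc" );
my @sorted   = sort @unsorted;

my %seen;
group_sorted( sub { my( $k, $v ) = @_; $seen{ $k } = "@$v"; }, @sorted );
print "$_: $seen{$_}\n" for sort keys %seen;
```

        One caveat worth checking on your own system: under some locales `sort` collates lines in a way that can disregard the tab, which might interleave keys where one is a prefix of another (e.g. `k` and `k1`). Running it as `LC_ALL=C sort` forces plain byte order, which keeps each key's lines contiguous.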
