in reply to Re^5: Memory Efficient Alternatives to Hash of Array
in thread Memory Efficient Alternatives to Hash of Array
I stand corrected.
However, you're still better off using an external sort: once the file is sorted, all the values for a given key sit on adjacent lines, so you can gather them together without loading the entire dataset into memory. A fairly simple loop like this will do:
#! perl
use strict;
use warnings;

## Prime the loop with the first record
defined( my $first = <> ) or exit;
chomp $first;
my( $key, @array ) = split "\t", $first;

while( <> ) {
    chomp;
    my( $newKey, $value ) = split "\t";
    if( $newKey eq $key ) {
        push @array, $value;
        next;
    }
    ## Process @array for $key
    #...

    ## Remember the new key and reset the array
    $key   = $newKey;
    @array = $value;
}

## Process the final @array for the last $key
#...
And a command line like:
sort < unsortedFile | perl theScriptAbove
Or just sort the file and then feed it to the script as separate steps:
sort < unsortedFile > sortedFile
perl theScriptAbove sortedFile
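One caveat worth sketching: the loop compares keys with `eq`, which is a byte-wise comparison, whereas `sort` collates according to the current locale. In some locales that can leave lines with identical keys non-adjacent. A minimal sketch of a more defensive invocation follows. It assumes tab-separated `key<TAB>value` lines as in the script, and the sample data is made up purely for illustration:

```shell
# Hypothetical sample data: key<TAB>value, in arbitrary order.
printf 'b\t1\na\t2\nb\t3\na\t4\n' > unsortedFile

# LC_ALL=C forces byte-wise comparison, so lines with equal keys
# end up adjacent regardless of the locale's collation rules.
# -t sets the field separator to a tab; -k1,1 sorts on the first field.
tab=$(printf '\t')
LC_ALL=C sort -t "$tab" -k1,1 unsortedFile > sortedFile

cat sortedFile
```

The resulting sortedFile can then be fed to the script exactly as before.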