in reply to Re: sort large file
in thread sort large file

This was my first inclination as well -- note that you don't have to sort the file first if you do it this way. A potential downside is that you wind up with the entire data structure in memory.
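For what it's worth, here's a minimal sketch of that in-memory approach (untested, and assuming each line is a numeric id followed by whitespace and the data, the same format the tied example below parses):

use strict;
use warnings;

# Group lines by id in a hash of arrays -- no pre-sorting needed.
my %records;
while (<>) {
    chomp;
    my ($id, $data) = /(\d+)\s+(.*)/;
    next unless defined $id;            # skip lines that don't match
    push @{ $records{$id} }, $data;     # autovivifies the array ref
}

# Everything for a given id is now together.
for my $id (sort { $a <=> $b } keys %records) {
    print "$id: @{ $records{$id} }\n";
}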

One option to avoid the memory consumption is to use something like DBD::SQLite as mentioned above. Another option (probably slower and bulkier, but not requiring any knowledge of DBI and SQL) might be to use Tie::MLDBM to store that hash on disk as a DB_File rather than in memory. The result might wind up being larger than the original text file, but at least it'll be easy to access it again in perl.
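If you'd rather go the DBD::SQLite route, a rough sketch might look like this (untested; the database file, table, and column names are just placeholders):

use strict;
use warnings;
use DBI;

# Load the file into an indexed SQLite table so grouped lookups
# don't need the whole data set in memory.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=mybigfile.db', '', '',
    { RaiseError => 1, AutoCommit => 0 } );

$dbh->do('CREATE TABLE IF NOT EXISTS records (id INTEGER, data TEXT)');

my $sth = $dbh->prepare('INSERT INTO records (id, data) VALUES (?, ?)');
while (<>) {
    chomp;
    my ($id, $data) = /(\d+)\s+(.*)/;
    next unless defined $id;
    $sth->execute($id, $data);
}
$dbh->do('CREATE INDEX IF NOT EXISTS idx_id ON records (id)');
$dbh->commit;

# Later, pull back everything for one id:
my $rows = $dbh->selectcol_arrayref(
    'SELECT data FROM records WHERE id = ?', undef, 42 );
print "$_\n" for @$rows;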

Note: if you use Tie::MLDBM, you can't just push onto the array reference inside the tied hash -- changes made through the reference never pass through the tie, so they never reach the disk. Instead, you'll need to fetch the array reference into a temporary variable, push your data onto it, and then store it back in the hash. Read the pod for more details. E.g.:

# Code not tested - consider it conceptual
use strict;
use warnings;
use Fcntl;          # for O_CREAT and O_RDWR
use Tie::MLDBM;

tie my %hash, 'Tie::MLDBM', {
    'Serialise' => 'Storable',
    'Store'     => 'DB_File'
}, 'mybighash.dbm', O_CREAT|O_RDWR, 0640
    or die $!;

while (<>) {
    chomp;
    my ($id, $data) = /(\d+)\s+(.*)/;
    next unless defined $id;            # skip lines that don't match

    # Fetch, modify, and store back so the tied hash actually
    # sees the change (see the note above).
    my $aref = $hash{$id} || [];
    push @{$aref}, $data;
    $hash{$id} = $aref;
}
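Getting at the data again later is just a matter of re-tying the same file (again untested; the id 12345 is hypothetical):

use strict;
use warnings;
use Fcntl;
use Tie::MLDBM;

# Re-tie the existing file read-only and fetch one id's records.
tie my %hash, 'Tie::MLDBM', {
    'Serialise' => 'Storable',
    'Store'     => 'DB_File'
}, 'mybighash.dbm', O_RDONLY, 0640
    or die $!;

my $aref = $hash{12345} || [];
print "$_\n" for @{$aref};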

Good luck!

-xdg

Code posted by xdg on PerlMonks is public domain. It has no warranties, express or implied. Posted code may not have been tested. Use at your own risk.