in reply to Large file, multi dimensional hash - out of memory
You could reduce the memory requirement to around 1/4 by not using 2 levels of hash. A single level will do the job:
```perl
use strict;
use warnings;

open( my $fh, "<", "input.txt" ) or die "cannot open < input.txt: $!";

my %duplicates;
while ( my $line = <$fh> ) {
    chomp $line;
    ++$duplicates{$line};
}
```
But that will still require around 4GB to build the 50e6-key hash. Better than 16GB, but you will still run out of memory if you are using a 32-bit Perl (unless you have a very high proportion of duplicates, e.g. more than 50%).
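As a rough sanity check on that figure, you can measure the per-key overhead on a small sample hash and extrapolate. This is only a sketch: it assumes the CPAN module Devel::Size is installed, and the real overhead depends on your Perl build and key lengths.

```perl
use strict;
use warnings;
use Devel::Size qw(total_size);   # CPAN module, not core Perl

# Build a sample hash with keys of a plausible shape, then extrapolate.
my %sample;
my $n = 100_000;
$sample{ sprintf "key_%02d_%06d", $_ % 100, $_ } = 1 for 1 .. $n;

my $bytes_per_key = total_size( \%sample ) / $n;
printf "~%.0f bytes/key => ~%.1f GB for 50e6 keys\n",
    $bytes_per_key, $bytes_per_key * 50e6 / 2**30;
```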
As you say the data is presorted, investigate the `uniq` command.
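For example, `uniq -c` prefixes each distinct line with its occurrence count without building any in-memory table. If you need to stay in Perl (or want to post-process the counts), a streaming scan gives the same constant-memory behaviour, because duplicates are adjacent in sorted input. A minimal sketch, assuming the same input.txt and that you only want to report lines that occur more than once:

```perl
use strict;
use warnings;

open( my $fh, "<", "input.txt" ) or die "cannot open < input.txt: $!";

my ( $prev, $count );
while ( my $line = <$fh> ) {
    chomp $line;
    if ( defined $prev && $line eq $prev ) {
        ++$count;                               # still inside the current run
    }
    else {
        print "$count\t$prev\n" if defined $prev && $count > 1;
        ( $prev, $count ) = ( $line, 1 );       # start a new run
    }
}
print "$count\t$prev\n" if defined $prev && $count > 1;   # flush the last run
close $fh;
```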