in reply to Filtering very large files using Tie::File
For filtering duplicates, you only need to remember which elements you have already written to the file. You don't need Tie::File, just a loop and a hash that records which keys have already been written. If memory is still scarce, you can tie that hash to disk:
    open my $in, '<', $infile or die "Couldn't open '$infile': $!";
    open my $out, '>', $outfile or die "Couldn't create '$outfile': $!";
    my %seen;
    while (<$in>) {
        my $key = $_; # change this to whatever key generation you need
        if (! $seen{ $key }++) {
            print $out $_;
        };
    };
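A minimal sketch of the tied-hash variant, using the core SDBM_File module (DB_File or GDBM_File work the same way through perltie); the function and file names here are illustrative, not from the original post:

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;   # core DBM module; DB_File or GDBM_File tie the same way

# Filter duplicates with %seen tied to an on-disk DBM file, so memory
# use stays flat no matter how many distinct keys the input contains.
# Note: SDBM limits the size of key+value pairs, so for very long
# lines you would hash the key first (e.g. Digest::MD5) or use DB_File.
sub dedupe_with_tied_hash {
    my ($infile, $outfile, $dbfile) = @_;
    tie my %seen, 'SDBM_File', $dbfile, O_RDWR|O_CREAT, 0666
        or die "Couldn't tie hash to '$dbfile': $!";
    open my $in,  '<', $infile  or die "Couldn't open '$infile': $!";
    open my $out, '>', $outfile or die "Couldn't create '$outfile': $!";
    while (<$in>) {
        my $key = $_;   # change this to whatever key generation you need
        print $out $_ unless $seen{$key}++;
    }
    close $out;
    untie %seen;
}
```

The loop is identical to the in-memory version; only the `tie` line changes, which is the point of the tied-hash interface.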
Re^2: Filtering very large files using Tie::File
by elef (Friar) on Nov 26, 2010 at 17:41 UTC
by Corion (Patriarch) on Nov 26, 2010 at 17:45 UTC
by elef (Friar) on Nov 26, 2010 at 18:59 UTC
by eyepopslikeamosquito (Archbishop) on Nov 26, 2010 at 20:41 UTC
by Corion (Patriarch) on Nov 26, 2010 at 20:56 UTC
by talexb (Chancellor) on Nov 26, 2010 at 17:57 UTC
by elef (Friar) on Nov 26, 2010 at 18:48 UTC