in reply to Re: Filtering very large files using Tie::File
in thread Filtering very large files using Tie::File
open (ALIGNED, "<:encoding(UTF-8)", "${filename}.txt") or die "Can't open aligned file for reading: $!";
open (ALIGNED_MOD, ">:encoding(UTF-8)", "${filename}_mod.txt") or die "Can't open file for writing: $!";

if ($delete_dupes eq "y") {
    my %seen;    # hash that contains unique records (hash lookups are faster than array lookups)

    while (<ALIGNED>) {
        # Key on the first two tab-separated fields only. Fall back to the
        # whole line when it has no tab, so a stale $1 is never reused.
        my $key = /^([^\t]*\t[^\t]*)/ ? $1 : $_;
        chomp $key;    # on two-field lines the second field still carries the newline
        print ALIGNED_MOD $_ if (! $seen{ $key }++);    # add to hash, and if new, print to file
    }

    my $unfiltered_number = $.;
    my $filtered_number = keys %seen;
    print "\n\n-------------------------------------------------";
    print "\n\nSegment numbers before and after filtering out dupes: $unfiltered_number -> $filtered_number\n";
    print LOG "\nFiltered out dupes: $unfiltered_number -> $filtered_number";

    undef %seen;    # free up memory
    close ALIGNED;
    close ALIGNED_MOD;
    rename ("${filename}_mod.txt", "${filename}.txt") or die "Can't rename file: $!";
}
Replies are listed 'Best First'.
Re^3: Filtering very large files using Tie::File
by ikegami (Patriarch) on Nov 27, 2010 at 20:54 UTC
by elef (Friar) on Nov 27, 2010 at 22:18 UTC
by ikegami (Patriarch) on Nov 27, 2010 at 22:36 UTC