In reply to "print out duplicate records":
    use strict;
    use warnings;

    # $CDR must be declared under strict; here it is taken from the command line.
    my $CDR = shift @ARGV or die "Usage: $0 file\n";

    open( my $cdr, '<', $CDR )       || die "Can't read $CDR: $!\n";
    open( my $dup, '>', "$CDR.dup" ) || die "Can't write $CDR.dup: $!\n";
    open( my $unq, '>', "$CDR.unq" ) || die "Can't write $CDR.unq: $!\n";

    my %seen;
    while (<$cdr>) {
        if ( exists $seen{$_} ) {
            print $dup $_;    # line seen before: it's a duplicate
        }
        else {
            $seen{$_}++;      # first occurrence: remember it
            print $unq $_;
        }
    }
Having all the unique data lines in a hash in memory shouldn't be a problem for the size of files you mentioned.
Note that the number of lines in the "dup" file plus the number of lines in the "unq" file should equal the number of lines in the input file.
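That invariant is easy to check from the shell. The sketch below builds a small sample input (the filename cdr.txt is made up for the demo), splits it with the same first-seen/already-seen logic using awk as a stand-in for the Perl script, and verifies the counts with wc:

```shell
#!/bin/sh
# Hypothetical demo input; in practice this would be your CDR file.
printf 'a\nb\na\nc\nb\na\n' > cdr.txt

# seen[$0]++ is 0 (false) on the first occurrence of a line,
# so first occurrences go to .unq and repeats go to .dup.
awk 'seen[$0]++ { print > "cdr.txt.dup"; next }
             { print > "cdr.txt.unq" }' cdr.txt

in=$(wc -l < cdr.txt)
dup=$(wc -l < cdr.txt.dup)
unq=$(wc -l < cdr.txt.unq)
echo "$in = $dup + $unq"
test "$in" -eq $((dup + unq)) && echo OK
```

For the six-line sample, three lines land in each output file, and dup + unq comes back equal to the input count.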