In reply to "print out duplicate records":
    use strict;
    use warnings;

    # $CDR must be declared under strict; here it is taken from the command line.
    my $CDR = shift @ARGV or die "Usage: $0 file\n";

    open( my $cdr, '<', $CDR )       || die "Can't read $CDR: $!\n";
    open( my $dup, '>', "$CDR.dup" ) || die "Can't write $CDR.dup: $!\n";
    open( my $unq, '>', "$CDR.unq" ) || die "Can't write $CDR.unq: $!\n";

    my %seen;
    while (<$cdr>) {
        if ( exists $seen{$_} ) {
            print $dup $_;    # line seen before: it's a duplicate
        }
        else {
            $seen{$_}++;      # first occurrence: remember it
            print $unq $_;
        }
    }
Having all the unique data lines in a hash in memory shouldn't be a problem for the size of files you mentioned.
Note that the number of lines in the "dup" file plus the number of lines in the "unq" file should equal the number of lines in the input file.
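That invariant is easy to check from the shell. The sketch below builds a small sample input (the filename cdr.txt is made up for the demo), splits it with the same first-seen/already-seen logic using awk as a stand-in for the Perl script, and verifies the counts with wc:

```shell
#!/bin/sh
# Hypothetical demo input; in practice this would be your CDR file.
printf 'a\nb\na\nc\nb\na\n' > cdr.txt

# seen[$0]++ is 0 (false) on the first occurrence of a line,
# so first occurrences go to .unq and repeats go to .dup.
awk 'seen[$0]++ { print > "cdr.txt.dup"; next }
             { print > "cdr.txt.unq" }' cdr.txt

in=$(wc -l < cdr.txt)
dup=$(wc -l < cdr.txt.dup)
unq=$(wc -l < cdr.txt.unq)
echo "$in = $dup + $unq"
test "$in" -eq $((dup + unq)) && echo OK
```

For the six-line sample, three lines land in each output file, and dup + unq comes back equal to the input count.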