in reply to Find duplicate lines from the file and write it into new file.

The traditional approach is to sort your file first; then all duplicates will appear one after another. To sort a huge amount of data, the best approach is to divide it into small parts, sort each part, and then do a merge sort over the sorted parts. During that merge you can also output the duplicates easily.
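A minimal sketch of that split/sort/merge idea with standard tools (the filename, chunk size, and output name are illustrative, not from the thread):

```shell
printf 'b\na\nb\nc\na\n' > big.txt     # toy stand-in for the huge file

split -l 2 big.txt chunk_              # 1. divide it into small parts
for f in chunk_*; do                   # 2. sort each part in place
    sort -o "$f" "$f"
done
sort -m chunk_* | uniq -d > dups.txt   # 3. merge the already-sorted parts
                                       #    and emit duplicates as they
                                       #    stream past adjacently
rm chunk_*
```

With real data you would raise the `split -l` line count so each chunk still fits comfortably in memory; `sort -m` only merges, so it never needs the whole file at once.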


Re^2: Find duplicate lines from the file and write it into new file.
by tinita (Parson) on Jan 04, 2007 at 13:28 UTC
    If sorting is an option, and we are on Unix/Linux here, I would try out sort -u file

    Update: yep, I misread the question. Forget it =)

      But then you would lose the duplicates. I think the OP wanted to find the duplicates; perhaps something along the lines of sort | uniq -c, then they could read the output and look at the counts.
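      A small illustration of that counting variant (foo.txt is a stand-in for the OP's file, not a name from the thread):

      ```shell
      printf 'b\na\nb\nc\na\n' > foo.txt   # toy stand-in for the OP's file

      # Each output line is prefixed with its occurrence count:
      sort foo.txt | uniq -c

      # Keep only the lines that occur more than once:
      sort foo.txt | uniq -c | awk '$1 > 1'
      ```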

        uniq can give you the dups directly:
        sort foo.txt | uniq -d