in reply to Find duplicate lines from the file and write it into new file.

The traditional approach is to sort your file first; then all duplicates will appear one after another. To sort a huge amount of data, the best approach is to divide it into small parts, sort each part, and then do a merge sort over the sorted parts. During that merge you can also output the duplicates easily.
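A minimal sketch of that split/sort/merge idea with standard tools (the filename, chunk size, and output name are illustrative, not from the thread):

```shell
printf 'b\na\nb\nc\na\n' > big.txt     # toy stand-in for the huge file

split -l 2 big.txt chunk_              # 1. divide it into small parts
for f in chunk_*; do                   # 2. sort each part in place
    sort -o "$f" "$f"
done
sort -m chunk_* | uniq -d > dups.txt   # 3. merge the already-sorted parts
                                       #    and emit duplicates as they
                                       #    stream past adjacently
rm chunk_*
```

With real data you would raise the `split -l` line count so each chunk still fits comfortably in memory; `sort -m` only merges, so it never needs the whole file at once.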


Re^2: Find duplicate lines from the file and write it into new file.
by tinita (Parson) on Jan 04, 2007 at 13:28 UTC
    If sorting is an option, and we are on Unix/Linux here, I would try out sort -u file

    Update: yep, I misread the question. Forget it =)

      But then you would lose the duplicates. I think the OP wanted to find the duplicates; perhaps something along the lines of sort | uniq -c, then they could read the output and look at the counts.
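      A small illustration of that counting variant (foo.txt is a stand-in for the OP's file, not a name from the thread):

      ```shell
      printf 'b\na\nb\nc\na\n' > foo.txt   # toy stand-in for the OP's file

      # Each output line is prefixed with its occurrence count:
      sort foo.txt | uniq -c

      # Keep only the lines that occur more than once:
      sort foo.txt | uniq -c | awk '$1 > 1'
      ```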

        uniq can give you the dups directly:
        sort foo.txt | uniq -d