in reply to Duplicate entries?

Another way to remove duplicates is to use the command-line sort. It does not need to hold the entire file in memory, so it can sort a HUGE file. Then read through the sorted file and skip any line that matches the immediately preceding line.
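As a rough sketch of that idea (not necessarily how the original poster would write it), the sort-then-skip pass could look like the following. The function name dedupe_sorted is hypothetical, and forcing the C locale is my own assumption, made so that identical lines always end up next to each other.

```shell
#!/bin/sh
# dedupe_sorted FILE
# Hypothetical helper: sort FILE, then print each line only if it
# differs from the line immediately before it.
dedupe_sorted() {
    # LC_ALL=C (an assumption) gives plain byte-wise ordering.
    LC_ALL=C sort -- "$1" |
        # NR == 1 always prints the first line, even if it is empty.
        # ($0 "") forces a string comparison, so lines such as
        # "1" and "1.0" are not treated as equal numbers.
        awk 'NR == 1 || ($0 "") != prev { print } { prev = $0 "" }'
}
```

It could be called as `dedupe_sorted bigfile > bigfile.uniq`. Only the current and previous lines are held by awk, so the memory cost of the second pass stays small no matter how large the file is.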

Re^2: Duplicate entries?
by johngg (Canon) on Jan 11, 2012 at 22:08 UTC

    If you are on *nix, you can pipe the sort output into uniq (http://en.wikipedia.org/wiki/Uniq) to get rid of adjacent duplicates.

    knoppix@Microknoppix:~$ cat rubbish
    cat
    fish
    dog
    apple
    cat
    bird
    knoppix@Microknoppix:~$ sort rubbish | uniq
    apple
    bird
    cat
    dog
    fish
    knoppix@Microknoppix:~$

    I hope this is of interest.

    Cheers,

    JohnGG