I don't really like having to depend on the files being sorted. One alternative way to remove duplicate data is to use a hash to temporarily hold your data. You can read in the data from the files, place it in a hash, and then (eventually) write it back out again. Since hash keys are unique, any later duplicate row simply overwrites the earlier one, and you end up with only one copy of each unique element.
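
As a minimal sketch of that idea, assuming the whole line serves as the duplicate-detection key, the data fits comfortably in memory, and the file names come in on the command line (the hash name %seen is just an arbitrary choice):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %seen;    # each unique line becomes a key; duplicates collapse

    # Read every line from the files named on the command line.
    while ( my $line = <> ) {
        chomp $line;
        $seen{$line} = 1;    # a later duplicate overwrites this entry
    }

    # Write the unique lines back out. Hash keys come out in no
    # particular order, so sort them if you need a stable ordering.
    print "$_\n" for keys %seen;

Note that this trades the sorted-input requirement for memory: the hash has to hold one copy of every unique line at once.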