in reply to Removing duplicate entries in a file which has a time stamp on each line
"However the IP address and Action part of each line may contain duplicates; it's these duplicates I want to remove but still keep the output in time order."

Now, if "1.2.3.4 PowerOff" occurs today at 08:18 and again today at 10:20, do you want to keep the first record and delete the later one, or vice versa?
If you keep the first and delete later repeats, you just keep the IP/Action data as hash keys, and assuming the data are being read in chronological order, only output lines whose IP/Action are not yet in the hash.
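A minimal sketch of that first approach, assuming each line looks like "2024-05-01 08:18:00 1.2.3.4 PowerOff" (date, time, IP address, action, whitespace-separated) and that input arrives in chronological order; adjust the split to match your real format:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %seen;    # keys are "IP Action"; presence means we've already printed one

while ( my $line = <> ) {
    chomp $line;
    # Assumed layout: date, time, IP, action (action may contain spaces).
    my ( $date, $time, $ip, $action ) = split ' ', $line, 4;
    my $key = "$ip $action";
    next if $seen{$key}++;    # skip later repeats of the same IP/Action
    print "$line\n";          # first occurrence, printed in original order
}
```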
To delete earlier occurrences and keep only the latest one, store the Date/Time as the value for each IP/Action key; after you've read the whole input stream, sort the hash by its values and print each "hash_value hash_key" pair in chronological order.
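A sketch of that second approach, under the same assumed line format; note the final sort with cmp only gives chronological order if the Date/Time strings sort lexicographically (e.g. ISO 8601 style):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %last_seen;    # keys are "IP Action"; values are the latest Date/Time seen

while ( my $line = <> ) {
    chomp $line;
    my ( $date, $time, $ip, $action ) = split ' ', $line, 4;
    $last_seen{"$ip $action"} = "$date $time";    # later lines overwrite earlier ones
}

# Sort by the stored Date/Time values and print "hash_value hash_key".
for my $key ( sort { $last_seen{$a} cmp $last_seen{$b} } keys %last_seen ) {
    print "$last_seen{$key} $key\n";
}
```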