in reply to Re^4: removing duplicates lines plus strings from a file
in thread removing duplicates lines plus strings from a file

You would use regex matching if you were trying to spot "partial duplications" in the column of interest, but then you have a lot more work to do: you have to specify exactly what qualifies as "duplication", and then condition the data to meet that spec. For example, if you want "foo@http://url1@bar" to be considered a duplicate of "foo@https://url1@bar2", you would use a regex to eliminate the irrelevant differences (here, "http" vs. "https") before comparing the values.
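To make the "condition the data" step concrete, here is a minimal sketch in Python (the same idea carries over to Perl). It assumes, purely for illustration, that fields are separated by "@", that the column of interest is the second field, and that the only irrelevant difference is the URL scheme. The name normalize_key and its parameters are hypothetical, not part of any existing tool.

```python
import re

# Assumption: "http://" and "https://" should be treated as equivalent.
_SCHEME = re.compile(r"^https?://", re.IGNORECASE)

def normalize_key(line, delimiter="@", column=1):
    """Return the comparison key for one line: the chosen column
    with a leading http/https scheme stripped."""
    fields = line.rstrip("\n").split(delimiter)
    return _SCHEME.sub("", fields[column])

a = "foo@http://url1@bar"
b = "foo@https://url1@bar2"
print(normalize_key(a) == normalize_key(b))  # True
```

Whatever else you decide counts as "irrelevant" (case, trailing slashes, and so on) would have to be added to the normalizer by hand; that is the extra work of writing the spec.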

As for preserving the original line ordering, the "col-uniq" utility does that, but at a cost: if the input hasn't been sorted in advance on the column of interest, some duplicate values will be preserved in the output.
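If you need both the original ordering and complete removal of duplicates from unsorted input, one common alternative is to remember every key already seen. The following Python sketch is not how "col-uniq" works; it is a different technique, shown under the same illustrative assumptions ("@" delimiter, second field as the column of interest), and dedupe_keep_order is a hypothetical name.

```python
def dedupe_keep_order(lines, delimiter="@", column=1):
    """Keep the first line seen for each distinct value in the chosen
    column, preserving input order. The input need not be sorted."""
    seen = set()
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        # Fall back to the whole line if the column is missing.
        key = fields[column] if column < len(fields) else line
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept

sample = ["a@x@1", "b@y@2", "c@x@3", "d@z@4", "e@y@5"]
print(dedupe_keep_order(sample))  # ['a@x@1', 'b@y@2', 'd@z@4']
```

The trade-off is memory: every distinct key is held in the set, which is fine for modest files but can matter for very large ones.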
