Thank you for your comments; it works perfectly. I just wanted to ask one more thing: is it possible to do the same using regex matching? Would that preserve the original ordering of the lines, which is lost by using a hash?

Re^5: removing duplicates lines plus strings from a file
by graff (Chancellor) on Sep 19, 2011 at 10:53 UTC
    You would use regex matching if you were trying to spot "partial duplications" in the column of interest, but then you have a lot more work to do: you must specify what qualifies as a "duplication" and condition the data to meet that spec. For example, if you want "foo@http://url1@bar" to be considered a duplicate of "foo@https://url1@bar2", you would use a regex to eliminate the irrelevant differences before comparing.
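    To make that concrete, here is a minimal sketch in Python (the same regex would work in a Perl s///). It assumes the fields are separated by "@", that the column of interest is the second field, and that "http://" versus "https://" is the only irrelevant difference; the helper names normalise_key and partial_duplicates are hypothetical.

```python
import re

# Assumption: "http://" vs "https://" is an irrelevant difference,
# so the scheme is stripped before keys are compared.
SCHEME = re.compile(r"^https?://")


def normalise_key(line, sep="@", col=1):
    """Return the regex-normalised value of the (assumed) column of interest."""
    fields = line.rstrip("\n").split(sep)
    return SCHEME.sub("", fields[col])


def partial_duplicates(a, b):
    """Two lines are 'duplicates' if their normalised keys match.

    Other columns (e.g. "bar" vs "bar2") are ignored entirely.
    """
    return normalise_key(a) == normalise_key(b)


print(partial_duplicates("foo@http://url1@bar", "foo@https://url1@bar2"))  # True
```

    Deciding what counts as "irrelevant" is the real work here; each new kind of difference means another substitution in normalise_key.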

    As for preserving the original line ordering, the "col-uniq" utility does that, at the expense of retaining duplicate values whenever the input hasn't been sorted in advance on the column of interest.
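    On the hash question: the order is typically lost only when the output is generated by iterating over the hash's keys. If each line is printed as it is read, and the hash serves only as a record of keys already seen (the common %seen idiom), first occurrences come out in their original order with no sorting needed. Below is a minimal Python sketch contrasting the two approaches, with a set standing in for a Perl hash. dedup_adjacent is a uniq-style comparison matching the behaviour described above, not col-uniq's actual code; the sample data and helper names are hypothetical.

```python
def dedup_adjacent(lines, key):
    """uniq-style: drop a line only if its key equals the previous line's key."""
    out, prev = [], object()  # sentinel that never equals a real key
    for line in lines:
        k = key(line)
        if k != prev:
            out.append(line)
        prev = k
    return out


def dedup_seen(lines, key):
    """Keep the first line for each key, in original input order."""
    seen, out = set(), []
    for line in lines:
        k = key(line)
        if k not in seen:
            seen.add(k)
            out.append(line)
    return out


lines = ["a@u1@x", "b@u2@y", "c@u1@z"]
second_field = lambda line: line.split("@")[1]  # assumed column of interest

print(dedup_adjacent(lines, second_field))  # ['a@u1@x', 'b@u2@y', 'c@u1@z']
print(dedup_seen(lines, second_field))      # ['a@u1@x', 'b@u2@y']
```

    dedup_adjacent keeps the third line because its duplicate key is not adjacent to the first; dedup_seen drops it while leaving the surviving lines in input order.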