in reply to Re: finding and deleting repeated lines in a file
in thread finding and deleting repeated lines in a file

A suggestion to guard against collisions: if CPU time is not a concern, compute several fingerprints per line using different algorithms. Two distinct lines are then mistaken for duplicates only if all of their fingerprints collide at once, which makes a false match astronomically improbable. Even applying the same algorithm to the original string and to a variant produced by some transliteration rule gives you multiple fingerprints, and each additional fingerprint cuts the probability of a collision further.
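To make that concrete, here is a minimal sketch in Python rather than the thread's own language. The names `fingerprint`, `fingerprint_variant`, `unique_lines` and the byte-rotation table `ROTATE` are made up for illustration, and SHA-1 and SHA-256 are just an example pair of algorithms, not a recommendation from the thread.

```python
import hashlib

# Hypothetical transliteration rule: map every byte value b to (b + 1) % 256.
ROTATE = bytes.maketrans(bytes(range(256)),
                         bytes((i + 1) % 256 for i in range(256)))

def fingerprint(data: bytes) -> bytes:
    # Two different algorithms: a false duplicate requires both
    # SHA-1 and SHA-256 to collide on the same pair of lines.
    return hashlib.sha1(data).digest() + hashlib.sha256(data).digest()

def fingerprint_variant(data: bytes) -> bytes:
    # Same algorithm twice: once on the original bytes, once on a
    # transliterated copy of them.
    return (hashlib.sha256(data).digest()
            + hashlib.sha256(data.translate(ROTATE)).digest())

def unique_lines(lines, fp=fingerprint):
    # Yield each line the first time its combined fingerprint is seen.
    seen = set()
    for line in lines:
        key = fp(line.encode("utf-8"))
        if key not in seen:
            seen.add(key)
            yield line
```

Something like `list(unique_lines(open("input.txt")))` would then keep the first occurrence of each line. The price is a longer key per line and roughly twice the hashing work.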

Makeshifts last the longest.
