in reply to finding and deleting repeated lines in a file
If order isn't important, you can sort the file and then remove consecutive repeated lines with a short script. This saves memory as long as your sort can work within the available memory limits.
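As a minimal sketch of that approach (assuming a hypothetical input file `input.txt`), the standard `sort` and `uniq` tools do exactly this, since `uniq` only removes *consecutive* duplicates:

```shell
# Sort first, so duplicate lines become adjacent,
# then drop consecutive repeats with uniq.
sort input.txt | uniq > deduped.txt

# GNU/BSD sort can also do both steps in one pass:
sort -u input.txt > deduped.txt
```

An external-merge `sort` implementation will spill to temporary files on disk, which is why this route works even when the file doesn't fit in memory.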
If order is important and the lines are quite large, you can use Digest::MD5 to compute a checksum of each line, then keep the checksums seen so far and compare each new line's checksum against them instead of comparing the full lines. Since each MD5 digest is only 16 bytes, this saves a good deal of memory when lines are long.
I risk repeating what's already been said, but I think the previous posts were dancing around the issue.