Re: finding and deleting repeated lines in a file

by caedes (Pilgrim)
on Jun 19, 2002 at 14:51 UTC ( [id://175707] )


in reply to finding and deleting repeated lines in a file

If the file is really huge and you can't simply slurp the lines into a huge hash as the keys, then you have two choices, depending on whether or not the order of lines in the file is important.
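(For reference, the in-memory approach that won't scale here is just a hash of seen lines; a minimal sketch is below. Every distinct line becomes a hash key, which is exactly what blows up on a really huge file.)

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Keep every distinct line as a hash key; print a line only the
    # first time it is seen. Memory grows with the number of unique lines.
    my %seen;
    while ( my $line = <> ) {
        print $line unless $seen{$line}++;
    }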

If order isn't important, you can sort the file and then run a script that removes consecutive repeated lines (a sketch follows). This saves memory as long as your sort can work within the memory limitations.
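Something like this would do the second pass, assuming the file has already been sorted (say with the system sort); only the previous line is held in memory:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Reads already-sorted input, e.g.:  sort bigfile | perl thisscript.pl > out
    # Repeated lines are now adjacent, so comparing each line against the
    # previous one is enough to drop duplicates.
    my $prev;
    while ( my $line = <> ) {
        print $line unless defined $prev and $line eq $prev;
        $prev = $line;
    }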

If order is important and the lines are quite large, you can use Digest::MD5 to create a checksum of each line and then compare the checksums instead of the full lines. This will save some memory.
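One way to do that (a sketch using a hash of MD5 digests rather than an array, and trusting that MD5 collisions are unlikely for this purpose):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Digest::MD5 qw(md5);

    # Store a 16-byte binary digest per distinct line instead of the
    # line itself, so memory per line is bounded even for very long lines.
    my %seen;
    while ( my $line = <> ) {
        print $line unless $seen{ md5($line) }++;
    }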

I risk repeating what's already been said, but I think the previous posts were dancing around the issue.
