PerlMonks |
Uhm... let me think... the file is huge, so it is not advisable to keep all the non-repeated sequences inside a hash, or your memory will blow up. I think it would be a good idea to use a message digest on the values to keep memory occupation low (at the expense of some CPU, of course); this also exposes you to the risk of two different sequences having the same digest: the probability should be low, but not zero. I have no data to test with and I have never used Digest::MD5 directly, so take the subsequent code as a suggestion; it may suit your needs or be completely wrong. I'm looking at the documentation at http://search.cpan.org/doc/GAAS/Digest-MD5-2.20/MD5.pm
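A minimal sketch of the idea (the original code was lost here, so this is a reconstruction of the approach described above, not the author's exact code; the filename in the usage comment is illustrative):

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Print each line the first time it appears. Only a fixed-size
# 16-byte MD5 digest is remembered per distinct line, so memory
# stays low even for a huge file. The trade-off noted above: two
# different lines could in theory share a digest, in which case
# the later one would be dropped as a false duplicate.
sub print_unique {
    my ($in, $out) = @_;
    my %seen;
    while ( my $line = <$in> ) {
        print {$out} $line unless $seen{ md5($line) }++;
    }
}

# Typical use (filename illustrative):
# open my $fh, '<', 'huge_file.txt' or die "huge_file.txt: $!";
# print_unique($fh, \*STDOUT);
```

Using the raw binary digest (md5) rather than md5_hex keeps each hash key at 16 bytes instead of 32, which matters when there are many distinct lines.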
If this is not what you need, I hope it at least helps you reach a better solution.

--bronto

In reply to Re: finding and deleting repeated lines in a file
by bronto