That could be a hard problem, depending on what you need the filtered data for. My usual approach is a $hash{$line}++ sort of counter to find the dupes (see the sketch below), but that's going to eat a lot of RAM (the problem you're having now, I guess) unless the lines contain some shorter identifier you can key on instead.
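Something like this minimal, untested sketch is what I mean; input.txt and dupes.txt are just placeholders for whatever your real filenames are:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Count each full line in a hash; a line is a dupe the moment its
    # count passes 1. Keeping every distinct line as a hash key is
    # exactly what eats the RAM on a big file.
    my %seen;

    open my $in,  '<', 'input.txt' or die "can't read input.txt: $!";
    open my $out, '>', 'dupes.txt' or die "can't write dupes.txt: $!";

    while (my $line = <$in>) {
        # write each duplicated line once, on its second appearance
        print $out $line if ++$seen{$line} == 2;
    }

    close $out or die "close failed: $!";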
One option might be to take a Digest::MD5 digest of each line and count the digests instead; there's a second sketch below. Hashing every line could be computationally expensive for a file that big, though. I guess it depends on what the lines look like. Would it be possible to include a few sample lines?
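An untested sketch of that idea, using Digest::MD5's functional md5() rather than the OO interface, with the same placeholder filenames as above; each hash key shrinks to a fixed 16 bytes no matter how long the line is:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Digest::MD5 qw(md5);

    # Key the count on a 16-byte binary digest instead of the full
    # line, so memory grows with the number of distinct lines rather
    # than their length. Costs one MD5 per line, which is the CPU
    # hit mentioned above.
    my %seen;

    open my $in,  '<', 'input.txt' or die "can't read input.txt: $!";
    open my $out, '>', 'dupes.txt' or die "can't write dupes.txt: $!";

    while (my $line = <$in>) {
        print $out $line if ++$seen{ md5($line) } == 2;
    }

    close $out or die "close failed: $!";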
-Paul