in reply to Find duplicate lines from the file and write it into new file.
That can be a hard problem, depending on what you need the filtered data for. My usual approach is a $hash{$line}++ counter to spot the dupes, but that eats a lot of RAM (probably the problem you're running into now) unless each line has some shorter identifier you can use as the key instead.
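A minimal sketch of that counting approach, assuming plain text files named input.txt and dupes.txt (adjust to taste) and that "duplicate" means byte-for-byte identical lines:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %seen;
open my $in,  '<', 'input.txt' or die "input.txt: $!";
open my $out, '>', 'dupes.txt' or die "dupes.txt: $!";

while ( my $line = <$in> ) {
    # The post-increment returns the old count, so this fires exactly
    # once per duplicated line: on its second appearance.
    print {$out} $line if $seen{$line}++ == 1;
}

close $in;
close $out or die "dupes.txt: $!";
```

The catch is that %seen holds every distinct line in memory, which is what blows up on a huge file.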
One option might be to run each line through Digest::MD5 and count the digests instead of the lines themselves. That trades some CPU time for memory, which could still be expensive for a file that big. I guess it depends what the lines look like. Would it be possible to include a few sample lines?
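A sketch of the digest variant, under the same filename assumptions as above. Keying the hash on the 16-byte binary md5() of each line keeps the per-key memory roughly constant no matter how long the lines are (in principle two different lines could collide, but for dedup work MD5 is usually considered safe enough):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

my %seen;
open my $in,  '<', 'input.txt' or die "input.txt: $!";
open my $out, '>', 'dupes.txt' or die "dupes.txt: $!";

while ( my $line = <$in> ) {
    # Store the fixed-size digest, not the (possibly long) line.
    print {$out} $line if $seen{ md5($line) }++ == 1;
}

close $in;
close $out or die "dupes.txt: $!";
```

If the lines are short anyway, the digests may actually cost more memory than the lines, so it only pays off when the lines are long.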
-Paul