in reply to Re: File Manipulation - Need Advise!
in thread File Manipulation - Need Advise!
If you need uniqueness across an entire set, hashes are without question the most useful tool. The problem, though, is that you then have to store all the keys in memory.
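For the whole-set case, a `%seen` hash is the idiomatic approach. A minimal sketch (the `dedup_all` name is mine; in practice you might key on one field of the line rather than the whole line):

    use strict;
    use warnings;

    # Return the input lines with later duplicates dropped.
    # %seen holds one entry per distinct line, so memory grows
    # with the number of unique keys -- the cost noted above.
    sub dedup_all {
        my %seen;
        return grep { !$seen{$_}++ } @_;
    }

    # As a filter over a file: print dedup_all(<>);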
It is not uncommon to want to dedup when duplicates come in successive runs (think of Unix's 'uniq'). That is where this second class of solutions comes into play: set a state variable and read one line at a time. You may have to keep the previous line or two around to compute your state, and you may have to do some cleanup work on stored lines at the end.
This is a big win when you have millions and millions of entries to sift through. For example:

    my $thisKey;
    my $lastLine = <>;    # first line is the header, so it always prints
    my $lastKey  = '';

    while (<>) {
        if (/(.*?)\t/) {
            $thisKey = $1;
        }
        else {
            warn "bad data, no tab in: $_";
        }

        # key changed: the stored line was the last of its run
        print $lastLine if $thisKey ne $lastKey;

        $lastLine = $_;
        $lastKey  = $thisKey;
    }
    print $lastLine;    # don't forget the final stored line