in reply to Removing redundancy
Well, the words "huge file" make me wonder if this is the right way to go, but if you want one entry per key, that's spelled "hash" in Perl. Here's a sample (untested) that should take you in the right direction:
    my %hash;  # crappy name but easy to remember
    while (<FILE>)  # assuming FILE is open to the right place
    {
        chomp;  # drop the newline so it doesn't end up in the value
        my ($key, $value) = split /\t/;  # only one tab per line
        push @{$hash{$key}}, $value;  # magic of autovivification
    }
    # done, print everything
    foreach my $key (keys %hash)
    {
        print $key, " => ", join(' ', @{$hash{$key}}), "\n";
    }

The idea is to maintain a hash of the keys (your first column) and associate each with an array of values (your second column). The main problem this may face is that hashes take a lot of memory, so depending on your definition of "huge file", this may not work.
For more information, you may want to start with the Perl data structures cookbook.
Update
BrowserUk is correct below: hashes don't preserve input order. For that, the easiest modification is to use Tie::IxHash, if you can take the time and memory hit.
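If installing a module isn't an option, one alternative (not Tie::IxHash, just a plain side array) is to record each key the first time it shows up and print in that order. This is a rough sketch under the same assumptions as the sample above (one tab per line); the sub name group_in_order and the sample data are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: groups "key\tvalue" lines by key and returns
# one "key => v1 v2 ..." string per key, in first-seen order.
sub group_in_order {
    my @lines = @_;
    my (%hash, @order);
    for my $line (@lines) {
        chomp(my $copy = $line);                      # work on a copy, strip newline
        my ($key, $value) = split /\t/, $copy, 2;     # assumes one tab per line
        push @order, $key unless exists $hash{$key};  # remember first appearance
        push @{ $hash{$key} }, $value;                # autovivifies the array ref
    }
    return map { "$_ => " . join(' ', @{ $hash{$_} }) } @order;
}

# Made-up sample data: "b" appears first, so it is printed first.
print "$_\n" for group_in_order("b\t1\n", "a\t2\n", "b\t3\n");
```

The extra array costs one more copy of each distinct key, so the memory caveat about huge files applies here as well.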
Re: Re: Removing redundancy
by BrowserUk (Patriarch) on Apr 26, 2003 at 02:57 UTC