in reply to Removing redundancy

Well, the words "huge file" make me wonder whether this is the right approach, but if you want one entry per key, that's spelled "hash" in Perl. Here's an (untested) sample that should point you in the right direction:

    my %hash;    # crappy name but easy to remember
    while (<FILE>) {    # assuming FILE is open to the right place
        chomp;    # drop the newline so it doesn't end up in the last value
        my ($key, $value) = split /\t/, $_, 2;    # split on the first tab only
        push @{ $hash{$key} }, $value;    # magic of autovivification
    }
    # done, print everything
    foreach my $key (keys %hash) {
        print $key, " => ", join(' ', @{ $hash{$key} }), "\n";
    }

The idea is to maintain a hash keyed on your first column, where each key is associated with an array of the values from your second column. The main drawback is memory: the whole file ends up in memory, and Perl hashes and arrays carry significant per-entry overhead, so depending on your definition of "huge file", this may not work.
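
If you want to try the hash-of-arrays idea without a real file, here is a self-contained sketch. The sub name `group_by_key` and the sample data are made up for illustration; it reads from an in-memory filehandle, and it sorts the keys only so that the output is predictable (a plain hash returns keys in no particular order).

```perl
use strict;
use warnings;

# Hypothetical helper: group "key<TAB>value" lines into a hash of arrays,
# then return one "key => v1 v2 ..." line per key, keys sorted.
sub group_by_key {
    my ($fh) = @_;
    my %hash;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;                  # skip blank lines
        my ($key, $value) = split /\t/, $line, 2;  # split on the first tab only
        push @{ $hash{$key} }, $value;             # autovivifies the array ref
    }
    my $out = '';
    foreach my $key (sort keys %hash) {
        $out .= $key . ' => ' . join(' ', @{ $hash{$key} }) . "\n";
    }
    return $out;
}

# Usage: read from a string instead of a file on disk.
my $data = "apple\tred\nbanana\tyellow\napple\tgreen\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print group_by_key($fh);
# apple => red green
# banana => yellow
```
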

For more information, start with the Perl Data Structures Cookbook (perldoc perldsc), which covers hashes of arrays like this one.

Update
BrowserUk is correct below: hashes don't preserve input order. If order matters, the easiest modification is to use Tie::IxHash, provided you can accept the extra time and memory cost.
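
With Tie::IxHash installed from CPAN, the change is essentially `use Tie::IxHash;` plus `tie my %hash, 'Tie::IxHash';` in place of `my %hash;`. If you would rather stay with core Perl, a common alternative (swapped in here, not what the update above describes) is to record each key in an array the first time it is seen. The sub name `group_in_input_order` and the sample data below are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: like the hash-of-arrays version, but keys are
# reported in the order they first appear. A plain hash loses that order,
# so @order remembers each key on its first sighting.
sub group_in_input_order {
    my ($fh) = @_;
    my (%hash, @order);
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;                       # skip blank lines
        my ($key, $value) = split /\t/, $line, 2;       # split on the first tab only
        push @order, $key unless exists $hash{$key};    # first sighting only
        push @{ $hash{$key} }, $value;
    }
    return join '', map { $_ . ' => ' . join(' ', @{ $hash{$_} }) . "\n" } @order;
}

# Usage: read from a string instead of a file on disk.
my $data = "pear\tgreen\napple\tred\npear\tbrown\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print group_in_input_order($fh);
# pear => green brown
# apple => red
```

The extra array costs one entry per distinct key, which is usually cheaper than tying the hash.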

Re: Re: Removing redundancy
by BrowserUk (Patriarch) on Apr 26, 2003 at 02:57 UTC

    This doesn't preserve the input order.

