Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

My apologies in advance for such a simple question. I noticed a similar post about removing redundancy and tried to modify it for my scenario, but I can't quite get it to work.

I have a file in the following tab-delimited format:
text1 text-a text-a text-b text-a
text1 text-c
text2 text-a text-b
text2 text-b text-d
etc...
I need to combine all the data for "text1", "text2", etc., while at the same time removing its redundancy. So the output for this file will look like this:
text1 text-a text-b text-c
text2 text-a text-b text-d

Hopefully this makes sense and many thanks in advance for your help.
freddie


Re: Removing redundancy in X and Y directions
by kyle (Abbot) on Sep 26, 2008 at 15:17 UTC

    I'd probably use a hash of hashes for this (see perldsc). The top-level keys would be your "text1" and "text2". The second-level keys would be "text-a", "text-b", etc. The values don't really matter; 1 (or a running count) is fine. When you're done, loop over the top-level keys, and inside that loop get the second-level keys. It may help to read perlreftut and perlref.
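To make that structure concrete, here is a minimal sketch (not from the thread) that builds the hash of hashes from the four sample rows and dumps it with the core Data::Dumper module. The rows are hard-coded as tab-separated strings purely for illustration, and the names %hoh and @rows are hypothetical.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sample rows hard-coded for illustration; normally read from the file.
my @rows = (
    "text1\ttext-a\ttext-a\ttext-b\ttext-a",
    "text1\ttext-c",
    "text2\ttext-a\ttext-b",
    "text2\ttext-b\ttext-d",
);

# $hoh{first column}{other value} = number of times the value was seen
my %hoh;
for my $row (@rows) {
    my ($key, @values) = split /\t/, $row;
    $hoh{$key}{$_}++ for @values;    # a repeated value just bumps its count
}

# Sortkeys makes the dump come out in a stable order.
$Data::Dumper::Sortkeys = 1;
print Dumper(\%hoh);

# Top-level keys, then each key's second-level keys, give the merged rows.
for my $key (sort keys %hoh) {
    print join("\t", $key, sort keys %{ $hoh{$key} }), "\n";
}
```

Because each value becomes a hash key, duplicates collapse automatically; the counts are a free by-product if you ever need them.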

    If you get stuck, post the code you have, and we can help further.

      The code that I was trying to modify was...
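      my %hash;
      while (<DATA>) {
          chomp;
          my @line  = split /\t/;
          my $first = shift @line;
          # push keeps every value, so repeats such as text-a survive
          push( @{$hash{$first}}, @line );
      }
      foreach ( sort keys %hash ) {
          # OUT must already be open for writing
          print OUT "$_\t".join("\t", sort @{$hash{$_}})."\n";
      }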
        my %hash;
        while (<DATA>) {
            chomp;
            my @line  = split /\t/;
            my $first = shift @line;
            # each value becomes a hash key, so duplicates collapse
            $hash{$first}{$_}++ for @line;
        }
        foreach ( sort keys %hash ) {
            print STDOUT "$_\t".join("\t", sort keys %{$hash{$_}})."\n";
        }
        __DATA__
        text1	text-a	text-a	text-b	text-a
        text1	text-c
        text2	text-a	text-b
        text2	text-b	text-d

        All I changed was the push line and the print line. Output:

        text1 text-a text-b text-c
        text2 text-a text-b text-d
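The original snippet prints to an OUT filehandle that is never opened. If the data lives in a file rather than under __DATA__, a sketch like the following (not from the thread) opens both files with three-argument open and reuses the hash-of-hashes logic above. The sub name dedupe_file and the file names in the usage line are hypothetical.

```perl
use strict;
use warnings;

# Read tab-delimited lines from $in_path, merge rows sharing a first
# column, drop repeated values, and write the result to $out_path.
sub dedupe_file {
    my ($in_path, $out_path) = @_;
    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";

    my %hash;
    while (my $line = <$in>) {
        chomp $line;
        my ($first, @rest) = split /\t/, $line;
        next unless defined $first;    # skip blank lines
        $hash{$first}{$_}++ for @rest;
    }
    close $in;

    for my $key (sort keys %hash) {
        print {$out} join("\t", $key, sort keys %{ $hash{$key} }), "\n";
    }
    close $out or die "Cannot close $out_path: $!";
}

# Usage (hypothetical file names): perl dedupe.pl input.txt output.txt
dedupe_file(@ARGV) if @ARGV == 2;
```

Lexical filehandles like $in and $out are scoped to the sub, so there is no global OUT to forget to open.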
Re: Removing redundancy in X and Y directions
by JavaFan (Canon) on Sep 26, 2008 at 15:53 UTC
    Something like the following (untested) code:
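    use feature 'say';    # say() needs Perl 5.10+ and this pragma

    my @top_level;    # keys in the order they first appear
    my %seen;         # $seen{$key}{$value} is true once a value is output
    my %cache;        # $cache{$key} holds that key's unique values
    while (<>) {
        chomp;
        my ($key, @thingies) = split /\t/;
        unless ($seen{$key}) {
            push @top_level, $key;
            $seen{$key} = {};
        }
        # keep only values not yet seen for this key, in input order
        push @{$cache{$key}}, grep {!$seen{$key}{$_}++} @thingies;
    }
    foreach my $key (@top_level) {
        say join "\t", $key, @{$cache{$key}};
    }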
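JavaFan's version keeps keys and values in first-seen order instead of sorting them. A different route to the same result, swapped in here and not from the thread, is to collect every value per key and let uniq from List::Util drop the repeats, since uniq keeps the first occurrence of each value. This assumes List::Util 1.45 or later (bundled with recent Perl releases); merge_rows is a hypothetical name.

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);    # uniq() first appeared in List::Util 1.45

# Merge tab-delimited lines by first column, keeping first-seen order
# for both keys and values. Returns one merged line per key.
sub merge_rows {
    my @lines = @_;
    my (@order, %values);
    for my $line (@lines) {
        chomp $line;
        my ($key, @rest) = split /\t/, $line;
        next unless defined $key;    # skip blank lines
        push @order, $key unless exists $values{$key};
        push @{ $values{$key} }, @rest;
    }
    return map { my $k = $_; join("\t", $k, uniq(@{ $values{$k} })) } @order;
}

# Usage (hypothetical file name): perl merge.pl input.txt
if (@ARGV) {
    print "$_\n" for merge_rows(<>);
}
```

The trade-off: this stores every value before de-duplicating, while JavaFan's %seen filter discards repeats as it goes.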
      Thanks, guys, for your excellent code!
      Freddie