sophix has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,

I would like to load two tab-delimited text files into two separate hashes. These two files are in the following formats:

File 1

NM_001127328	202502_at
NM_000018	200710_at
NM_000019	205412_at
NM_001111067	203935_at
NM_000023	210632_s_at
NM_000027	204332_s_at
NM_000027	204333_s_at
NM_000027	216064_s_at
NM_000029	202834_at
NM_000031	218487_at
NM_000031	218489_s_at
NM_000032	211560_s_at
NM_000036	206121_at
NM_000042	205216_s_at
NM_000043	204780_s_at
NM_000043	204781_s_at
NM_000043	215719_x_at
NM_000043	216252_x_at
NM_000044	211110_s_at
NM_000044	211621_at

File 2

204332_s_at	P P A P P P P
216064_s_at	P M P P P A A
211560_s_at	P P P A P P P
200003_s_at	P A P P P A P
211110_s_at	P A P A P A P
200005_at	P P A P P P A

The first column in each file serves as the key, while the remaining columns are the values. The problem with loading the first file into a hash is that later rows overwrite the values of earlier rows with the same key. Is there a way to append the values for a repeated key, creating single-key/multiple-value pairs?

Afterwards, I need to match each key in the first file with each key in the second file through the values in the first file. I guess if I have a hash for the first file, then I can read through the second hash using three for loops.

Thank you,

Re: Load a file into hash with duplicate keys
by CountZero (Bishop) on Nov 02, 2010 at 23:51 UTC
    By definition, if the values in the first field in your first file are not unique, then that field cannot be a key.

    This gives you basically two options: either normalize your file in such a way that these values are unique (a HashOfArrays will do the trick), or make the second field your key field (provided these values are unique).
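
    A minimal, untested sketch of both options ('file1.txt' is a placeholder for your first file):

    use strict;
    use warnings;

    my (%by_first, %by_second);
    open my $fh, '<', 'file1.txt' or die "Cannot open file1.txt: $!";
    while (<$fh>) {
        chomp;
        my ($first, $second) = split /\t/;

        # Option 1: Hash of Arrays -- a non-unique first field maps to a list
        push @{ $by_first{$first} }, $second;

        # Option 2: key on the second field, provided it is unique
        $by_second{$second} = $first;
    }
    close $fh;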

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Load a file into hash with duplicate keys
by umasuresh (Hermit) on Nov 02, 2010 at 20:24 UTC
    Please show us what you have tried so far.
    Try a Hash of Arrays:
    push @{ $hash{$key} }, $value;
      Thank you for your replies.

      I had come across this script on the net; however, it repeats the key in the values.

      #!/usr/bin/perl -w
      use Data::Dumper;

      my $data = '/Data/Table1.txt';
      open INFILE, '<', $data or die "Cannot open $data: $!\n";

      my %hash;
      while (<INFILE>) {
          chomp;
          my $line = $_;
          my $key = (split /\t/, $line)[0];
          push @{ $hash{$key} }, $line;   # pushes the whole line, so the key reappears in the values
      }
      print Dumper(\%hash);
        Show us the input file Table1.txt.
        Here you are splitting the input line on the tab delimiter and taking the first element ([0]) as $key:
        my $key = (split/\t/, $line)[0];
        Here you are pushing the entire line (key included) into the array, which is why the key is repeated in the values:
        push @{ $hash{$key} }, $line;
        There can only be unique keys in such a data structure.
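
        If you do not want the key repeated in the values, push only the remaining fields rather than the whole line. A small, untested variation of your script:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Data::Dumper;

        my %hash;
        my $data = '/Data/Table1.txt';
        open my $in, '<', $data or die "Cannot open $data: $!";
        while (my $line = <$in>) {
            chomp $line;
            my ($key, @values) = split /\t/, $line;  # first field is the key
            push @{ $hash{$key} }, @values;          # keep only the value fields
        }
        close $in;
        print Dumper(\%hash);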
Re: Load a file into hash with duplicate keys
by Anonymous Monk on Nov 02, 2010 at 20:30 UTC

    It sounds like you want a hash of arrays...

    push @{ $hashOfStuff{$key} }, $newThing;

    Or possibly a hash of hashes so you don't have to search the array...

    $hashOfStuff{$key}{$newThing} = undef;  # That the key exists is sufficient
    if (exists $hashOfStuff{$key}{$testThing}) { ... }
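
    For instance, to match the two files this way, a sketch assuming file 1 maps IDs to probe names and file 2 begins each row with a probe name ('file1.txt' and 'file2.txt' are placeholders):

    use strict;
    use warnings;

    # Hash of Hashes from file 1: id => { probe => undef, ... }
    my %probes_for;
    open my $f1, '<', 'file1.txt' or die $!;
    while (<$f1>) {
        chomp;
        my ($id, $probe) = split /\t/;
        $probes_for{$id}{$probe} = undef;
    }
    close $f1;

    # One pass over file 2; the existence test replaces searching an array
    open my $f2, '<', 'file2.txt' or die $!;
    while (<$f2>) {
        chomp;
        my ($probe, @calls) = split /\t/;
        for my $id (keys %probes_for) {
            print "$id\t$probe\t@calls\n" if exists $probes_for{$id}{$probe};
        }
    }
    close $f2;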
Re: Load a file into hash with duplicate keys
by Anonymous Monk on Nov 02, 2010 at 20:32 UTC
    I need to match each key in the first file with each key in the second file through the values in the first file
    Why don't you make what you call values (the second column in file 1) the keys of the hash instead? This way you could do a direct lookup without any unnecessary looping. That's what hashes are made for after all...
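
    Something like this untested sketch ('file1.txt' and 'file2.txt' are placeholders):

    use strict;
    use warnings;

    # Key file 1 on its SECOND column (the probe name), which is unique here
    my %id_for;
    open my $f1, '<', 'file1.txt' or die "file1: $!";
    while (<$f1>) {
        chomp;
        my ($id, $probe) = split /\t/;
        $id_for{$probe} = $id;
    }
    close $f1;

    # Each row of file 2 is now a single hash lookup -- no nested loops
    open my $f2, '<', 'file2.txt' or die "file2: $!";
    while (<$f2>) {
        chomp;
        my ($probe, @calls) = split /\t/;
        print "$id_for{$probe}\t$probe\t@calls\n" if exists $id_for{$probe};
    }
    close $f2;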