sophix has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,

I would like to load two tab-delimited text files into two separate hashes. These two files are in the following formats:

File 1

NM_001127328	202502_at
NM_000018	200710_at
NM_000019	205412_at
NM_001111067	203935_at
NM_000023	210632_s_at
NM_000027	204332_s_at
NM_000027	204333_s_at
NM_000027	216064_s_at
NM_000029	202834_at
NM_000031	218487_at
NM_000031	218489_s_at
NM_000032	211560_s_at
NM_000036	206121_at
NM_000042	205216_s_at
NM_000043	204780_s_at
NM_000043	204781_s_at
NM_000043	215719_x_at
NM_000043	216252_x_at
NM_000044	211110_s_at
NM_000044	211621_at

File 2

204332_s_at	P P A P P P P
216064_s_at	P M P P P A A
211560_s_at	P P P A P P P
200003_s_at	P A P P P A P
211110_s_at	P A P A P A P
200005_at	P P A P P P A

The first column in each file serves as the key, while the remaining columns are the values. The problem with loading the first file into a hash is that later rows overwrite the values of earlier rows with the same key. Is there a way to append the values for a repeated key, creating single-key/multiple-value pairs?

Afterwards, I need to match each key in the first file with each key in the second file through the values in the first file. I guess if I have a hash for the first file, then I can read through the second hash using three for loops.

Thank you,

Re: Load a file into hash with duplicate keys
by CountZero (Bishop) on Nov 02, 2010 at 23:51 UTC
    By definition, if the values in the first field in your first file are not unique, then that field cannot be a key.

    This gives you basically two options: either normalize your file in such a way that these values are unique (a HashOfArrays will do the trick), or make the second field your key field (provided these values are unique).
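
    A minimal, untested sketch of both options ('file1.txt' is a placeholder for your first file):

    use strict;
    use warnings;

    my (%by_first, %by_second);
    open my $fh, '<', 'file1.txt' or die "Cannot open file1.txt: $!";
    while (<$fh>) {
        chomp;
        my ($first, $second) = split /\t/;

        # Option 1: Hash of Arrays -- a non-unique first field maps to a list
        push @{ $by_first{$first} }, $second;

        # Option 2: key on the second field, provided it is unique
        $by_second{$second} = $first;
    }
    close $fh;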

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Load a file into hash with duplicate keys
by umasuresh (Hermit) on Nov 02, 2010 at 20:24 UTC
    Please show us what you have tried so far.
    Try a Hash of Arrays:
    push @{ $hash{$key} }, $value;
      Thank you for your replies.

      I had come across this script on the net; however, it repeats the key in the values.

      #!/usr/bin/perl -w
      use Data::Dumper;

      my $data = '/Data/Table1.txt';
      open INFILE, '<', $data or die "Cannot open $data: $!\n";

      my %hash;
      while (<INFILE>) {
          chomp;
          my $line = $_;
          my $key = (split /\t/, $line)[0];
          push @{ $hash{$key} }, $line;   # pushes the whole line, so the key reappears in the values
      }
      print Dumper(\%hash);
        Show us the input file Table1.txt.
        Here you are splitting the input line on the tab delimiter and taking the first element ([0]) as $key:
        my $key = (split/\t/, $line)[0];
        Here you are pushing the entire line (key included) into the array, which is why the key is repeated in the values:
        push @{ $hash{$key} }, $line;
        There can only be unique keys in such a data structure.
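
        If you do not want the key repeated in the values, push only the remaining fields rather than the whole line. A small, untested variation of your script:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Data::Dumper;

        my %hash;
        my $data = '/Data/Table1.txt';
        open my $in, '<', $data or die "Cannot open $data: $!";
        while (my $line = <$in>) {
            chomp $line;
            my ($key, @values) = split /\t/, $line;  # first field is the key
            push @{ $hash{$key} }, @values;          # keep only the value fields
        }
        close $in;
        print Dumper(\%hash);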
Re: Load a file into hash with duplicate keys
by Anonymous Monk on Nov 02, 2010 at 20:30 UTC

    It sounds like you want a hash of arrays...

    push @{ $hashOfStuff{$key} }, $newThing;

    Or possibly a hash of hashes so you don't have to search the array...

    $hashOfStuff{$key}{$newThing} = undef;  # That the key exists is sufficient
    if (exists $hashOfStuff{$key}{$testThing}) { ... }
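
    For instance, to match the two files this way, a sketch assuming file 1 maps IDs to probe names and file 2 begins each row with a probe name ('file1.txt' and 'file2.txt' are placeholders):

    use strict;
    use warnings;

    # Hash of Hashes from file 1: id => { probe => undef, ... }
    my %probes_for;
    open my $f1, '<', 'file1.txt' or die $!;
    while (<$f1>) {
        chomp;
        my ($id, $probe) = split /\t/;
        $probes_for{$id}{$probe} = undef;
    }
    close $f1;

    # One pass over file 2; the existence test replaces searching an array
    open my $f2, '<', 'file2.txt' or die $!;
    while (<$f2>) {
        chomp;
        my ($probe, @calls) = split /\t/;
        for my $id (keys %probes_for) {
            print "$id\t$probe\t@calls\n" if exists $probes_for{$id}{$probe};
        }
    }
    close $f2;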
Re: Load a file into hash with duplicate keys
by Anonymous Monk on Nov 02, 2010 at 20:32 UTC
    I need to match each key in the first file with each key in the second file through the values in the first file
    Why don't you make what you call values (the second column in file 1) the keys of the hash instead? This way you could do a direct lookup without any unnecessary looping. That's what hashes are made for after all...
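
    Something like this untested sketch ('file1.txt' and 'file2.txt' are placeholders):

    use strict;
    use warnings;

    # Key file 1 on its SECOND column (the probe name), which is unique here
    my %id_for;
    open my $f1, '<', 'file1.txt' or die "file1: $!";
    while (<$f1>) {
        chomp;
        my ($id, $probe) = split /\t/;
        $id_for{$probe} = $id;
    }
    close $f1;

    # Each row of file 2 is now a single hash lookup -- no nested loops
    open my $f2, '<', 'file2.txt' or die "file2: $!";
    while (<$f2>) {
        chomp;
        my ($probe, @calls) = split /\t/;
        print "$id_for{$probe}\t$probe\t@calls\n" if exists $id_for{$probe};
    }
    close $f2;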