Re: how to compare two hashes with perl?

Re^2: how to compare two hashes with perl? by FluffyBunny (Acolyte) on Nov 04, 2009 at 21:01 UTC
Thank you for your reply. The IDs might not be in the same order, which is why I'm looking for each ID in file 1 to match any ID in file 2. This is basically what I want to check:

1) Check the ID names.
2) If the IDs match and the sequences match, do not print anything.
3) If the IDs match but the sequences do not, print the ID and the sequences from each file.
4) If the IDs do not match, print the IDs and the sequences from each file.

I'm a newbie and I'm trying to understand hashes. It's just confusing, and I'm not exactly sure how my file gets stored in a hash. I hear a hash is random when it prints output, and I want to make sure my IDs don't get mixed up with the wrong sequences (each ID uniquely corresponds to one sequence). I updated the original post with my output and input files. Thank you!
Re^3: how to compare two hashes with perl? by 7stud (Deacon) on Nov 04, 2009 at 23:22 UTC
> I hear hash is random when it prints output

That just means that the order in which you add key/value pairs to a hash is not necessarily the order in which `keys` hands them back. The order is unspecified and can vary between runs. Here is an example:

```perl
use strict;
use warnings;

$\ = "\n";    # print appends a newline
$, = ', ';    # separator between print arguments

my %hash = ();
$hash{"h"} = 10;
$hash{"z"} = 20;
$hash{"a"} = 30;

foreach my $key (keys %hash) {
    print "$key: $hash{$key}";
}
```

Output (one possible order):

```
a: 30
h: 10
z: 20
```

However, the key/value pairs are the same. A key will never be associated with a value that you did not enter for that key.

> it's just confusing and I'm not exactly sure how my file gets stored in hash

Take a look at this example:

```perl
use strict;
use warnings;

$\ = "\n";
$, = ', ';

my %results = ();
my $line = 'HWUSI-EAS548:7:1:5:1527#0/1 + chr12 52084152 CGGAGC';

my @pieces = split /\s+/, $line;
my $id = $pieces[0];      # first column: the read ID
my $seq = $pieces[-1];    # last column: the sequence
$results{$id} = $seq;

foreach my $key (keys %results) {
    print "$key -----> $results{$key}";
}
```

Output:

```
HWUSI-EAS548:7:1:5:1527#0/1 -----> CGGAGC
```

If you want to gather all the sequences corresponding to an ID, you can do this:

```perl
use strict;
use warnings;

$\ = "\n";
$, = ', ';

my %results = ();

while (<DATA>) {
    my @pieces = split /\s+/;
    my $id = $pieces[0];
    my $seq = $pieces[-1];

    $results{$id} = [] unless exists $results{$id};
    push @{$results{$id}}, $seq;
}

foreach my $key (keys %results) {
    my $arr_str = join ',', @{$results{$key}};
    print "$key -----> [$arr_str]";
}

__DATA__
HWUSI-EAS548:7:1:5:1527#0/1 + chr12 52084152 CGGAGC
HWUSI-EAS548:7:1:5:1527#0/1 + chr12 52084152 XXXXXX
Some_other_id + chr12 52084152 CGGAGC
```

You might want to experiment a little more with hashes in a separate practice program. For instance, you might want to read perlintro and perldsc, which you can read by typing:

```
$ man perlintro
$ man perldsc
```

For a complete list of topics, type:

```
$ man perl
```

and scroll down.
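If you need the output in a repeatable order (for example, to compare two runs), sorting the keys is the usual approach. Here is a minimal sketch; the read IDs and sequences are made-up example data, and `sorted_pairs` is just a hypothetical helper name.

```perl
use strict;
use warnings;

# Example data (hypothetical IDs): each ID maps to exactly one sequence.
my %seq_for = (
    'read_3' => 'TTGACA',
    'read_1' => 'CGGAGC',
    'read_2' => 'ACGTAC',
);

# keys() returns the keys in an unspecified order; sort() makes the order
# deterministic. Each key is still paired with its own value.
sub sorted_pairs {
    my ($href) = @_;
    return map { "$_: $href->{$_}" } sort keys %$href;
}

print "$_\n" for sorted_pairs(\%seq_for);
```

This prints `read_1: CGGAGC`, `read_2: ACGTAC` and `read_3: TTGACA` in that order every time, because `sort` compares the keys as strings.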
Re^4: how to compare two hashes with perl? by 7stud (Deacon) on Nov 04, 2009 at 23:36 UTC
```perl
while (<DATA>) {
    my @pieces = split /\s+/;
    my $id = $pieces[0];
    my $seq = $pieces[-1];

    $results{$id} = [] unless exists $results{$id};
    push @{$results{$id}}, $seq;
}
```

Actually, as perlreftut explains, the line

```perl
$results{$id} = [] unless exists $results{$id};
```

is unnecessary: if `$results{$id}` does not exist yet, `push @{$results{$id}}, $seq` creates the array reference for you automatically (this is called autovivification). I highly recommend that you read perlreftut:

```
$ man perlreftut
```
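Here is a minimal sketch of the same grouping without the explicit `[]` initialisation, relying on autovivification. It takes a list of lines instead of reading `DATA` so it can be called on its own; `group_by_id` is a hypothetical helper name. It also uses `split ' '`, which behaves like `split /\s+/` but ignores leading whitespace.

```perl
use strict;
use warnings;

# Collect every sequence seen for each ID. push autovivifies the array
# reference the first time an ID appears, so no "= []" line is needed.
sub group_by_id {
    my @lines = @_;
    my %results;
    for my $line (@lines) {
        my @pieces = split ' ', $line;    # ' ' also skips leading whitespace
        push @{ $results{ $pieces[0] } }, $pieces[-1];
    }
    return \%results;
}

my $grouped = group_by_id(
    'HWUSI-EAS548:7:1:5:1527#0/1 + chr12 52084152 CGGAGC',
    'HWUSI-EAS548:7:1:5:1527#0/1 + chr12 52084152 XXXXXX',
    'Some_other_id + chr12 52084152 CGGAGC',
);

for my $id (sort keys %$grouped) {
    print "$id -----> [", join(',', @{ $grouped->{$id} }), "]\n";
}
```

With the three lines from the earlier example, the first ID collects `[CGGAGC,XXXXXX]` and `Some_other_id` collects `[CGGAGC]`.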
Re^3: how to compare two hashes with perl? by BioLion (Curate) on Nov 05, 2009 at 00:23 UTC
I take it this is bowtie output? It makes no sense to me why you are comparing all IDs in the first file to all IDs in the second. The whole point of using a hash is that you can look up specific keys directly, whereas an array would be for storing an ordered list. What are you actually trying to do? Get the IDs common to both files and say whether their associated sequences match? You can try something like this for that:

```perl
foreach my $id (keys %hash1) {    # use (sort keys %hash1) if you want them in a specified order
    if ( exists $hash2{$id} ) {
        print "'$id' exists in both hashes.\n";
        if ( $hash1{$id} eq $hash2{$id} ) {    ## ID and sequence are stored as key/value pairs
            print "and the sequences match too.\n";
        }
        else {
            print "but the sequences do not match.\n";
        }
    }
    else {
        print "'$id' only exists in hash1.\n";
    }
}
```

If you want help with data structures, try perldsc for starters.

Just a something something...
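The loop above only walks over `%hash1`, so an ID that appears only in the second file is never reported (rule 4 in the question). One way to cover both directions is to loop over the union of the keys. This is a sketch under one reading of the four rules; `compare_hashes` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Compare two ID => sequence hashes: identical pairs are skipped, while
# mismatches and IDs found in only one hash are reported.
sub compare_hashes {
    my ($h1, $h2) = @_;

    my %all_ids;                                   # union of IDs from both hashes
    $all_ids{$_} = 1 for (keys %$h1, keys %$h2);

    my @report;
    for my $id (sort keys %all_ids) {
        if (exists $h1->{$id} && exists $h2->{$id}) {
            next if $h1->{$id} eq $h2->{$id};      # same ID, same sequence: skip
            push @report, "$id: sequences differ ($h1->{$id} vs $h2->{$id})";
        }
        elsif (exists $h1->{$id}) {
            push @report, "$id: only in file 1 ($h1->{$id})";
        }
        else {
            push @report, "$id: only in file 2 ($h2->{$id})";
        }
    }
    return @report;
}

my %hash1 = (id_a => 'CGGAGC', id_b => 'ACGTAC', id_c => 'TTTTTT');
my %hash2 = (id_a => 'CGGAGC', id_b => 'ACGTTT', id_d => 'GGGGGG');
print "$_\n" for compare_hashes(\%hash1, \%hash2);
```

For the sample data this reports `id_b` as a mismatch, `id_c` as only in file 1 and `id_d` as only in file 2; `id_a` is skipped because both the ID and the sequence agree.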
Re^4: how to compare two hashes with perl? by FluffyBunny (Acolyte) on Nov 05, 2009 at 22:50 UTC
Hello BioLion,

Basically I followed your code:

```perl
use warnings;
use strict;

my %bow1 = ();
my $file1 = shift;
open (FILE1, "$file1");    # Open first file
while (<FILE1>) {
    my ($ID1, undef, undef, undef, $Seq1) = split;
    $bow1{$ID1} = $ID1;
    $bow1{$Seq1} = $Seq1;
    print STDERR "$bow1{$ID1}\t$bow1{$Seq1}\n";
}
close FILE1;

my %bow2 = ();
my $file2 = shift;
open (FILE2, "$file2");    # Open second file
while (<FILE2>) {
    my ($ID2, undef, undef, undef, $Seq2) = split;
    $bow2{$ID2} = $ID2;
    $bow2{$Seq2} = $Seq2;
    print STDERR "$bow2{$ID2}\t$bow2{$Seq2}\n";
}
close FILE2;

foreach my $ID1 (keys %bow1) {    # can use (sort keys %hash) to put items in a specified order
    if ( exists $bow2{$ID2} ) {
        if ( $bow1{$ID1} eq $bow2{$ID2} ) {    ## id and sequence are stored as key value pairs
            print "$bow1{$ID1} exists in $file1 and $file2 and the sequences match $bow1{$Seq1} $bow2{$Seq2} \n";
        }
        else {
            print "$bow1{$ID1} exists in $file1 and $file2 but sequences DO NOT match $bow1{$Seq1} $bow2{$Seq2} \n";
        }
    }
    else {
        print "$bow1{$ID1} only exists in $file1 .\n";
    }
}
exit;
```

However, I get these errors:

```
Global symbol "$ID2" requires explicit package name at /home/choia2/scripts/BowtieCompare.pl line 50.
Global symbol "$ID2" requires explicit package name at /home/choia2/scripts/BowtieCompare.pl line 51.
Global symbol "$Seq1" requires explicit package name at /home/choia2/scripts/BowtieCompare.pl line 53.
Global symbol "$Seq2" requires explicit package name at /home/choia2/scripts/BowtieCompare.pl line 53.
Global symbol "$Seq1" requires explicit package name at /home/choia2/scripts/BowtieCompare.pl line 56.
Global symbol "$Seq2" requires explicit package name at /home/choia2/scripts/BowtieCompare.pl line 56.
Execution of /home/choia2/scripts/BowtieCompare.pl aborted due to compilation errors.
```

The problem is basically the foreach loop. I never used hashes in other programming languages (I wasn't a professional, though), and this hash concept is confusing. Could you help me one more time? >.<
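The "Global symbol" errors come from scoping: `$ID2`, `$Seq1` and `$Seq2` are declared with `my` inside the `while` loops, so they are only visible inside those loops and `use strict` rejects them in the `foreach` loop. There is also a second problem: each line is stored as two separate entries (`ID => ID` and `sequence => sequence`) instead of one `ID => sequence` pair, so the IDs and sequences are never linked. Below is a minimal corrected sketch. It assumes the ID is the first column and the sequence the fifth, as in the `split` above; `read_ids` is a hypothetical helper name, and in-memory filehandles with made-up data stand in for the two files so the example runs on its own.

```perl
use strict;
use warnings;

# Read "ID strand chrom position sequence" lines from a filehandle into a
# hash keyed by ID, with the sequence as the value.
sub read_ids {
    my ($fh) = @_;
    my %seq_for;
    while (my $line = <$fh>) {
        my ($id, undef, undef, undef, $seq) = split ' ', $line;
        next unless defined $seq;    # skip blank or short lines
        $seq_for{$id} = $seq;        # one hash entry: ID => sequence
    }
    return \%seq_for;
}

# In-memory filehandles stand in for the two input files here. In the real
# script you would use: open(my $fh1, '<', $file1) or die "Cannot open $file1: $!";
my $text1 = "r1 + chr12 52084152 CGGAGC\nr2 + chr12 52084160 ACGTAC\n";
my $text2 = "r1 + chr12 52084152 CGGAGC\nr2 + chr12 52084160 ACGTTT\n";
open(my $fh1, '<', \$text1) or die "Cannot open in-memory file 1: $!";
open(my $fh2, '<', \$text2) or die "Cannot open in-memory file 2: $!";
my $bow1 = read_ids($fh1);
my $bow2 = read_ids($fh2);
close $fh1;
close $fh2;

# Both hashes are declared at file scope, so the comparison loop can see them.
for my $id (sort keys %$bow1) {
    if (exists $bow2->{$id}) {
        if ($bow1->{$id} eq $bow2->{$id}) {
            print "$id exists in both files and the sequences match: $bow1->{$id}\n";
        }
        else {
            print "$id exists in both files but the sequences DO NOT match: $bow1->{$id} $bow2->{$id}\n";
        }
    }
    else {
        print "$id only exists in file 1: $bow1->{$id}\n";
    }
}
```

With the sample data, `r1` is reported as matching and `r2` as a mismatch (`ACGTAC` vs `ACGTTT`).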
UPDATE! I fixed it :D by FluffyBunny (Acolyte) on Nov 06, 2009 at 22:11 UTC
Re: UPDATE! I fixed it :D by BioLion (Curate) on Nov 09, 2009 at 12:08 UTC