Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I would like to check whether the keys of two hashes are the same and then work on the matching entries. The hashes should keep the same order as the input file. Since we can't compare the hashes directly and have to loop over the key/value pairs, larger files take a very long time to finish.
I have attached my code. Could you please suggest how to improve it?
Thanks
#!/usr/local/bin/perl -w
use strict;
use Getopt::Long;
use Tie::IxHash;

my ($bpfile , $intfile, $out);
my $usage = " USAGE: perl beadpool_int.pl -bpfile <file with snp,chr,beadpoolid> -int <intensities file from b2i>\n";
GetOptions("bp=s" =>\$bpfile, "int=s" =>\$intfile, "out=s" =>\$out) || warn "Error $!\n";
if(!$bpfile) {die "\n$usage\n"};

my %beads;
tie my %ints,"Tie::IxHash";
open BP, "<$bpfile" or die $!;
open INT, "<$intfile" or die $!;
open OUT, ">$out"or die $!;

while (my $line= <BP>) {
    chomp $line;
    my @bp = split(/,/,$line);
    $beads{$bp[0]} = $bp[2];
}
close BP;

#my ($snp, @array)
my @intensities;
while (my $line = <INT>){
    chomp $line;
    my $LABELS_REGEX =qr(^SNP\s+Coor);
    if ($line !~ $LABELS_REGEX) {
        my ($snp, $coor, $allele,@array) = split(/\t/,$line);
        $ints{$snp} = \@array;
        @intensities = @array;
    }
}
close INT;

foreach my $snp(keys %ints) {
    foreach my $snpbs(keys %beads){
        if($snp eq $snpbs){
            my @ints;
            for(my $i = 0;$i<scalar(@intensities);$i++) {
                push @ints, $ints{$snp}[$i] ;
            }
            print OUT "$beads{$snpbs}\t$snp\t", join("\t",@ints), "\n";
        }
        #print OUT "\n";
    }
}

Thanks for your time and suggestions

Replies are listed 'Best First'.
Re: To make the comparison work faster
by Limbic~Region (Chancellor) on Dec 08, 2009 at 15:31 UTC
    Anonymous Monk,
    This is how your code probably should be written based on what little I understand.
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $in_file  = $ARGV[0] or die "Usage: <file1> <file2>";
    my $in2_file = $ARGV[1] or die "Usage: <file1> <file2>";

    open(my $fh, '<', $in_file) or die "Unable to open '$in_file' for reading: $!";

    # Map each SNP to its bead pool id.
    my %bead;
    while (<$fh>) {
        chomp;
        my ($snp, $chr, $beadpoolid) = split /,/;
        $bead{$snp} = $beadpoolid;
    }

    open($fh, '<', $in2_file) or die "Unable to open '$in2_file' for reading: $!";

    # Stream the intensities file so the output keeps the input order.
    while (<$fh>) {
        next if /^SNP\s+Coor/;    # skip the header line
        chomp;
        my ($snp, $coor, $allele, @int) = split /\t/;
        next if ! exists $bead{$snp};
        print join("\t", $bead{$snp}, $snp, @int), "\n";
    }
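
The main difference from the original script is that the inner loop over keys %beads becomes a single hash lookup per SNP, and the rows are emitted in the order they are read. A minimal sketch of that idea with in-memory data follows; the helper name join_rows and the sample values are made up for illustration and are not part of either script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns output rows for SNPs present in %$bead,
# in the same order as the intensity rows were given.
sub join_rows {
    my ($bead, $rows) = @_;
    my @out;
    for my $row (@$rows) {
        my ($snp, $coor, $allele, @int) = @$row;
        next unless exists $bead->{$snp};    # one hash lookup, no inner loop
        push @out, join("\t", $bead->{$snp}, $snp, @int);
    }
    return @out;
}

# Made-up sample data standing in for the two input files.
my %bead = (rs1 => 'BP1', rs3 => 'BP2');
my @rows = (
    [ 'rs3', 100, 'A', 0.5, 0.7 ],
    [ 'rs2', 200, 'C', 0.1, 0.2 ],
    [ 'rs1', 300, 'G', 0.9, 0.3 ],
);

print "$_\n" for join_rows(\%bead, \@rows);
```

With this sample data, rs2 is dropped because it has no bead pool id, and rs3 is printed before rs1 because that is the order of the intensity rows. No ordered hash is needed as long as the second file is processed as it is read.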

    Cheers - L~R

Re: To make the comparison work faster
by DStaal (Chaplain) on Dec 08, 2009 at 14:21 UTC

    You only care if the keys are the same, correct? So why check every key of one hash against every key of the other? Instead, get the list of keys from one, and check to see if they are all in the other. Then check the values for each hash for the same keys.

    Example: (Untested)

    my @found_keys   = grep { exists($hash_b{$_}) } keys %hash_a;
    my @matched_keys = grep { $hash_a{$_} eq $hash_b{$_} } @found_keys;

    if ( @found_keys == keys %hash_a ) {
        # All keys in %hash_a were found.
    }
    if ( @found_keys == keys %hash_b ) {
        # All keys in %hash_b were found.
    }
    if ( @found_keys == @matched_keys ) {
        # All keys matched values.
    }

    Update: It was pointed out to me that if @found_keys and @matched_keys are both empty, the last case would return true. That is technically correct (all found keys matched values, since none were found) but may not be what you want. Checking the length of one of them as well might help in that case.
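
One possible way to fold that extra length check into a single test is sketched below. The helper name hashes_match is hypothetical, and the sketch assumes every value is a defined scalar that can be compared with eq.

```perl
use strict;
use warnings;

# Hypothetical helper: true only if both hashes have at least one key,
# the same set of keys, and string-equal values for every key.
sub hashes_match {
    my ($ha, $hb) = @_;
    return 0 unless scalar(keys %$ha);                          # guard the empty case
    return 0 unless scalar(keys %$ha) == scalar(keys %$hb);     # same number of keys
    for my $key (keys %$ha) {
        return 0 unless exists $hb->{$key};                     # key missing from %$hb
        return 0 unless $ha->{$key} eq $hb->{$key};             # values differ
    }
    return 1;
}

my %x = (a => 1, b => 2);
my %y = (b => 2, a => 1);
print hashes_match(\%x, \%y) ? "match\n" : "differ\n";
```

Because the key counts are equal and every key of the first hash exists in the second, the second hash cannot hold any extra keys, so one pass over the first hash is enough.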