dr_joe has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed Monks,

This script I wrote to make two hashes and then search one hash with the keys from the other is only returning the correct result for the last key in the hash being searched. Additionally, it is taking a long time to run. I have reached the limits of my ability in trying to troubleshoot this. Any wisdom received from you will be much appreciated.

#!/usr/bin/perl -w #GIDgiver.pl #this script reads gene symbols and assigns the matching Entrez GeneID + to each gene use strict; open FH1, 'GID_GeneSymbol.txt', or die "Can't open GID file\n"; open FH2, 'CVD_genelist.txt', or die "Can't open CVD gene list\n"; our (@list1, @list2, @temp); my %GID = (); my %CVDgenes = (); while (<FH1>){ chomp; push (@list1, $_); foreach (@list1){ @temp=split(/\s+/, $_); } %GID = map { $temp[0] => $temp[1] } @temp; } while (<FH2>){ chomp; push (@list2, $_); } %CVDgenes = map { $_ => 1 } @list2; while ((my $k, my $v) = each(%GID)) { if (exists ($CVDgenes{$v})){ print "The GID for gene $v is $k\n"; }}

Replies are listed 'Best First'.
Re: hash lookup
by moritz (Cardinal) on Oct 12, 2009 at 21:33 UTC
    %GID = map { $temp[0] => $temp[1] } @temp;

    This assigns two elements to %GID, throwing away all other items in that hash that might been in there before. Not what you want. Instead, just write $GID{$temp[0]} = $temp[1].

    There's also no reason to build the @list1 and @list2 arrays. Finally you can get rid of %GID completely by first iterating over <FH2> and building %CVDgenes, and then iterate over <FH1> directly.

    Perl 6 - links to (nearly) everything that is Perl 6.
      Thanks, Moritz.

      At your leisure, would you kindly give me a bit more detail about how to approach the last part of your suggestions (getting rid of %GID completely)?

        There's really not much detail to be given. Currently you iterate over one file, store all elements in a hash, and later on you iterate over all the data in the hash.

        Unless you need the hash for weeding out duplicates, you simply iterate over the data that you would have put into th hash when you need it.

        Perl 6 - links to (nearly) everything that is Perl 6.
Re: hash lookup
by almut (Canon) on Oct 12, 2009 at 22:17 UTC

    To elaborate on what moritz hinted at with respect to @list1:  for every iteration (i.e. line in the file), the list gets longer, but you're only ever making effective use of the last entry anyway in what you do with the list:

    foreach (@list1){ @temp=split(/\s+/, $_); }

    would leave in @temp only the result of splitting the last entry of @list1.  This creates unnecessary work which is only making your program slow...

    Here's a simplified version (if I understood correctly what you're trying to do):

    # create lookup table while (<FH2>) { chomp; $CVDgenes{$_} = 1; } # go through the other file and check if its entries # are found in the lookup table: while (<FH1>){ chomp; my ($gene, $gid) = split(/\s+/, $_); if (exists $CVDgenes{$gene}) { print "The GID for gene $gene is $gid\n"; } }
      OK, now I get it, and the code does exactly what I need it to. The insight from both of you is much appreciated.
        That's me above, I forgot to log in.