in reply to grep keys in hash and retrieve values

Others have addressed your immediate problem, but there is plenty of other help to be provided. Consider the following:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $tax2locus_file;
my %tax2loc;

open my $in, '<', $tax2locus_file or die "Can't open $tax2locus_file: $!\n";
while (<$in>) {
    chomp;
    my ($taxid, $locus) = split /\t/;
    $tax2loc{$locus} = $taxid;
}
close ($in);
print "there are\t" . keys (%tax2loc) . "\tlocus_ids as key in hash\n";

############### Now read in sharedTab file with pairwise overlap info
my $sharedTab_file = $ARGV[0];
my $outfile = "$sharedTab_file.hostinfo";

open my $out, '>', $outfile or die "Can't create $outfile: $!\n";
open $in, '<', $sharedTab_file or die "Can't open $sharedTab_file: $!\n";

print $out "#prophageA\tprophageB\thostA\ttaxidA\thostB\ttaxidB\tjacc\n";
while (<$in>) {
    chomp;
    next if (/^#/); # ignore comments

    my @columns = split (/\t/, $_);
    my ($prophageA, $hostA, $taxidA) = getTaxId($columns[0]);
    my ($prophageB, $hostB, $taxidB) = getTaxId($columns[0]);

    print $out join ("\t", $prophageA, $prophageB, $hostA, $taxidA,
        $hostB, $taxidB, $columns[5]), "\n";
}

sub getTaxId {
    my ($prophage, $lu) = @_;
    my ($host, $PFnum) = split /\./, $prophage;

    ## for wgs genomes just match first 7 characters as only NZ_XXXX000000 are
    ## in tax2locus
    $host =~ s/^(NZ.{5}).*/$1/;

    my @matches = grep {$_ =~ /$host/} keys %$lu;
    die "Expected exactly one match for $host. Got " . scalar(@matches) . "\n"
        if @matches != 1;
    return $prophage, $host, $matches[0];
}
```

Note that the code is completely untested, so it may suffer from typos and egregious errors of all sorts. However, points to note are:

This code doesn't check that the input data are correctly formatted, because I'm not entirely sure what the format ought to be, but "production" code would ensure that sensible values were passed into getTaxId (for $prophage, for example).
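As a sketch of the kind of check "production" code might add: assuming (this is a guess, not the confirmed file format) that a prophage ID always looks like HOST.PFNUM, with exactly one dot and both parts non-empty, a hypothetical helper parseProphageId could reject malformed IDs before getTaxId splits them. The helper name and the example IDs are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: validate and split a prophage ID.
# ASSUMPTION: IDs look like "HOST.PFNUM" (exactly one dot, both parts
# non-empty). The real format of the sharedTab file is not known.
sub parseProphageId {
    my ($prophage) = @_;

    die "Missing prophage ID\n" unless defined $prophage && length $prophage;

    # List assignment in boolean context is false when the match fails.
    my ($host, $PFnum) = $prophage =~ /^([^.]+)\.([^.]+)$/
        or die "Malformed prophage ID '$prophage' (expected HOST.PFNUM)\n";

    return ($host, $PFnum);
}

# Usage: a well-formed ID is split; a malformed one is caught by eval.
my ($host, $num) = parseProphageId('NZ_ABCD01000001.PF3');
print "host=$host num=$num\n";

my $ok = eval { parseProphageId('no_dot_here'); 1 };
print "rejected: $@" unless $ok;
```

getTaxId could call such a helper instead of its bare split, so a bad line in the input fails with a clear message rather than producing an empty $host.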

True laziness is hard work

Re^2: grep keys in hash and retrieve values
by AWallBuilder (Beadle) on Mar 16, 2012 at 12:03 UTC

    Thank you, this is great. I thought of using a subroutine but was getting it wrong. But your script isn't working: to me it looks as if you are only passing $prophage to the subroutine, but don't you also need to pass the hash? I tried editing it as follows, but I am receiving an error about using a string as a hash reference.

     my ($prophageB, $hostB, $taxidB) = getTaxId($columns[0],%tax2loc);

      Err, I did say it was untested, didn't I?

      There are two errors related to the hash. The second one is that the return statement needs to be changed to:

      return $prophage, $host, $lu->{$matches[0]};

      to return the value instead of the key.
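To see why $matches[0] is the key rather than the value: grep over keys %$lu returns the matching keys (the locus IDs), so the taxid still has to be looked up through the reference. Below is a minimal sketch of the sub with that fix, using invented sample data (the locus IDs and taxids are made up), with the sub called on a reference to the hash (\%tax2loc). The die is guarded so it only fires when there isn't exactly one match, and \Q...\E is added so characters in $host are matched literally rather than as regex syntax.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented sample data: locus ID => taxid (the real hash is read
# from $tax2locus_file).
my %tax2loc = (
    NZ_AAAA00000000 => 562,
    NZ_BBBB00000000 => 1280,
);

sub getTaxId {
    my ($prophage, $lu) = @_;    # $lu is a hash *reference*
    my ($host, $PFnum) = split /\./, $prophage;

    # For WGS accessions keep only the first 7 characters (NZ + 5 chars).
    $host =~ s/^(NZ.{5}).*/$1/;

    # grep over the keys yields matching KEYS (locus IDs), not values.
    my @matches = grep { /\Q$host\E/ } keys %$lu;
    die "Expected exactly one match for $host. Got " . scalar(@matches) . "\n"
        unless @matches == 1;

    # Look the matching key up to get the value (the taxid).
    return $prophage, $host, $lu->{$matches[0]};
}

my ($prophage, $host, $taxid) = getTaxId('NZ_AAAA01000042.PF1', \%tax2loc);
print "$prophage\t$host\t$taxid\n";    # NZ_AAAA01000042.PF1  NZ_AAAA  562
```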

      The first problem is, as you noticed, that the hash needs to be passed to the sub, and it needs to be passed by reference because the sub unpacks its arguments with my ($prophage, $lu) = @_; and then treats $lu as a hash reference (keys %$lu):

      my ($prophageB, $hostB, $taxidB) = getTaxId($columns[0], \%tax2loc);
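The error from passing %tax2loc directly comes from that unpacking: the hash is flattened into a list of keys and values, so $lu receives a plain string (whichever key happens to come first), and keys %$lu then tries to use that string as a hash reference, which use strict forbids. A small sketch showing both calls; the sub name countLoci and the one-entry hash are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %tax2loc = (NZ_AAAA00000000 => 562);

# Hypothetical sub that unpacks @_ the same way getTaxId does.
sub countLoci {
    my ($prophage, $lu) = @_;
    return scalar keys %$lu;    # dies under strict refs if $lu is a string
}

# Passing a reference: $lu is a HASH ref, so %$lu works.
print "by reference: ", countLoci('x.PF1', \%tax2loc), " locus\n";

# Passing the hash itself: @_ is ('x.PF1', 'NZ_AAAA00000000', 562),
# so $lu is the string 'NZ_AAAA00000000'.
my $ok = eval { countLoci('x.PF1', %tax2loc); 1 };
print "flattened: $@" unless $ok;
```

The second call fails with a "Can't use string (...) as a HASH ref while "strict refs" in use" message, which is the error described above.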

      Passing by reference is also an optimisation: it avoids passing all the keys and values of the hash as a flat list, which is what would happen otherwise. Note that the point of passing the hash at all is to avoid treating it as a global variable, which is generally a "bad thing"™ (although in a program this small it's not an issue, except as a matter of style).
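The size difference is easy to see by counting what lands in @_. In this sketch the 1000-entry hash and the sub argCount are invented purely for illustration: passing the hash puts every key and value into the argument list, while passing a reference puts in exactly one scalar.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented hash with 1000 entries, just to make the list size visible.
my %tax2loc = map { ("NZ_LOC$_" => $_) } 1 .. 1000;

# Hypothetical sub: returns how many scalars ended up in @_.
sub argCount { return scalar @_ }

my $flat  = argCount(%tax2loc);     # every key and every value
my $byRef = argCount(\%tax2loc);    # a single hash reference

print "flattened: $flat args, by reference: $byRef arg\n";
```

This should report 2000 arguments for the flattened call and 1 for the reference, and the reference also lets the sub see the hash as a hash rather than as an anonymous list.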
