Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear friends,
somewhere in my code I construct a HoA, based on the following snippet:
my %HoA_total_hits=();
while(<>) {
    my @split_hit_total = split(/\s+/, $hit_total);
    my $pfam_ac_hit = $split_hit_total[1];
    my $protein_ac = $split_hit_total[3];
    my $iEvalue = $split_hit_total[12];
    my $seq_start = $split_hit_total[19];
    my $seq_end = $split_hit_total[20];
    my $range_b=$seq_start.'-'.$seq_end;
    push @{ $HoA_total_hits{$protein_ac} }, $range_b;
}

My question is: assuming I have more than one hit for the same $protein_ac, how would I go about adding only one range to the HoA, namely the one with the lowest $iEvalue? Somehow I must compare the $iEvalue of all hits for the same $protein_ac (if there is more than one) and keep only one of them.
Thank you!

Replies are listed 'Best First'.
Re: How can I add an entry in a has of arrays based on numeric comparison?
by AppleFritter (Vicar) on Jun 26, 2014 at 09:28 UTC

    First of all, you have a mistake in your code. The first line in your loop should be:

    my @split_hit_total = split(/\s+/, $_);

    since $_ is your loop variable here, not $hit_total. (use strict is good for catching things like that, BTW!)
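
    A minimal sketch of what strict would have reported. The offending line is compiled from a string only so that the compile-time error can be captured and printed; the names $snippet, $compiled_ok and $strict_error are made up for this illustration:

```perl
use strict;
use warnings;

# The original line, with strict enabled. $hit_total is never declared,
# so compilation of this snippet fails.
my $snippet = q{
    use strict;
    my @split_hit_total = split(/\s+/, $hit_total);
    1;
};

# String eval returns undef on a compile error and leaves the message in $@.
my $compiled_ok  = eval $snippet;
my $strict_error = $@;

print $compiled_ok ? "compiled fine\n" : "strict refused it: $strict_error";
```

    In a normal script the same message appears at compile time, before any input is read.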

    That said, to answer your question: repurpose your HoA so that it stores (for each $protein_ac) the lowest encountered iEvalue along with its corresponding range, rather than all ranges. E.g. instead of pushing the range, do this:

    if(!defined($HoA_total_hits{$protein_ac})
       || $HoA_total_hits{$protein_ac}->{'iEvalue'} > $iEvalue) {
        $HoA_total_hits{$protein_ac}->{'iEvalue'} = $iEvalue;
        $HoA_total_hits{$protein_ac}->{'range_b'} = $range_b;
    }

    Untested in the absence of example data... and I've not had my morning brew yet, so apologies if this is obvious rubbish.

    (Side note -- I turned your HoA into a HoH here, but didn't change the name. I'd advise against putting the type of a variable in its name anyway.)

    EDIT: using the sample data from Re^2: How can I add an entry in a has of arrays based on numeric comparison?, here's what Data::Dumper has to say about the contents of %HoA_total_hits after running this:

    $VAR1 = {
              'I5EU07' => {
                            'range_b' => '232-824',
                            'iEvalue' => '3.4e-137'
                          }
            };

    Does that look right?
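
    For reference, a self-contained sketch of the whole loop with that check in place, fed the two sample lines posted in this thread (with the line-wrap "+" markers taken out). The hash name %best_hit and the inline @lines array are assumptions made for illustration; in the real script the lines would come from while(<>):

```perl
use strict;
use warnings;

# The two sample hits for protein I5EU07 from this thread,
# with the "+" line-wrap markers removed.
my @lines = (
    'STN PF07660 597 I5EU07 - 1259 1e-140 467.1 11.7 1 2 5.9e-140 3.4e-137 455.5 7.3 9 595 238 822 232 824 0.97 -',
    'STN PF07660 597 I5EU07 - 1259 1e-140 467.1 11.7 2 2 3.9e-05 0.023 10.2 0.0 148 227 864 949 842 962 0.80 -',
);

my %best_hit;   # protein accession => { iEvalue => ..., range_b => ... }

for my $line (@lines) {
    my @fields     = split /\s+/, $line;
    my $protein_ac = $fields[3];
    my $iEvalue    = $fields[12];
    my $range_b    = $fields[19] . '-' . $fields[20];

    # Store the hit if it is the first one seen for this protein,
    # or if its iEvalue is lower than the one stored so far.
    if (!defined $best_hit{$protein_ac}
        || $best_hit{$protein_ac}{iEvalue} > $iEvalue) {
        $best_hit{$protein_ac} = { iEvalue => $iEvalue, range_b => $range_b };
    }
}

for my $protein_ac (sort keys %best_hit) {
    print join("\t", $protein_ac,
               $best_hit{$protein_ac}{range_b},
               $best_hit{$protein_ac}{iEvalue}), "\n";
}
```

    The > comparison is numeric, so strings such as '3.4e-137' and '0.023' are compared by value rather than alphabetically.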

      Aha, thanks!
      So, I must change the structure to HoH instead of HoA to achieve the desired result?
      (All these advanced structures seem so complicated to me!!!!)

        You don't have to; you can continue using a HoA as well, replacing (say) ->{'iEvalue'} with ->[0] and ->{'range_b'} with ->[1]. If you did this, the result for your sample data would be:

        $VAR1 = {
                  'I5EU07' => [
                                '3.4e-137',
                                '232-824'
                              ]
                };

        It's pretty much the same thing, but I personally prefer using hashes over arrays unless my "keys" are naturally numeric; you have to type a bit more, but your code'll be self-documenting. It's always obvious what ->{'iEvalue'} means, even years later when you're reading someone else's code, but the same is not true for ->[0].
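
        If you do stay with arrays, one way to keep some of that readability is to give the indexes names. This is only a sketch: the constants IEVALUE and RANGE_B, the hash name %best_hit and the pre-split @hits list are assumptions for illustration, with the values taken from the two sample lines in this thread:

```perl
use strict;
use warnings;

# Named positions inside each stored array.
use constant { IEVALUE => 0, RANGE_B => 1 };

# [ protein accession, iEvalue, range ] for the two sample hits;
# the worse hit comes first here to show that it gets replaced.
my @hits = (
    [ 'I5EU07', '0.023',    '842-962' ],
    [ 'I5EU07', '3.4e-137', '232-824' ],
);

my %best_hit;   # protein accession => [ iEvalue, range_b ]

for my $hit (@hits) {
    my ($protein_ac, $iEvalue, $range_b) = @$hit;
    if (!defined $best_hit{$protein_ac}
        || $best_hit{$protein_ac}[IEVALUE] > $iEvalue) {
        $best_hit{$protein_ac} = [ $iEvalue, $range_b ];
    }
}

print $best_hit{I5EU07}[RANGE_B], "\n";
```

        The data structure is still the plain HoA shown above; only the way the code spells the indexes changes.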

Re: How can I add an entry in a has of arrays based on numeric comparison?
by tobyink (Canon) on Jun 26, 2014 at 08:27 UTC

    Can you provide some sample data, and a corresponding example of your desired output?

      Hi,
      so, some sample lines could be:
      STN PF07660 597 I5EU07 - 1259 1e-140 467.1 11.7 1 2 5.9e-140 3.4e-137 455.5 7.3 9 595 238 822 232 824 0.97 -
      STN PF07660 597 I5EU07 - 1259 1e-140 467.1 11.7 2 2 3.9e-05 0.023 10.2 0.0 148 227 864 949 842 962 0.80 -

      and, in that case, we would need to store the first line, because the $iEvalue for the first line is 3.4e-137, which is lower (and thus better) than the second line's $iEvalue, which is 0.023.
      So the only thing I need is that, if I have the same $protein_ac (in this example I5EU07) and more than one line for the same $pfam_ac_hit (in this example PF07660), I keep only the one with the lowest $iEvalue.
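
      A rough sketch of that rule, keyed on both $protein_ac and $pfam_ac_hit and then flattened back into a HoA of ranges. The intermediate hash %best_hit and the inline @lines array are assumptions for illustration; the two lines are the sample hits above with the line-wrap "+" markers removed:

```perl
use strict;
use warnings;

my @lines = (
    'STN PF07660 597 I5EU07 - 1259 1e-140 467.1 11.7 1 2 5.9e-140 3.4e-137 455.5 7.3 9 595 238 822 232 824 0.97 -',
    'STN PF07660 597 I5EU07 - 1259 1e-140 467.1 11.7 2 2 3.9e-05 0.023 10.2 0.0 148 227 864 949 842 962 0.80 -',
);

# protein accession => Pfam accession => best hit seen so far
my %best_hit;

for my $line (@lines) {
    my @fields = split ' ', $line;
    my ($pfam_ac_hit, $protein_ac, $iEvalue) = @fields[1, 3, 12];
    my $range_b = $fields[19] . '-' . $fields[20];

    # Only hits for the same protein AND the same Pfam family compete.
    my $stored = $best_hit{$protein_ac}{$pfam_ac_hit};
    if (!defined $stored || $iEvalue < $stored->{iEvalue}) {
        $best_hit{$protein_ac}{$pfam_ac_hit} =
            { iEvalue => $iEvalue, range_b => $range_b };
    }
}

# Flatten into the HoA from the original question:
# one range per Pfam family for each protein.
my %HoA_total_hits;
for my $protein_ac (sort keys %best_hit) {
    for my $pfam_ac_hit (sort keys %{ $best_hit{$protein_ac} }) {
        push @{ $HoA_total_hits{$protein_ac} },
             $best_hit{$protein_ac}{$pfam_ac_hit}{range_b};
    }
}

print "$_: @{ $HoA_total_hits{$_} }\n" for sort keys %HoA_total_hits;
```

      With this layout, a protein that hits several different Pfam families keeps one range per family, while repeated hits of the same family collapse to the one with the lowest $iEvalue.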