Re: Search a hash for closest match

Since you are looking for the closest match rather than an exact match, you necessarily need to iterate over the target set. To me, that means you probably want a hash of arrays, not a hash of hashes -- see perldsc (and perllol) for the differences between the structures.
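To make the difference concrete, here is a minimal sketch of the two layouts. The keys (chr1, chr2) and the numbers are made up for illustration and are not taken from your data.

```perl
use strict;
use warnings;

# Hash of arrays (HoA): each key maps to a list that is easy to walk in order.
# Keys and values here are hypothetical.
my %hoa = (
    chr1 => [ 15, 49, 51, 79 ],
    chr2 => [ 8, 120, 300 ],
);

# Hash of hashes (HoH): fine for exact lookups, but a closest match
# would still mean visiting every inner key.
my %hoh = (
    chr1 => { 15 => 1, 49 => 1, 51 => 1, 79 => 1 },
);

# Iterating over one array of the HoA.
for my $value (@{ $hoa{chr1} }) {
    print "chr1 has position $value\n";
}

# An exact hit is found in the HoH, but a near miss such as 50 is simply absent.
my $exact = exists $hoh{chr1}{49} ? 'found' : 'missing';
my $near  = exists $hoh{chr1}{50} ? 'found' : 'missing';
print "49 is $exact, 50 is $near\n";
```

The HoH answers "is 49 present?" in one step, but it gives no help with "what is nearest to 50?", which is why the array form suits this problem better.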

I don't really follow your example, but as best I can tell, your search algorithm should look something like this:

#!/usr/bin/perl -w
use strict;

my @values = (
    15,
    49,
    51,
    79,
);

my $pos = 50;
my $diff;
my @results;

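# Track the smallest distance seen so far and every value at that distance.
for my $value (@values) {
    my $distance = abs($pos - $value);
    if (not defined $diff or $distance < $diff) {
        # New best: remember the distance and restart the result list.
        $diff = $distance;
        @results = ($value);
    } elsif ($distance == $diff) {
        # Tie with the current best: keep both.
        push @results, $value;
    }
}

print join("\n", @results), "\n";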

Note that there are more efficient search procedures (for example, a binary search over a sorted array), but this linear scan may well be good enough; profile before replacing it.

Re^2: Search a hash for closest match by aquinom (Sexton) on Nov 01, 2011 at 17:44 UTC
Thanks Kenneth, I had something similar in mind. I think I'm also going to pre-sort the HoA then try to do a binary search to speed up the lookup. Sound like a reasonable approach?
Re^3: Search a hash for closest match by kennethk (Abbot) on Nov 01, 2011 at 18:00 UTC
Profile first. As the great Knuth said, "Premature optimization is the root of all evil." Based upon the specs you give in Re^2: Search a hash for closest match, I would expect that linear search is not even remotely your optimal solution, and that using Perl's sort (a merge sort) followed by binary search is probably pretty derned good. However, functional code is better than fast code. I would highly recommend developing your script with a very simple bit of logic like this and small data sets, and improving the search algorithm later in the development cycle once it becomes necessary. YMMV.
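For when that time comes, here is a rough sketch of the sort-then-binary-search idea. The sub name closest_sorted is made up for illustration, and the sketch assumes the values within one array are unique; with duplicates it would report only one copy of a repeated value.

```perl
use strict;
use warnings;

# Return the value(s) closest to $pos in a numerically sorted array ref.
# Hypothetical helper; assumes the array holds no duplicate values.
sub closest_sorted {
    my ($sorted, $pos) = @_;
    my $n = scalar @$sorted;
    return unless $n;

    # Binary search for the first index whose value is >= $pos.
    my ($lo, $hi) = (0, $n);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($sorted->[$mid] < $pos) {
            $lo = $mid + 1;
        } else {
            $hi = $mid;
        }
    }

    # Only the neighbours on either side of that index can be closest.
    my @candidates = map { $sorted->[$_] }
                     grep { $_ >= 0 && $_ < $n } ($lo - 1, $lo);
    my ($best) = sort { $a <=> $b } map { abs($pos - $_) } @candidates;

    # Keep both neighbours when they tie.
    return grep { abs($pos - $_) == $best } @candidates;
}

# Sort once, then search as often as needed.
my @sorted = sort { $a <=> $b } (79, 15, 51, 49);
print join(", ", closest_sorted(\@sorted, 50)), "\n";
```

The sort is paid for once per array, so this only wins over the linear scan when the same array is searched many times.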
Re^4: Search a hash for closest match by aquinom (Sexton) on Nov 01, 2011 at 18:06 UTC
I think my response may have been a bit unclear/ambiguous: there are only 928K positions in total, so we can roughly estimate 40K elements per array (1M sites/25 chromosomes) and only 1 array will have to be searched based on the chromosome given.
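As a back-of-the-envelope check on those numbers, the snippet below estimates the array size and the number of probes a binary search would need. It assumes the positions are spread evenly across the chromosomes, which real data will not do exactly.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

my $total_positions = 928_000;   # total quoted above
my $chromosomes     = 25;        # count quoted above

# Assumption: positions are split evenly between chromosomes.
my $per_array = $total_positions / $chromosomes;

# A binary search halves the range each step, so it needs about log2(n) probes.
my $probes = ceil(log($per_array) / log(2));

printf "about %d positions per array, about %d probes per lookup\n",
    $per_array, $probes;
```

Under that assumption a linear scan touches tens of thousands of elements per lookup, while a binary search over a pre-sorted array touches well under twenty.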
Re^5: Search a hash for closest match by kennethk (Abbot) on Nov 01, 2011 at 18:47 UTC