nanophd has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks,
I can't seem to find a way to determine the key of an array which is closest to a value. My data is essentially a bell shaped curve that I will eventually want to do some math on. I am trying to create a simple "full width at half max" type analysis. I am able to easily locate the maximum value and associated key in the hash, but what I am getting stuck on is how to determine the keys (one on each side of max) at which has a value closest to $hash{$max/2}.

My hash has the following values (sorted to make it easier to look at):

-4 => 0 -3.9 => 0 -3.8 => 0 -3.7 => 1 -3.6 => 0 -3.5 => 1 -3.4 => 0 -3.3 => 0 -3.2 => 5 -3.1 => 19 -3 => 23 -2.9 => 50 -2.8 => 43 -2.7 => 28 -2.6 => 72 -2.5 => 165 -2.4 => 302 -2.3 => 168 -2.2 => 260 -2.1 => 151 -2 => 13 -1.9 => 1 -1.8 => 0 -1.7 => 0 -1.6 => 0 -1.5 => 0 -1.4 => 0 -1.3 => 0 -1.2 => 0 -1.1 => 0 -1 => 0
Step1: max value (302) has a key of "-2.4".
Step2: 302/2 = 151
Step3: Looking for keys -2.5 and -2.1

Any Suggestions?

I hope this makes sense
Thanks

Replies are listed 'Best First'.
Re: Finding hash key related to closest value
by LanX (Saint) on Mar 12, 2014 at 16:48 UTC
    IMHO you should rather use an array storing the values in 0.1 steps. (with a variable holding the start offset, here -4)

    Like this you can use a linear search forward and backward (or even binary if speed maters) to determine the range limits you wanna find.

    Keep in mind that hashes are unsorted but you need to compare neighbors.

    Cheers Rolf

    ( addicted to the Perl Programming Language)

Re: Finding hash key related to closest value
by hdb (Monsignor) on Mar 12, 2014 at 17:54 UTC

    Assuming that performance is not your main concern and that your data really is bell-shaped (in particular increasing up to the max, and then decreasing), the following should do the trick:

    my $max = max values %hash; my $half = $max/2; my @above = grep { $hash{$_} >= $half } keys %hash; my $keylo = min @above; my $keyhi = max @above; print "1: $max\n"; print "2: $half\n"; print "3: $keylo, $keyhi\n";

    If you need the closest, you might want to check the neighbors of $keylo and $keyhi as well.

      That is absolutely brilliant, Thank you. It works great!
Re: Finding hash key related to closest value
by tangent (Parson) on Mar 12, 2014 at 18:10 UTC
    This might be overkill but here's a way to do it using a DB_File in memory BTREE database. It works by inverting the original hash and allows for duplicate values (now keys) by setting the R_DUP flag. See the docs for how you would deal with duplicates:
    use DB_File ; my %hash = ( '-2.6' => 72, '-2.5' => 165, '-2.4' => 302, '-2.3' => 168, '-2.2' => 260, '-2.1' => 151, # ...etc. ); $DB_BTREE->{'compare'} = sub { $_[0] <=> $_[1] }; $DB_BTREE->{'flags'} = R_DUP; my %h; my $db = tie %h, "DB_File", undef, undef, undef, $DB_BTREE; while (my($k,$v) = each %hash) { $h{$v} = $k; } my ($key,$value,$half); # get the max $db->seq($key, $value, R_LAST); print "Max: $key -> $value\n"; $key = $half = $key / 2; # get the nearest to half $db->seq($key, $value, R_CURSOR); print "$key -> $value\n"; $key <= $half ? $db->seq($key, $value, R_NEXT) : $db->seq($key, $value, R_PREV); print "$key -> $value\n";
    Output:
    Max: 302 -> -2.4 151 -> -2.1 165 -> -2.5
Re: Finding hash key related to closest value
by kcott (Archbishop) on Mar 13, 2014 at 05:41 UTC

    G'day nanophd,

    Here's my take on a solution:

    #!/usr/bin/env perl -l use strict; use warnings; use List::Util qw{max}; my %data = ( -4 => 0, -3.9 => 0, -3.8 => 0, -3.7 => 1, -3.6 => 0, -3.5 => 1, -3 +.4 => 0, -3.3 => 0, -3.2 => 5, -3.1 => 19, -3 => 23, -2.9 => 50, -2.8 => 43 +, -2.7 => 28, -2.6 => 72, -2.5 => 165, -2.4 => 302, -2.3 => 168, -2. +2 => 260, -2.1 => 151, -2 => 13, -1.9 => 1, -1.8 => 0, -1.7 => 0, -1.6 => 0, -1.5 => 0, -1.4 => 0, -1.3 => 0, -1.2 => 0, -1.1 => 0, -1 => 0, ); my $max_value = max values %data; my $max_key = { reverse %data }->{$max_value}; my $half = $max_value / 2; my @closest; my $side = 0; for my $key (sort { $a <=> $b } keys %data) { if ($key == $max_key) { $closest[$side] = [$max_key, $max_value] unless defined $close +st[$side]; $side = 1; next; } my $diff = abs $half - $data{$key}; if (! defined $closest[$side]) { $closest[$side] = [$key, $diff]; next; } $closest[$side] = [$key, $diff] if $closest[$side]->[1] > $diff; } print "max_key = $max_key; max_value = $max_value"; print "half = $half"; print "closest: BEFORE = $closest[0]->[0] AFTER = $closest[1]->[0]";

    Output:

    max_key = -2.4; max_value = 302 half = 151 closest: BEFORE = -2.5 AFTER = -2.1

    -- Ken

Re: Finding hash key related to closest value
by Anonymous Monk on Mar 14, 2014 at 02:50 UTC

    IMHO, all of these “solutions” are, in this particular case, much more expensive than ... a brute-force search through an array of 2-tuples consisting of [key_value, frequency].   Even if you are dealing with, say, a few hundred “buckets” in such an array, you are still coming out far ahead of all of these other calisthenics, which, as you see, are shamelessly building arrays of their own.   Sometimes, a simple loop that iterates through an entire, moderately sized, data structure, knowing that it will only have to make one complete pass through that structure (although it will be required to do all of this, every time ...), really does turn out to be the fastest way.

    A “hash” is a very special-purpose structure:   it’s good for searching for discrete, unordered keys, particularly those that have a string representation.   If that’s not true for your situation, you shouldn’t be coercing a hash into doing what it isn’t designed to do.   The decision instead should be made whether a more-complicated but purposeful data structure is actually worth the effort.   If so, grab one from CPAN.   Otherwise ... and this is strictly JM2CW ... I say, just foreach it.