Finding hash key related to closest value

nanophd has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Finding hash key related to closest value by LanX (Saint) on Mar 12, 2014 at 16:48 UTC
IMHO you should rather use an array storing the values in 0.1 steps. (with a variable holding the start offset, here -4) Like this you can use a linear search forward and backward (or even binary if speed maters) to determine the range limits you wanna find. Keep in mind that hashes are unsorted but you need to compare neighbors. Cheers Rolf ( addicted to the Perl Programming Language)	[reply]
Re: Finding hash key related to closest value by hdb (Monsignor) on Mar 12, 2014 at 17:54 UTC
Assuming that performance is not your main concern and that your data really is bell-shaped (in particular increasing up to the max, and then decreasing), the following should do the trick: `my $max = max values %hash; my $half = $max/2; my @above = grep { $hash{$_} >= $half } keys %hash; my $keylo = min @above; my $keyhi = max @above; print "1: $max\n"; print "2: $half\n"; print "3: $keylo, $keyhi\n";` [download] If you need the closest, you might want to check the neighbors of `$keylo` and `$keyhi` as well.	[reply] [d/l] [select]
Re^2: Finding hash key related to closest value by nanophd (Novice) on Mar 12, 2014 at 18:02 UTC
That is absolutely brilliant, Thank you. It works great!	[reply]
Re: Finding hash key related to closest value by tangent (Parson) on Mar 12, 2014 at 18:10 UTC
This might be overkill but here's a way to do it using a DB_File in memory BTREE database. It works by inverting the original hash and allows for duplicate values (now keys) by setting the R_DUP flag. See the docs for how you would deal with duplicates: use DB_File ; my %hash = ( '-2.6' => 72, '-2.5' => 165, '-2.4' => 302, '-2.3' => 168, '-2.2' => 260, '-2.1' => 151, # ...etc. ); $DB_BTREE->{'compare'} = sub { $_[0] <=> $_[1] }; $DB_BTREE->{'flags'} = R_DUP; my %h; my $db = tie %h, "DB_File", undef, undef, undef, $DB_BTREE; while (my($k,$v) = each %hash) { $h{$v} = $k; } my ($key,$value,$half); # get the max $db->seq($key, $value, R_LAST); print "Max: $key -> $value\n"; $key = $half = $key / 2; # get the nearest to half $db->seq($key, $value, R_CURSOR); print "$key -> $value\n"; $key <= $half ? $db->seq($key, $value, R_NEXT) : $db->seq($key, $value, R_PREV); print "$key -> $value\n"; [download] Output: `Max: 302 -> -2.4 151 -> -2.1 165 -> -2.5` [download]	[reply] [d/l] [select]
Re: Finding hash key related to closest value by kcott (Archbishop) on Mar 13, 2014 at 05:41 UTC
G'day nanophd, Here's my take on a solution: #!/usr/bin/env perl -l use strict; use warnings; use List::Util qw{max}; my %data = ( -4 => 0, -3.9 => 0, -3.8 => 0, -3.7 => 1, -3.6 => 0, -3.5 => 1, -3 +.4 => 0, -3.3 => 0, -3.2 => 5, -3.1 => 19, -3 => 23, -2.9 => 50, -2.8 => 43 +, -2.7 => 28, -2.6 => 72, -2.5 => 165, -2.4 => 302, -2.3 => 168, -2. +2 => 260, -2.1 => 151, -2 => 13, -1.9 => 1, -1.8 => 0, -1.7 => 0, -1.6 => 0, -1.5 => 0, -1.4 => 0, -1.3 => 0, -1.2 => 0, -1.1 => 0, -1 => 0, ); my $max_value = max values %data; my $max_key = { reverse %data }->{$max_value}; my $half = $max_value / 2; my @closest; my $side = 0; for my $key (sort { $a <=> $b } keys %data) { if ($key == $max_key) { $closest[$side] = [$max_key, $max_value] unless defined $close +st[$side]; $side = 1; next; } my $diff = abs $half - $data{$key}; if (! defined $closest[$side]) { $closest[$side] = [$key, $diff]; next; } $closest[$side] = [$key, $diff] if $closest[$side]->[1] > $diff; } print "max_key = $max_key; max_value = $max_value"; print "half = $half"; print "closest: BEFORE = $closest[0]->[0] AFTER = $closest[1]->[0]"; [download] Output: `max_key = -2.4; max_value = 302 half = 151 closest: BEFORE = -2.5 AFTER = -2.1` [download] -- Ken	[reply] [d/l] [select]
Re: Finding hash key related to closest value by Anonymous Monk on Mar 14, 2014 at 02:50 UTC
IMHO, all of these “solutions” are, in this particular case, much more expensive than ... a brute-force search through an array of 2-tuples consisting of `[key_value, frequency]`. Even if you are dealing with, say, a few hundred “buckets” in such an array, you are still coming out far ahead of all of these other calisthenics, which, as you see, are shamelessly building arrays of their own. Sometimes, a simple loop that iterates through an entire, moderately sized, data structure, knowing that it will only have to make one complete pass through that structure (although it will be required to do all of this, every time ...), really does turn out to be the fastest way. A “hash” is a very special-purpose structure: it’s good for searching for discrete, unordered keys, particularly those that have a string representation. If that’s not true for your situation, you shouldn’t be coercing a hash into doing what it isn’t designed to do. The decision instead should be made whether a more-complicated but purposeful data structure is actually worth the effort. If so, grab one from CPAN. Otherwise ... and this is strictly JM2CW ... I say, just `foreach` it.	[reply]