fxia has asked for the wisdom of the Perl Monks concerning the following question:
There is a hash of items, say indexed by int keys (1, 2, 3, 6,8,...). The hash is tied dbm hash. Given an integer, I'd like to find that, either the integer is a key, or if it is not, what is the closest integer (but less than it) that is a key. For instance, 3 is a key in the hash, the closest key to 4 is 3. The closest key to 7 is 6. How can I solve the problem? Thanks a lot.
(tye)Re: Approximate Matching Key
by tye (Sage) on Apr 06, 2001 at 22:23 UTC
|
If you use a DBM that uses B-Tree keys instead of hashed
keys, then you can use the non-tied methods to do a
"less than or equal" search for a key in a manner that
will be very efficient.
But for that to work you need to normalize
your keys so that they sort properly as strings.
For example, pad them with sufficient zeros, store
them as pack("N",$key), or preface the
key with the number of digits (length($key).".$key"
so long as you don't have keys of more than 9 digits).
-
tye
(but my friends call me "Tye")
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
Actually if you use Berkely DB either through DB_File or
BerkeleyDB you can choose the sort order for the
BTree, and using the R_CURSOR flag you can have the hash
lookup implement the approximate search exactly as
requested.
| [reply] [Watch: Dir/Any] |
|
Thanks for the help. But where can I find more documentation on using B-Tree keys?
Fred
| [reply] [Watch: Dir/Any] |
|
In this very thread, here.
Some more specifics: The DB_File approach looked easier (though I only looked briefly). You want to create a databse in DB_BTREE format with a comparison function that does a numeric compare rather than a string compare. You can then use the R_CURSOR flag the seq() method to get what you want.
-
tye
(but my friends call me "Tye")
| [reply] [Watch: Dir/Any] |
Re: Approximate Matching Key
by japhy (Canon) on Apr 06, 2001 at 22:25 UTC
|
Check to see if the key is in there, otherwise, subtract one at a time until it's found. Cache this, too.
my $nearest_key = nearest(\%hash, $key);
{
my %nearest;
sub nearest {
my ($href, $k) = @_;
my $low = $k;
return $nearest{$k} if exists $nearest{$k};
$low-- until exists $href->{$low};
@nearest{$low .. $k} = ($low) x ($k - $low + 1);
return $low;
}
}
japhy --
Perl and Regex Hacker | [reply] [Watch: Dir/Any] [d/l] |
(jeffa) Re: Approximate Matching Key
by jeffa (Bishop) on Apr 06, 2001 at 22:21 UTC
|
Here is one way - the hash I use is not a dbm hash, but it
is trivial to port it:
my %h = (
8 => 'value',
3 => 'value',
102 => 'value',
4 => 'value',
1 => 'value',
57 => 'value',
);
my @keys = sort { $a <=> $b } keys %h;
my $try = shift || 5; # this example allows user to specify on comm
+and line
my $low = 'not found';
foreach (@keys) {
$low = $_ if $_ < $try;
last if $_ >= $try;
}
print "$low\n";
Jeff
R-R-R--R-R-R--R-R-R--R-R-R--R-R-R--
L-L--L-L--L-L--L-L--L-L--L-L--L-L--
| [reply] [Watch: Dir/Any] [d/l] |
|
foreach (@keys) {
$low = $_ if $_ <= $try;
last if $_ >= $try;
}
The = was missing in the $low line, so it would miss exact matches. It should still be present on the last line as well, despite my first instinct :-), because we can guarantee to have found a match. | [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Approximate Matching Key
by Masem (Monsignor) on Apr 06, 2001 at 22:45 UTC
|
Given the hash, %hash, and target value $value...
pop grep { $_ <= $value } ( sort { $a <=> $b } keys %hash );
Quick comment/update...
The best code to use is going to depend on both the size and the sparseness of your hash. Define M as the maximum hash value that you plan to have, n as the number of hash entries. The "start at X, check hash backwards" method will take O(M/n) assuming a good distribution, while my method above will take O(n)+O(n*log n) ~ O(n*log n). Thus, if you have a dense hash, the count-backwards will be faster, while grep/sort will be faster for a sparse hash. You'll probably want to benchmark both to see which method is best suited for what you need if you are going to be using this call frequently.
Dr. Michael K. Neylon - mneylon-pm@masemware.com
||
"You've left the lens cap of your mind on again, Pinky" - The Brain
| [reply] [Watch: Dir/Any] [d/l] |
Re: Approximate Matching Key
by kilinrax (Deacon) on Apr 06, 2001 at 22:21 UTC
|
#!/usr/bin/perl -w
use strict;
my %data = (
1 => 'foo',
3 => 'bar',
7 => 'baz',
12 => 'fumble'
);
print &find_nearest (10, \%data);
sub find_nearest($\%) {
my $target = shift;
my %data = %{shift()};
while (!defined $data{$target}) {
$target--;
}
return $target;
}
| [reply] [Watch: Dir/Any] [d/l] |
Re: Approximate Matching Key
by fxia (Novice) on Apr 10, 2001 at 01:37 UTC
|
Thanks all so much for the answers. I'm going to prototype and implement.
Fred
| [reply] [Watch: Dir/Any] |
|
|