Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Approximate Matching Key

by fxia (Novice)
on Apr 06, 2001 at 22:14 UTC ( [id://70548]=perlquestion: print w/replies, xml ) Need Help??

fxia has asked for the wisdom of the Perl Monks concerning the following question:

There is a hash of items, say indexed by int keys (1, 2, 3, 6,8,...). The hash is tied dbm hash. Given an integer, I'd like to find that, either the integer is a key, or if it is not, what is the closest integer (but less than it) that is a key. For instance, 3 is a key in the hash, the closest key to 4 is 3. The closest key to 7 is 6. How can I solve the problem? Thanks a lot.

Replies are listed 'Best First'.
(tye)Re: Approximate Matching Key
by tye (Sage) on Apr 06, 2001 at 22:23 UTC

    If you use a DBM that uses B-Tree keys instead of hashed keys, then you can use the non-tied methods to do a "less than or equal" search for a key in a manner that will be very efficient.

    But for that to work you need to normalize your keys so that they sort properly as strings. For example, pad them with sufficient zeros, store them as pack("N",$key), or preface the key with the number of digits (length($key).".$key" so long as you don't have keys of more than 9 digits).

            - tye (but my friends call me "Tye")
      Actually if you use Berkely DB either through DB_File or BerkeleyDB you can choose the sort order for the BTree, and using the R_CURSOR flag you can have the hash lookup implement the approximate search exactly as requested.
      Thanks for the help. But where can I find more documentation on using B-Tree keys? Fred

        In this very thread, here.

        Some more specifics: The DB_File approach looked easier (though I only looked briefly). You want to create a databse in DB_BTREE format with a comparison function that does a numeric compare rather than a string compare. You can then use the R_CURSOR flag the seq() method to get what you want.

                - tye (but my friends call me "Tye")
Re: Approximate Matching Key
by japhy (Canon) on Apr 06, 2001 at 22:25 UTC
    Check to see if the key is in there, otherwise, subtract one at a time until it's found. Cache this, too.
    my $nearest_key = nearest(\%hash, $key); { my %nearest; sub nearest { my ($href, $k) = @_; my $low = $k; return $nearest{$k} if exists $nearest{$k}; $low-- until exists $href->{$low}; @nearest{$low .. $k} = ($low) x ($k - $low + 1); return $low; } }


    japhy -- Perl and Regex Hacker
(jeffa) Re: Approximate Matching Key
by jeffa (Bishop) on Apr 06, 2001 at 22:21 UTC
    Here is one way - the hash I use is not a dbm hash, but it is trivial to port it:
    my %h = ( 8 => 'value', 3 => 'value', 102 => 'value', 4 => 'value', 1 => 'value', 57 => 'value', ); my @keys = sort { $a <=> $b } keys %h; my $try = shift || 5; # this example allows user to specify on comm +and line my $low = 'not found'; foreach (@keys) { $low = $_ if $_ < $try; last if $_ >= $try; } print "$low\n";

    Jeff

    R-R-R--R-R-R--R-R-R--R-R-R--R-R-R--
    L-L--L-L--L-L--L-L--L-L--L-L--L-L--
    

      Minor typo...

      foreach (@keys) { $low = $_ if $_ <= $try; last if $_ >= $try; }

      The = was missing in the $low line, so it would miss exact matches. It should still be present on the last line as well, despite my first instinct :-), because we can guarantee to have found a match.

Re: Approximate Matching Key
by Masem (Monsignor) on Apr 06, 2001 at 22:45 UTC
    Given the hash, %hash, and target value $value...
    pop grep { $_ <= $value } ( sort { $a <=> $b } keys %hash );
    Quick comment/update...

    The best code to use is going to depend on both the size and the sparseness of your hash. Define M as the maximum hash value that you plan to have, n as the number of hash entries. The "start at X, check hash backwards" method will take O(M/n) assuming a good distribution, while my method above will take O(n)+O(n*log n) ~ O(n*log n). Thus, if you have a dense hash, the count-backwards will be faster, while grep/sort will be faster for a sparse hash. You'll probably want to benchmark both to see which method is best suited for what you need if you are going to be using this call frequently.


    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
Re: Approximate Matching Key
by kilinrax (Deacon) on Apr 06, 2001 at 22:21 UTC
    How about this?
    #!/usr/bin/perl -w use strict; my %data = ( 1 => 'foo', 3 => 'bar', 7 => 'baz', 12 => 'fumble' ); print &find_nearest (10, \%data); sub find_nearest($\%) { my $target = shift; my %data = %{shift()}; while (!defined $data{$target}) { $target--; } return $target; }
Re: Approximate Matching Key
by fxia (Novice) on Apr 10, 2001 at 01:37 UTC
    Thanks all so much for the answers. I'm going to prototype and implement. Fred

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://70548]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (4)
As of 2024-03-29 09:27 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found