Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

Perfect hashing/Minimal perfect hashing seems like it might both reduce the space requirement and speed up the lookups.

However, generating such functions is non-trivial if you start from scratch...

This may not be as horrible as it sounds if your code is going to get a lot of use. Minimal Perfect Hashing looks like the only way you're going to get O(1) on lookups, and if you're going to be getting bigger and bigger data sets, the time up front starts to look more cost effective. A bit of digging around found a real algorithm that someone actually implemented in C and tested, designed for use with both integer keys and character keys. Unfortunately I've only found the original paper and a later one with pseudocode, nothing quite canned.

The paper with pseudocode is here: Czech, Havas, Majewski, and an earlier paper that's a little denser to interpret is here Havas, Majewski. This page: Schnell gives a pretty readable interpretation of it as well. It doesn't look too painful to implement.

I also found what looks like a similar algorithm that appears to include bi-directionality implemented in a python package call pyDAWG that you could either call or port.

A little more digging around on the relevant names might find you some c code that does about the same

EDIT: allegedly there's a Java package floating around called "ggperf" that's an implementation of the CHM algorithm (and is supposed to be faster than a "gperf" minimal perfect hash generator that's part of libg++) but I couldn't find source, just a paper that amounts to documentation for ggperf


In reply to Re^2: Bidirectional lookup algorithm? (Solution.) by bitingduck
in thread Bidirectional lookup algorithm? (Updated: further info.) by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others having a coffee break in the Monastery: (2)
As of 2024-04-25 03:38 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found