Perfect hashing, or minimal perfect hashing, seems like it might both reduce the space requirement and speed up lookups.
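To make "minimal perfect" concrete: for a fixed set of n keys, a minimal perfect hash sends every key to a distinct slot in 0..n-1, so the table has no empty slots and no collisions. For a handful of keys you can find one by brute force, trying seeds of an ordinary hash until the keys happen to land on a permutation of 0..n-1. This is a minimal sketch in Python, assuming a salted BLAKE2b digest as the hash family; the function names are hypothetical. The expected number of tries is n^n/n!, which grows roughly like e^n: about 65 tries for six keys, hopeless for real data sets.

```python
import hashlib


def seeded_hash(key: str, seed: int, m: int) -> int:
    """Hash key into range(m); the seed picks one member of the hash family.

    Salted BLAKE2b is an assumption here; any decent seeded hash would do.
    """
    data = f"{seed}:{key}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little") % m


def brute_force_mphf(keys, max_tries=1_000_000):
    """Try seeds until the n keys hit n distinct slots in range(n) (a minimal perfect hash)."""
    n = len(keys)
    for seed in range(max_tries):
        if len({seeded_hash(k, seed, n) for k in keys}) == n:
            return seed
    raise RuntimeError("no seed found within max_tries")


keys = ["apple", "banana", "cherry", "date", "elder", "fig"]
seed = brute_force_mphf(keys)
print(seed, {k: seeded_hash(k, seed, len(keys)) for k in keys})
```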

However, generating such functions is non-trivial if you start from scratch.

This may not be as horrible as it sounds if your code is going to get a lot of use. Minimal perfect hashing looks like the only way you're going to get guaranteed worst-case O(1) lookups, and if your data sets keep getting bigger, the up-front construction time starts to look more cost-effective. A bit of digging turned up a real algorithm that someone actually implemented in C and tested, designed for both integer keys and character keys. Unfortunately, I've only found the original paper and a later one with pseudocode; nothing quite canned.

The paper with pseudocode is Czech, Havas, and Majewski, and an earlier paper that's a little denser to interpret is Havas and Majewski. Schnell's page gives a fairly readable interpretation of it as well. It doesn't look too painful to implement.
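The CHM construction from those papers is short enough to sketch. Each key becomes an edge between two vertices chosen by two hash functions, in a graph with m = c·n vertices (c > 2). If the graph comes out acyclic, one traversal assigns a value g to every vertex so that (g[h1(k)] + g[h2(k)]) mod n equals the key's position in the input list; if not, you retry with new hash functions. Because the result is order-preserving, the reverse direction (index to key) is just an array lookup, which is what makes it attractive for a bidirectional table. This is a minimal Python sketch, not the authors' C code; the salted BLAKE2b hash functions, the default c = 2.1, and the names `CHMTable`, `index` and `key` are my own assumptions.

```python
import hashlib
from collections import defaultdict


class CHMTable:
    """Order-preserving minimal perfect hash in the style of Czech, Havas & Majewski (sketch)."""

    def __init__(self, keys, c=2.1, max_tries=1000):
        if not keys or len(set(keys)) != len(keys):
            raise ValueError("need a non-empty list of distinct keys")
        self.keys = list(keys)
        self.n = len(self.keys)
        self.m = int(c * self.n) + 1  # vertex count; c = 2.1 is an assumed default (CHM needs c > 2)
        for seed in range(max_tries):
            g = self._try_seed(seed)
            if g is not None:
                self.seed, self.g = seed, g
                return
        raise RuntimeError("no acyclic graph found; try a larger c")

    def _vertex(self, key, seed, which):
        # Two "independent" hash functions, simulated by salting BLAKE2b (an assumption).
        data = f"{seed}:{which}:{key}".encode("utf-8")
        return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little") % self.m

    def _try_seed(self, seed):
        # Key number i becomes an edge (h1(key), h2(key)) labelled with its target value i.
        adj = defaultdict(list)
        for i, key in enumerate(self.keys):
            u, v = self._vertex(key, seed, 1), self._vertex(key, seed, 2)
            if u == v:  # self-loop: the graph cannot be acyclic
                return None
            adj[u].append((v, i))
            adj[v].append((u, i))
        # Walk each component from a root with g = 0, choosing g[v] so that
        # g[u] + g[v] == i (mod n) holds on every edge.
        g = [0] * self.m
        visited = [False] * self.m
        for root in range(self.m):
            if visited[root]:
                continue
            visited[root] = True
            stack = [(root, -1)]  # (vertex, label of the edge used to reach it)
            while stack:
                u, via = stack.pop()
                for v, i in adj[u]:
                    if i == via:
                        continue
                    if visited[v]:  # non-tree edge means a cycle: retry with a new seed
                        return None
                    visited[v] = True
                    g[v] = (i - g[u]) % self.n
                    stack.append((v, i))
        return g

    def index(self, key):
        """Key -> position, or None if the key was not in the original set."""
        i = (self.g[self._vertex(key, self.seed, 1)]
             + self.g[self._vertex(key, self.seed, 2)]) % self.n
        return i if self.keys[i] == key else None

    def key(self, i):
        """Position -> key: the reverse direction is a plain array lookup."""
        return self.keys[i]


words = ["alpha", "beta", "gamma", "delta"]
t = CHMTable(words)
print(t.index("gamma"), t.key(2), t.index("omega"))  # prints: 2 gamma None
```

The `index` method compares against the stored key so that non-members return `None`; a bare minimal perfect hash would otherwise hand back an arbitrary slot for a key it was never built with. If you can guarantee every query is a member, you can drop that check, but the key array is still needed for the reverse direction.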

I also found what looks like a similar algorithm, which appears to support bidirectional lookup, implemented in a Python package called pyDAWG that you could either call or port.

A little more digging on the relevant names might turn up some C code that does much the same.

EDIT: Allegedly there's a Java package floating around called "ggperf" that implements the CHM algorithm (and is supposed to be faster than the "gperf" minimal perfect hash generator that's part of libg++), but I couldn't find the source, just a paper that amounts to documentation for ggperf.


In reply to Re^2: Bidirectional lookup algorithm? (Solution.) by bitingduck
in thread Bidirectional lookup algorithm? (Updated: further info.) by BrowserUk