in reply to Bidirectional lookup algorithm? (Updated: further info.)
You could lump longer symbols together a la join qq(\0), grep length() > 8, @syms; access via ptr or offset. Keep short ones directly in the 8-byte slots of the symbol table. (union). This should get you down to ~10 bytes per key.
To index the tables, use minimal perfect hashing. There's the CMPH library, which appears quite simple to use. Though you'll need to work on the glue — cmph bindings via Perfect::Hash are incomplete, and it's not on CPAN.
Anyway, with perfect hashing you need but a few extra bits to do the indexing. Various methods are available. CHM needs ~8 bytes per key, but preserves order. BRZ is about ~8 bits per key, uses external memory while building the tables so you're not mem-constrained. BDZ is just under 3 bits per key. Storing the 8-byte keys as you are already, these few extra bits aren't much.
Performance estimates: about 3 sec per million keys when constructing the mph structures; about .3 sec per million lookups. Compared to .15 usec lookups with direct hashes.
So expect roughly 3 times slower performance with about 10 times the memory density. If the tradeoff is worth the effort, I'll leave for you to decide.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: Bidirectional lookup algorithm? (try perfect hashing)
by oiskuu (Hermit) on Jan 11, 2015 at 02:52 UTC | |
by BrowserUk (Patriarch) on Jan 11, 2015 at 09:46 UTC | |
|
Re^2: Bidirectional lookup algorithm? (try perfect hashing)
by oiskuu (Hermit) on Jan 13, 2015 at 02:21 UTC |