Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Re^2: Bidirectional lookup algorithm? (try perfect hashing)

by oiskuu (Hermit)
on Jan 13, 2015 at 02:21 UTC ( [id://1113021]=note: print w/replies, xml ) Need Help??


in reply to Re: Bidirectional lookup algorithm? (try perfect hashing)
in thread Bidirectional lookup algorithm? (Updated: further info.)

Ok. Here's a crude demo in case someone is interested in doing benchmarking comparisons or whatever. The program is pure C, tested on linux 64-bit. Give two arguments: Nkeys and Filename. File should contain sym-value pairs, both unique in their domains. (Space-separated, one pair per line).

Update. Some additional notes regarding the program.

Portability is easily improved upon:
  • Replace mmap/mremap with malloc/realloc. This is trivial.
  • The test for direct vs indirect entries (ss offset) might be written as !(s->o >> 56), avoiding issues with endianness.
There are at least two bugs to fix:
  • Verify sym to val loop has no test for NULL; this will fault on a missing key.
  • The fetch_by_key should not proceed with a memcmp in case keylen > 8, but the slot is direct (ie <= 8).
Many further optimizations to consider:
  • Try to allocate ss strings so that cache-line boundaries aren't crossed.
  • Terminate strings with a high-bit: more compact, plus faster than using memchr.
  • Specific inlined implementations for 8-byte memcmp, etc.
  • Inplace reorder looks up every key twice. Place an (un)processed flag to halve this.
  • Choose a faster hash function for string keys, FNV perhaps.
  • A primitive hash function for values. Just a mul by constant?
There's also a wishlist for the CMPH library, but enough for now.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1113021]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others about the Monastery: (3)
As of 2024-04-24 05:25 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found