in reply to Better Hash Tables?

This is certainly worth a look. I'm not an expert by any means when it comes to this stuff. But in general, it is certainly prudent to be very cautious when changing a crucial algorithm like hash tables. It may be faster in most cases, but is it also robust enough to withstand attacks? Is it guaranteed to always return the data that it should?

Even better performance than we already have would be nice, if this algo applies to Perl. But it would need very thorough testing in a wide range of settings and applications, with an easy way to switch between the current and the new algo within the same perl version.

PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP
Also check out my sisters artwork and my weekly webcomics

Replies are listed 'Best First'.
Re^2: Better Hash Tables?
by sleet (Pilgrim) on Feb 19, 2025 at 05:56 UTC
    Both rust and golang have ports of google's Swiss Tables, which use less memory and are faster because they use AES and SSE instructions.
Re^2: Better Hash Tables?
by QM (Parson) on Feb 19, 2025 at 21:25 UTC
    Thanks for the links.

    There's a lot going unsaid in that article (not surprised, we're not the intended audience).

    From my further research, Perl resizes based on load factor, typically >75% (depends on the implementation). However, collisions are handled with a linked list, so in the worst case, a single bucket would hold all of the entries.

    The article and paper talk about "open-addressing". An open-addressing hash table is a data structure that uses probing to resolve collisions when storing keys in a fixed-length array. It's also known as closed hashing. (Nothing like confusing terminology.)

    Apparently, Python uses open-addressing with probing. The loading factor threshold for resizing is 2/3.

    Since Perl uses linked lists to "resolve" collisions, this paper doesn't appear to apply.

    There was this weird fact that popped up: Instead of just doubling, Python picks the next power of two to optimize performance. I mean, it doesn't start with a power-of-two size?

    There was nothing in the article that helped me understand why the paper makes anything better, except hand-wavy declarations that it's true.

    I also started to read the linked paper. Didn't get very far, didn't have the time to wade in and learn the terminology, and read the other papers.

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of