Re^3: 32bit/64bit hash function: Use perls internal hash function?

by vr (Curate)
on Apr 11, 2022 at 12:20 UTC

in reply to Re^2: 32bit/64bit hash function: Use perls internal hash function?
in thread 32bit/64bit hash function: Use perls internal hash function?

use strict;
use warnings;
use feature 'say';
use B 'hash';
use Crypt::xxHash 'xxhash3_64bits';
use Digest::xxH64 'xx64';
use Benchmark 'cmpthese';

use Inline C => <<'_C_';
U32 myhash(SV* sv) {
    STRLEN len;
    U32 hash = 0;
    const char *s = SvPVbyte(sv, len);
    PERL_HASH(hash, s, len);
    return hash;
}
_C_

srand 1234;
my $s = pack 'C*', map rand 256, 1 .. 64;

cmpthese -2, {
    hash   => sub { my $x = hash( $s )},
    myhash => sub { my $x = myhash( $s )},
    xxhash => sub { my $x = xxhash3_64bits( $s, 0 )},
    xx64   => sub { my $x = xx64( $s )},
};

__END__

             Rate   hash myhash xxhash   xx64
hash    1944302/s     --   -52%   -54%   -84%
myhash  4088577/s   110%     --    -3%   -66%
xxhash  4233986/s   118%     4%     --   -65%
xx64   11994386/s   517%   193%   183%     --

This is perl 5, version 32, subversion 1 (v5.32.1) built for MSWin32-x64-multi-thread

Try xxHash? Digest::xxH64 is not on CPAN (but it is linked to from the xxHash home page, i.e. officially 'endorsed'(?) :)), Crypt::xxHash needs a fix to install on Windows, and Digest::xxHash (not in the example above) is slower and therefore perhaps not of much interest in the context of 'B::hash is too slow'.
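A side note on the 'hash' vs. 'myhash' rows, since both end up calling PERL_HASH: B::hash hands back a formatted hex string rather than an integer, and building that string may be part of the gap (that is my guess, not something measured here). A minimal sketch using only core modules; the key is an arbitrary example:

```perl
use strict;
use warnings;
use feature 'say';
use B 'hash';

my $key = 'some key';    # arbitrary example input

# B::hash returns a string such as "0x1a2b3c4d", not an integer.
my $hex = hash($key);

# hex() accepts the leading "0x" and yields the numeric value.
my $num = hex $hex;

say "$hex => $num";

# Caveat: perl seeds its hash function per process by default, so these
# values generally differ from run to run unless PERL_HASH_SEED is set.
```

So if the number is what you need, each B::hash call also costs a hex() round trip, and the result should not be persisted across runs without pinning the seed.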

As already mentioned, Judy::HS provides both hashing and sparse storage built in. So maybe hand-rolled hashing is not what you need. I have 'played' (i.e. not in serious 'production') with Judy (but not with Judy::HS) to store and access huge sparse data, and, yes, speed is comparable to Perl hashes, with a significantly smaller RAM appetite.

Another option to consider: Math::GSL::SparseMatrix (GSL being solid and renowned, etc.). As above, I 'played' with a 64-bit-addressed sparse single-row (or was it single-column?) vector. It is slower than Judy, yet it installs without hassle on Windows, can theoretically address a 128-bit sparse space (because it is 2D), and can store data shorter than 64-bit integers, i.e. it needs even less RAM in that case.

