in reply to Re^3: Perl Hashes in C?
in thread Perl Hashes in C?
This is the trivial hash, guaranteed to never have a collision with sequential integers. And it has no data++$h{$_} for 1 .. 27e6; print time() - $t;;
I wrote a test script to see how fast the suggested Perl hashing is with Big Data and millions of collisions:
No snide comments about my backwoodsy writing style, please :)sub rgb_hash() { $debug=0; $raw="pf-2015.0531-249465.pprgb.7360x4912.raw"; %e2at=%rgb2c=(); # Init Event -> Accululated Time hash; RGB -> Co +unt hash &time_event('init', \%e2at, $debug*0); # Initialize timing struct +ures &time_event('Read_Raw', \%e2at, $debug*0); $rsize = -s $raw; open(IN, "<$raw"); binmode IN; $read = read(IN, $buf, $rsize); close IN; # Use it n cluse it! die("Read ERROR!: Read $read != file $rsize\n") unless($read==$rs +ize); printf("Read %1.6f MB in raw file $raw\n", $read/1.0E6); &time_event('# Pre-extend array', \%e2at, $debug*0); $ushort[108456960+1] = 0x0000FFFF0000; # Pre-extend array &time_event('Unpack', \%e2at, $debug*0); @ushort=unpack("S*", $buf); $su = scalar(@ushort) || -1; printf("Ushort[$su], length(buf)=%1.6f, Bytes/ushort %1.9f\n", ($bl=length($buf)) / 1.0E6, $bl/$su); for($ii=0; $ii < scalar @ushort; $ii+=3) { # Extract the R, G and B values ($rr, $gg, $bb) = @ushort[$ii .. $ii+2]; # Array slice $quad=pack('Q', $rr, $gg, $bb, 0x0000); $mult=$rr*$gg*$bb; # Titanic 281 Trillion size number $white=$rr + $gg*65535 + $bb*65535*65535; # 281 Trillion size + number $bs=$rr | ($gg << 16) | ($bb << 32); # << WORKS! $rgb2c{$bs}++; # Increment count for this color if(!($ii%6E6)) { # Document time on every 100k unit $jj++; printf("R=0x%04hx, g=0x%04hx, b=0x%04hx, bs=0x%016qx, q=0x +%016qx, m=0x%016qx, W=0x%016qx\n", $rr, $gg, $bb, $bs, $quad, $mult, $white); } } &time_event("Count distinct colors", \%e2at, $debug*0); printf("RGB hash has %d distinct colors.\n", scalar keys %rgb2c); &time_event('term', \%e2at, $debug*0); # Finalize timing structur +es &show_event(\%e2at, $debug*0); # Automatically stops last event t +iming }
This thing is 3 times as fast as the C program I wrote.
And we get exactly the same number.
I has to be the hashing function calculating an address. The C program, bless its little heart, had to do a Binary Search over an ever growing lookup table.
There was one severe problem with getting the 48bit hash key right. I packed 3 verified unsigned short into a "Q"
The R, G and B printed perfectly and agreed with the Bitshift/AND value and the QUAD was always zero:($rr, $gg, $bb) = @ushort[$ii .. $ii+2]; # Array slice $quad=pack('Q', $rr, $gg, $bb, 0x0000);
It's kind of hard to see, buy they all agree except for theR=0x0892, g=0x0b8c, b=0x0672, bs=0x000006720b8c0892, q=0x0000000000000000, R=0x0892 g=0x0b8c b=0x0672,
I followed written documentation:#(*#@%) QUAD!
From: >> http://perldoc.perl.org/functions/pack.html
I had to do a bitshift, <<16 for the GREEN and <<32 for the BLUE and logically AND them together to get a UINT24_T that worked.Formula: pack TEMPLATE,LIST; Q An unsigned quad value. Example: $foo = pack('nN', 42, 4711); # pack big-endian 16- and 32-bit unsigned integers
The performance is MOST IMPRESSIVE!!
Thanks for the Pointers (References?
================================================ Boring Report: Running c:/bin/bb.pl Wed Aug 12 20:05:10 2015 Read 216.913920 MB in raw file pf-2015.0531-249465.pprgb.7360x4912.raw Ushort[108456960], length(buf)=216.913920, Bytes/ushort 2.000000000 R=0x1746, g=0x1910, b=0x35ee, bs=0x000035ee19101746 BAD quad=0x0000000000000000, m=0x0000007ae0ad0540, W=0x000035edad3434 +24 ... RGB hash has 27645898 distinct colors. <<<<<< The good part SE: Sum of Event Times = 1.11464685 min, now-init = 1.11464715 min Event '# Pre-extend array' elapsed time = 0.003 min = 0.24%. Event 'Count distinct colors' elapsed time = -0.017 min = -1.50%. Event 'Loop' elapsed time = 1.012 min = 90.81%. Event 'Read_Raw' elapsed time = 0.002 min = 0.22%. Event 'Unpack' elapsed time = 0.097 min = 8.73%. 90.81% -> Loop 8.73% -> Unpack 0.24% -> # Pre-extend array 0.22% -> Read_Raw 0.00% -> Count distinct colors Elapsed time = 1.11 min
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^5: Perl Hashes in C?
by BrowserUk (Patriarch) on Aug 13, 2015 at 05:08 UTC | |
|
Re^5: Perl Hashes in C?
by BrowserUk (Patriarch) on Aug 13, 2015 at 01:40 UTC |