++$h{$_} for 1 .. 27e6; print time() - $t;;
This is the trivial hash, guaranteed to never have a collision with sequential integers. And it has no data

I wrote a test script to see how fast the suggested Perl hashing is with Big Data and millions of collisions:

sub rgb_hash() { $debug=0; $raw="pf-2015.0531-249465.pprgb.7360x4912.raw"; %e2at=%rgb2c=(); # Init Event -> Accululated Time hash; RGB -> Co +unt hash &time_event('init', \%e2at, $debug*0); # Initialize timing struct +ures &time_event('Read_Raw', \%e2at, $debug*0); $rsize = -s $raw; open(IN, "<$raw"); binmode IN; $read = read(IN, $buf, $rsize); close IN; # Use it n cluse it! die("Read ERROR!: Read $read != file $rsize\n") unless($read==$rs +ize); printf("Read %1.6f MB in raw file $raw\n", $read/1.0E6); &time_event('# Pre-extend array', \%e2at, $debug*0); $ushort[108456960+1] = 0x0000FFFF0000; # Pre-extend array &time_event('Unpack', \%e2at, $debug*0); @ushort=unpack("S*", $buf); $su = scalar(@ushort) || -1; printf("Ushort[$su], length(buf)=%1.6f, Bytes/ushort %1.9f\n", ($bl=length($buf)) / 1.0E6, $bl/$su); for($ii=0; $ii < scalar @ushort; $ii+=3) { # Extract the R, G and B values ($rr, $gg, $bb) = @ushort[$ii .. $ii+2]; # Array slice $quad=pack('Q', $rr, $gg, $bb, 0x0000); $mult=$rr*$gg*$bb; # Titanic 281 Trillion size number $white=$rr + $gg*65535 + $bb*65535*65535; # 281 Trillion size + number $bs=$rr | ($gg << 16) | ($bb << 32); # << WORKS! $rgb2c{$bs}++; # Increment count for this color if(!($ii%6E6)) { # Document time on every 100k unit $jj++; printf("R=0x%04hx, g=0x%04hx, b=0x%04hx, bs=0x%016qx, q=0x +%016qx, m=0x%016qx, W=0x%016qx\n", $rr, $gg, $bb, $bs, $quad, $mult, $white); } } &time_event("Count distinct colors", \%e2at, $debug*0); printf("RGB hash has %d distinct colors.\n", scalar keys %rgb2c); &time_event('term', \%e2at, $debug*0); # Finalize timing structur +es &show_event(\%e2at, $debug*0); # Automatically stops last event t +iming }
No snide comments about my backwoodsy writing style, please :)

This thing is 3 times as fast as the C program I wrote.
And we get exactly the same number.

I has to be the hashing function calculating an address. The C program, bless its little heart, had to do a Binary Search over an ever growing lookup table.

There was one severe problem with getting the 48bit hash key right. I packed 3 verified unsigned short into a "Q"

($rr, $gg, $bb) = @ushort[$ii .. $ii+2]; # Array slice $quad=pack('Q', $rr, $gg, $bb, 0x0000);
The R, G and B printed perfectly and agreed with the Bitshift/AND value and the QUAD was always zero:
R=0x0892, g=0x0b8c, b=0x0672, bs=0x000006720b8c0892, q=0x0000000000000000, R=0x0892 g=0x0b8c b=0x0672,
It's kind of hard to see, buy they all agree except for the
#(*#@%) QUAD!
I followed written documentation:

From: >> http://perldoc.perl.org/functions/pack.html

Formula: pack TEMPLATE,LIST; Q An unsigned quad value. Example: $foo = pack('nN', 42, 4711); # pack big-endian 16- and 32-bit unsigned integers
I had to do a bitshift, <<16 for the GREEN and <<32 for the BLUE and logically AND them together to get a UINT24_T that worked.

The performance is MOST IMPRESSIVE!!

Thanks for the Pointers (References?

================================================ Boring Report: Running c:/bin/bb.pl Wed Aug 12 20:05:10 2015 Read 216.913920 MB in raw file pf-2015.0531-249465.pprgb.7360x4912.raw Ushort[108456960], length(buf)=216.913920, Bytes/ushort 2.000000000 R=0x1746, g=0x1910, b=0x35ee, bs=0x000035ee19101746 BAD quad=0x0000000000000000, m=0x0000007ae0ad0540, W=0x000035edad3434 +24 ... RGB hash has 27645898 distinct colors. <<<<<< The good part SE: Sum of Event Times = 1.11464685 min, now-init = 1.11464715 min Event '# Pre-extend array' elapsed time = 0.003 min = 0.24%. Event 'Count distinct colors' elapsed time = -0.017 min = -1.50%. Event 'Loop' elapsed time = 1.012 min = 90.81%. Event 'Read_Raw' elapsed time = 0.002 min = 0.22%. Event 'Unpack' elapsed time = 0.097 min = 8.73%. 90.81% -> Loop 8.73% -> Unpack 0.24% -> # Pre-extend array 0.22% -> Read_Raw 0.00% -> Count distinct colors Elapsed time = 1.11 min

In reply to Re^4: Perl Hashes in C? by Anonymous Monk
in thread Perl Hashes in C? by BrianP

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.