I have a small Perl script that takes a crop spec with the XY position, width and height of a box, grabs the 48 bit RGB as a hash key and increments the data. It takes just a second on a ~100x100 pixel rectangle.

I use it to measure the quality of color spaces and various photo processes.

I wrote something much more clunky in C for use on entire, 216 MB pictures. But it takes 2.17 HOURS to run on a D800E pic with 27 Million unique colors. The lookup table made up of 64bit long longs gets to be 27E6 long toward the end and takes ~25 bsearch comparisons per lookup.

What I need is a good algorithm to take a 48 bit integer with quantomly random values and spit out an integer in the range 0-36E6. Perl has some pretty slick Hashing capabilities. I wonder whether any of them could be leveraged here.

GORY DETAILS: ----------

I have a C program which uses an array of 36 Million uint64s to hold 48 bits of RGB color data (16bits/channel) and 16 bits for a counter for the distinct colors found when scanning a 216 MB .RAW file.

For extremely colorful pictures, the length of this lookup table gets as high as 27 Million items. Even with a bsearch, it takes over 2 hours to count the number of pixels sharing each distinct color.

How does one HASH 48 bits of ~random data RGB data and return an integer in the range 36E6 to as high as many GB with 32GB of system RAM?

CURRENT PROCEDURE: -------------------------------

BSearch the known colors for the new color and increment the 16 bit counter if found. Else, append the new, unique colors to the end of the array and periodically QSort the new ones. Then I do a shuffle merge on the millions of old, sorted colors and the hundreds of new, colors (after qsorting the new) appending as many as possible from each array onto a merge array and go back and forth until old and new are all merged.

QSorting an array with millions of sorted and a few hundred unsorted seems like a waste. If bsearch fails on the sorted bottom, I do a linear search on the unsorted top and append the color if not found.

The BSearch comparison function masks off the highest 16 bits (0x0000FFFFFFFFFFFF) of *A and *B and returns 1 if *A & MASK > *B & MASK, -1 if <, else 0. Can't think of a faster way to compare 2 long longs from pointers.

What I need is a Perl Hash (just do it) and C speed; Cerl Language? tags hash raw rgb 48bit


In reply to Perl Hashes in C? by BrianP

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.