in reply to Perl Hashes in C?

Counting unique pixel values is the equivalent of sort | uniq. No hashing, no associative maps are necessary.

A good solution involves picking the most suitable sort algorithm and implementation. One might bucket by one color, then mergesort and count on the 32bit values. Anyway, 30 million items single-threaded — this ought to be ~1 sec job.

Replies are listed 'Best First'.
Re^2: Perl Hashes in C? (just sort)
by Your Mother (Archbishop) on Aug 15, 2015 at 13:35 UTC

    This stuff is absolutely not my forte but surely sorting is more expensive and slower than hashing unless memory use is thrashing... In any case, whatever it ought to be, others have given code and timings. :P

      Well... see Integer sorting. Sort can do better than n log n in specific circumstances. Whether it's sorting or hashing, the items are stashed in some location and the more you know about the distribution, the less you need to shuffle them around. Underneath, similar data structures may be used, indeed it may be hard to tell sorting and hashing apart here. (And remember, computer memory is linearly addressed which matches the single-dimensional sort ordering.)

      Mergesort is well amenable to parallel processing and SIMD streaming. It is practical and robust. Think of the same counting problem broken down sufficiently: you have a small strip of values, say couple hundred. Do you start hashing them or do you just run them through the pipeline?