in reply to What is the Best Way to find Unique UINT48 RGB Colors?
Presumably, this is a continuation of the older thread.
I had an _epi32 mergesort implemented from an earlier time (*), and so repurposed it for the given task. No doubt a parallel version would scale rather nicely, too, though this I haven't tried. The test image: 7360x4912 pixels of sciurine menace, obtained via dcraw.
    $ perf stat perl rgb48.pl squirrl.dat
    [1452115604.189627] sort_and_uniq:
    [         0.294368] first binning
    [         0.250713] second binning
    [         0.352289] merge and count
    squirrl.dat == 35556508

     Performance counter stats for 'perl rgb48.pl squirrl.dat':

            937.769123 task-clock                #    1.000 CPUs utilized
                    13 context-switches          #    0.014 K/sec
                     0 cpu-migrations            #    0.000 K/sec
                 55529 page-faults               #    0.059 M/sec
            2775248733 cycles                    #    2.959 GHz
            1375100650 stalled-cycles-frontend   #   49.55% frontend cycles idle
             588508119 stalled-cycles-backend    #   21.21% backend cycles idle
            3397637328 instructions              #    1.22  insns per cycle
                                                 #    0.40  stalled cycles per insn
             310595954 branches                  #  331.207 M/sec
               9170806 branch-misses             #    2.95% of all branches

           0.937321108 seconds time elapsed

Note, this is a Lynnfield CPU without AVX. Same test in a timethis loop:

    timethis 10: 8 wallclock secs ( 8.33 usr + 0.00 sys = 8.33 CPU) @ 1.20/s (n=10)
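For readers who want the gist of the sort-and-count approach without the SIMD mergesort, here is a minimal sketch in portable C. It is not the code behind the timings above: it swaps in the standard `qsort` for the _epi32 mergesort and skips the binning passes entirely. The names `pack_rgb48`, `cmp_u64` and `count_unique` are made up for illustration, and packing each pixel into a 64-bit key is an assumption about the data layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: pack three 16-bit channels into one 64-bit key.
   Only the low 48 bits are used. */
static uint64_t pack_rgb48(uint16_t r, uint16_t g, uint16_t b)
{
    return ((uint64_t)r << 32) | ((uint64_t)g << 16) | (uint64_t)b;
}

/* Three-way comparison for qsort; avoids overflow from subtracting. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sort the keys in place, then count the runs of equal values.
   Stand-in for the "merge and count" step: plain qsort, no binning. */
static size_t count_unique(uint64_t *keys, size_t n)
{
    if (n == 0)
        return 0;
    assert(keys != NULL);

    qsort(keys, n, sizeof keys[0], cmp_u64);

    size_t uniq = 1;                 /* first element starts the first run */
    for (size_t i = 1; i < n; i++)
        if (keys[i] != keys[i - 1])  /* a new run begins here */
            uniq++;
    return uniq;
}
```

To use it, one would read the raw pixel data, call `pack_rgb48` once per pixel into a `uint64_t` array, and pass that array to `count_unique`. Expect this to be noticeably slower than a tuned mergesort on an image of this size; it only illustrates the logic.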
There are other optimized sort implementations out there. Intel IPP (Integrated Performance Primitives) has the following routines, among myriad others:

    IppStatus ippsSortAscend_32s_I(Ipp32s* pSrcDst, int len);
    IppStatus ippsSortAscend_64f_I(Ipp64f* pSrcDst, int len);
    IppStatus ippsSortRadixAscend_32u_I(Ipp32u* pSrcDst, Ipp32u* pTmp, Ipp32s len);
    ...

I'd expect these to provide a well-optimized solution for any Intel platform.
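As a rough illustration of what a radix sort with a caller-supplied temporary buffer does, here is a plain C sketch of a least-significant-digit radix sort on 32-bit keys. This is not IPP code and makes no claim about how IPP implements its routines; `radix_sort_u32` is a hypothetical name, and the only thing borrowed from the signatures above is the general shape of the arguments (data, scratch buffer, length).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an IPP routine.
   LSD radix sort, 8 bits per pass, four passes for 32-bit keys.
   tmp must have room for len elements. */
static void radix_sort_u32(uint32_t *data, uint32_t *tmp, size_t len)
{
    assert(len == 0 || (data != NULL && tmp != NULL));

    uint32_t *src = data;
    uint32_t *dst = tmp;

    for (int shift = 0; shift < 32; shift += 8) {
        size_t count[257] = {0};

        /* Histogram of the current byte, stored one slot to the right
           so the prefix sum below yields starting offsets directly. */
        for (size_t i = 0; i < len; i++)
            count[((src[i] >> shift) & 0xFF) + 1]++;

        /* Prefix sum: count[b] becomes the first output index for byte b. */
        for (int b = 0; b < 256; b++)
            count[b + 1] += count[b];

        /* Stable scatter into the other buffer. */
        for (size_t i = 0; i < len; i++)
            dst[count[(src[i] >> shift) & 0xFF]++] = src[i];

        /* Swap roles for the next pass. */
        uint32_t *t = src;
        src = dst;
        dst = t;
    }
    /* Four passes means an even number of swaps, so the sorted
       result ends up back in data. */
}
```

The same scheme extends to 48-bit keys by running six byte-wide passes over 64-bit elements, at the cost of a scratch buffer as large as the input.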