Re: unpacking 6-bit values

Extended with the C/XS version ...

use Inline "C";
:
@dst = uic ($src);
print STDERR "uic:   (@dst[0..22] ...\n        @dst[4073..4095])\n";

cmpthese (-2, { uu => \&uu, upuu => \&upuu, upuu0 => \&upuu0, asu => \
+&asu, mlut
 => \&mlut, uic => sub { uic ($src); } });

__END__
__C__
void uic (SV *src)
{
    int i = 0;
    STRLEN l;
    unsigned char *s = (unsigned char *)SvPV (src, l);

    inline_stack_vars;
    inline_stack_reset;

    while (i < l) {
        int n = (s[i] >> 2) & 0x3f;
        inline_stack_push (newSViv (n));
        n = (s[i++] & 0x03) << 4;
        n |= (s[i] >> 4) & 0x0f;
        inline_stack_push (newSViv (n));
        n = (s[i++] & 0x0f) << 2;
        n |= (s[i] >> 6) & 0x03;
        inline_stack_push (newSViv (n));
        n = s[i++] & 0x3f;
        inline_stack_push (newSViv (n));
        }

    inline_stack_done;
    } /* uic */
[download]

And the stunning result ...

uu:    (50 26 62 7 35 6 35 32 50 13 56 19 61 46 55 59 39 14 21 17 16 4
+2 34 ...
        51 63 37 34 24 35 11 33 56 63 39 13 19 22 10 0 17 20 54 11 5 4
+7 56)
upuu:  (50 26 62 7 35 6 35 32 50 13 56 19 61 46 55 59 39 14 21 17 16 4
+2 34 ...
        51 63 37 34 24 35 11 33 56 63 39 13 19 22 10 0 17 20 54 11 5 4
+7 56)
upuu0: (50 26 62 7 35 6 35 32 50 13 56 19 61 46 55 59 39 14 21 17 16 4
+2 34 ...
        51 63 37 34 24 35 11 33 56 63 39 13 19 22 10 0 17 20 54 11 5 4
+7 56)
asu:   (50 26 62 7 35 6 35 32 50 13 56 19 61 46 55 59 39 14 21 17 16 4
+2 34 ...
        51 63 37 34 24 35 11 33 56 63 39 13 19 22 10 0 17 20 54 11 5 4
+7 56)
mlut:  (50 26 62 7 35 6 35 32 50 13 56 19 61 46 55 59 39 14 21 17 16 4
+2 34 ...
        51 63 37 34 24 35 11 33 56 63 39 13 19 22 10 0 17 20 54 11 5 4
+7 56)
uic:   (50 26 62 7 35 6 35 32 50 13 56 19 61 46 55 59 39 14 21 17 16 4
+2 34 ...
        51 63 37 34 24 35 11 33 56 63 39 13 19 22 10 0 17 20 54 11 5 4
+7 56)
        Rate  upuu upuu0  mlut   asu    uu   uic
upuu   367/s    --   -3%  -56%  -62%  -65%  -95%
upuu0  380/s    3%    --  -54%  -61%  -64%  -95%
mlut   832/s  127%  119%    --  -14%  -22%  -89%
asu    972/s  165%  156%   17%    --   -9%  -87%
uu    1063/s  190%  180%   28%    9%    --  -86%
uic   7758/s 2014% 1944%  832%  698%  629%    --
[download]

Enjoy, Have FUN! H.Merijn

Comment on Re: unpacking 6-bit values Select or Download Code

Replies are listed 'Best First'.
Re^2: unpacking 6-bit values by BrowserUk (Patriarch) on Dec 11, 2010 at 09:19 UTC
I got around to incorporating your implementations into my benchmark so that I had a set of comparable figures. For the most part there are no surprises, with your implementations coming in more or less similar to my implementations of the same algorithms. Just a few percent either way as you'd expect. There are however, two stand outs. The first is your uuencode() version--well done for getting it to work which I never did--which proves to be some 20% faster than the next nearest pure perl solution. The second is your C implementation of the basic mask and shift algorithm which comes out 7% to 8% faster than the other C routines. Looking at the basic mechanics I do not see any great differences that the compiler shouldn't be able to optimise away. The only difference I can see that might account for that 8% is that you aren't mortalising your return values. I've noticed that sv_2mortal() seems more costly that you'd think it should be in the past and tried to sidestep it, but every time I have, it has led to memory leaks. When I add that to your code, the difference over the other C routines drops to 2%. Still an advantage though, so maybe your implementation is more compiler/optimiser friendly. Many thanks for your responses, it is always instructive to see how other people code the same algorithm. (Also, congrats that all your routines produced the same, correct output!) Rate upuu upuu0 buk jethro salva mlut asu jmcn uu C +_.._2 Crob C_24to32 uic upuu 25622/s -- -4% -13% -48% -53% -54% -54% -56% -64% - +87% -87% -87% -88% upuu0 26597/s 4% -- -10% -46% -51% -52% -53% -54% -62% - +86% -86% -86% -87% buk 29407/s 15% 11% -- -40% -46% -47% -48% -49% -58% - +85% -85% -85% -86% jethro 49099/s 92% 85% 67% -- -10% -12% -13% -15% -30% - +74% -74% -75% -76% salva 54710/s 114% 106% 86% 11% -- -2% -3% -6% -22% - +72% -72% -72% -74% mlut 55864/s 118% 110% 90% 14% 2% -- -1% -4% -20% - +71% -71% -71% -73% asu 56264/s 120% 112% 91% 15% 3% 1% -- -3% -20% - +71% -71% -71% -73% jmcnamara 57962/s 126% 118% 97% 18% 6% 4% 3% -- -17% - +70% -70% -70% -72% uu 70207/s 174% 164% 139% 43% 28% 26% 25% 21% -- - +64% -64% -64% -66% C_24to32_2 192359/s 651% 623% 554% 292% 252% 244% 242% 232% 174% + -- -0% -1% -7% Croboticus 192543/s 651% 624% 555% 292% 252% 245% 242% 232% 174% + 0% -- -1% -7% C_24to32 194211/s 658% 630% 560% 296% 255% 248% 245% 235% 177% + 1% 1% -- -6% uic 207580/s 710% 680% 606% 323% 279% 272% 269% 258% 196% + 8% 8% 7% -- [download] (Gah! I wish I could defeat that stupid f*** wrap algorithm!) Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l]
Re^3: unpacking 6-bit values by Tux (Canon) on Dec 11, 2010 at 14:07 UTC
I see a decline of close to 20% for adding mortality. The other big difference between my algorithm (and how I tested it) and the once you posted are that mine are not depending on the length of the buffer and are therefor more generic. The huge disadvantage in my testing is also the fact that I only use one single buffer, causing all 4095 values to be placed on the stack at once. And they all have to be removed when the test is done (something the non-mortalizing version skips). Still the mortalizing version of my algorithm in C/XS is still 6 times faster than the uu version. Thanks for the thread, it was fun to untangle. Enjoy, Have FUN! H.Merijn	[reply]
Re^4: unpacking 6-bit values by BrowserUk (Patriarch) on Dec 11, 2010 at 15:44 UTC
The other big difference between my algorithm (and how I tested it) and the once you posted are that mine are not depending on the length of the buffer and are therefor more generic. I should point out that the hard coded 24s in my code versions are there because I know that is the only length I will be dealing with. Discovering the length from the passed scalar, (and checking it is a multiple of 3), would have little or no effect upon performance. The point about the mortalisation comes into stark relief when you run your code in a tight loop. Where this version consumes a steady 14.7 MB for as long as you care to run it: #! perl -slw use strict; use Inline C => Config => BUILD_NOISY => 1; use Inline C => <<'END_C', NAME => '_24to32', CLEAN_AFTER_BUILD => 0; void uic( SV src ) { int i = 0; STRLEN l; unsigned char s = (unsigned char )SvPV( src, l ); inline_stack_vars; inline_stack_reset; while (i < l) { int n = ( s[i] >> 2 ) & 0x3f; inline_stack_push( sv_2mortal( newSViv( n ) ) ); n = ( s[ i++ ] & 0x03 ) << 4; n \|= ( s[i] >> 4 ) & 0x0f; inline_stack_push( sv_2mortal( newSViv( n ) ) ); n = ( s[ i++ ] & 0x0f ) << 2; n \|= ( s[i] >> 6 ) & 0x03; inline_stack_push( sv_2mortal( newSViv( n ) ) ); n = s[ i++ ] & 0x3f; inline_stack_push( sv_2mortal( newSViv( n ) ) ); } inline_stack_done; } END_C use Math::Random::MT qw[ rand ]; while( 1 ) { my $packed = pack 'N6', map rand( 232 ), 1 .. 6; my @decoded = uic( $packed ); } [download] This version: #! perl -slw use strict; use Inline C => Config => BUILD_NOISY => 1; use Inline C => <<'END_C', NAME => '_24to32', CLEAN_AFTER_BUILD => 0; void uic( SV src ) { int i = 0; STRLEN l; unsigned char s = (unsigned char )SvPV( src, l ); inline_stack_vars; inline_stack_reset; while (i < l) { int n = ( s[i] >> 2 ) & 0x3f; inline_stack_push( newSViv( n ) ); n = ( s[ i++ ] & 0x03 ) << 4; n \|= ( s[i] >> 4 ) & 0x0f; inline_stack_push( newSViv( n ) ); n = ( s[ i++ ] & 0x0f ) << 2; n \|= ( s[i] >> 6 ) & 0x03; inline_stack_push( newSViv( n ) ); n = s[ i++ ] & 0x3f; inline_stack_push( newSViv( n ) ); } inline_stack_done; } END_C use Math::Random::MT qw[ rand ]; while( 1 ) { my $packed = pack 'N6', map rand( 2**32 ), 1 .. 6; my @decoded = uic( $packed ); } [download] leaks memory at the rate of 3GB per minute, which means it has my machine swapping after less than 2 minutes and dies "Out of memory\|", having consume all 16GB of swap space, some time later. Whatever the performance hit, it is unavoidable. It ought (and I believe could), be cheaper though. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]