Re: Efficient bit-twiddling in Perl.
by salva (Canon) on Feb 28, 2013 at 15:17 UTC
Not a big improvement, but you can eliminate the AND operation when calculating $top14: the right shift by 18 already discards the low bits, so the mask is redundant for a 32-bit value.
If you have to do that millions of times, you can also try building a table for the three combined 6-bit quantities:
my @table;
$table[$_] = [..., ..., ...] for 0 .. 0x0003ffff;

...

for my $n (...) {
    my $top = $n >> 18;
    my ($nxt, $mid, $bot) = @{ $table[$n & 0x0003ffff] };
    ...
}
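For concreteness, a filled-in version of that sketch (using the same 14/6/6/6 field split as the code in the OP) could look like:

my @table;
$table[$_] = [ ($_ & 0x3f000) >> 12, ($_ & 0xfc0) >> 6, $_ & 0x3f ]
    for 0 .. 0x0003ffff;

my $n     = 0x80061861;                                     # example value from the thread
my $top14 = $n >> 18;                                       # 8193
my ($nxt6, $mid6, $bot6) = @{ $table[$n & 0x0003ffff] };    # 33, 33, 33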
Re: Efficient bit-twiddling in Perl.
by dave_the_m (Monsignor) on Feb 28, 2013 at 15:51 UTC
Note that the code you've got is already fairly efficient.
Consider the following:
my $n = 0x80061861;
for (1..10_000_000) {
    my $top14 = ( $n & 0xfffc0000 ) >> 18;
    my $nxt6  = ( $n & 0x0003f000 ) >> 12;
    my $mid6  = ( $n & 0x00000fc0 ) >> 6;
    my $bot6  = ( $n & 0x0000003f );
    # or replace the four lines above with:
    #my ($top14, $nxt6, $mid6, $bot6) = (1,1,1,1);
}
Running it as-is on my system takes 3.1s; using just the commented-out assignment instead (which represents a baseline of creating, setting and freeing those four lexical vars) takes 1.9s. So any savings you make are likely to be less than that 1.2s difference.
Dave.
Re: Efficient bit-twiddling in Perl.
by Tux (Canon) on Feb 28, 2013 at 16:58 UTC
Just to test whether it would bring any speed gain (which I did not expect), I fiddled with pack and unpack. Before doing the hard part I looked at the speed halfway through, and it is not worth pursuing.
I also looked at Inline::C, and it is slower than your perl code (and I had already optimized the stack handling):
$ cat test.pl
use 5.016;
use warnings;
use Inline "C";
use Benchmark qw( cmpthese );

my $n = 0x80061861;
my $x = 10000;

sub andshift
{
    my ($top14, $nxt6, $mid6, $bot6);
    for (1 .. $x) {
        $top14 = ($n & 0xfffc0000) >> 18;
        $nxt6  = ($n & 0x0003f000) >> 12;
        $mid6  = ($n & 0x00000fc0) >>  6;
        $bot6  = ($n & 0x0000003f);
    }
    return ($top14, $nxt6, $mid6, $bot6);
} # andshift

sub packunpack
{
    my ($top14, $nxt6, $mid6, $bot6);
    for (1 .. $x) {
        ($top14, $nxt6, $mid6, $bot6) =
            unpack "A14 A6 A6 A6", unpack "B32" => pack "N" => $n;
    }
    return ($top14, $nxt6, $mid6, $bot6);
} # packunpack

sub inlined
{
    my ($top14, $nxt6, $mid6, $bot6);
    for (1 .. $x) {
        ($top14, $nxt6, $mid6, $bot6) = split14666 ($n);
    }
    return ($top14, $nxt6, $mid6, $bot6);
} # inlined

say for andshift ();
say for packunpack ();
say for split14666 ($n);

cmpthese (-1, {
    andshift   => \&andshift,
    bitstrings => \&packunpack,
    inline_c   => \&inlined,
    });

__END__
__C__

void split14666 (int n)
{
    Inline_Stack_Vars;

    Inline_Stack_Reset;
    EXTEND (sp, 4);
    mPUSHi (((unsigned int)n & 0xfffc0000) >> 18);
    mPUSHi (((unsigned int)n & 0x0003f000) >> 12);
    mPUSHi (((unsigned int)n & 0x00000fc0) >>  6);
    mPUSHi (((unsigned int)n & 0x0000003f)      );
    Inline_Stack_Done;
} /* split14666 */
$ perl test.pl
8193
33
33
33
10000000000001
100001
100001
100001
8193
33
33
33
             Rate bitstrings inline_c andshift
bitstrings  121/s         --     -52%     -74%
inline_c    250/s       108%       --     -47%
andshift    470/s       290%      88%       --
Enjoy, Have FUN! H.Merijn
unpack 'vccc', join '', map { pack 'b*', $_ } unpack 'a14a6a6a6', unpack 'B32', pack 'N', $n;
Which does the job, but (not unexpectedly) is an order of magnitude slower than my second attempt in the OP.
That was a pretty good saving, but whenever I have asked this type of question in the past, someone has always come up with a better solution.
And indeed, salva has; albeit only a further 20% saving. I also tried using separate lookup tables to save a dereference, to no avail:
#! perl -slw
use strict;
use Time::HiRes qw[ time ];

my @lookup; $#lookup = 0x3ffff;
$lookup[ $_ ] = [ ( $_ & 0x3f000 ) >> 12, ( $_ & 0xfc0 ) >> 6, $_ & 0x3f ]
    for 0 .. 0x3ffff;

my( @nxt, @mid, @bot );
$nxt[ $_ ] = ( $_ & 0x3f000 ) >> 12,
$mid[ $_ ] = ( $_ & 0xfc0   ) >> 6,
$bot[ $_ ] = $_ & 0x3f
    for 0 .. 0x3ffff;

sub stuff{
#    print "@_";
}

our $ITERS //= 10e6;
my $n = 0x80061861;

my $start = time;
for ( 1 .. $ITERS ) {
    stuff(
        ( $n & 0xfffc0000 ) >> 18,
        ( $n & 0x0003f000 ) >> 12,
        ( $n & 0x00000fc0 ) >> 6,
        ( $n & 0x0000003f )
    );
}
printf "Shift&and took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for ( 1 .. $ITERS ) {
    stuff( $n >> 18, @{ $lookup[ $n & 0x3ffff ] } );
}
printf " Lookup took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for ( 1 .. $ITERS ) {
    my $i = $n & 0x3ffff;
    stuff( $n >> 18, $nxt[ $i ], $mid[ $i ], $bot[ $i ] );
}
printf " Lookup2 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for ( 1 .. $ITERS ) {
    stuff( unpack 'vccc', join '', map { pack 'b*', $_ } unpack 'a14a6a6a6', unpack 'B32', pack 'N', $n );
}
printf "(un)pack took: %.12f seconds\n", ( time() - $start ) / $ITERS;

__END__
C:\test>1021064
Shift&and took: 0.000000482421 seconds
Lookup took: 0.000000386419 seconds
Lookup2 took: 0.000000547556 seconds
(un)pack took: 0.000005933478 seconds
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
I'll qualify this result by saying that salva's approach doesn't necessarily buy you anything if you are only doing this a few million times. On my machine, I get the timings:
Generation took: 0.218580069542 seconds
Shift&and took: 0.000000471000 seconds
Lookup took: 0.000000395095 seconds
Where generation is calculated with
my $start = time;
for (1..100) {
    my @lookup; $#lookup = 0x3ffff;
    $lookup[ $_ ] = [ ( $_ & 0x3f000 ) >> 12, ( $_ & 0xfc0 ) >> 6, $_ & 0x3f ]
        for 0 .. 0x3ffff;
}
printf "Generation took: %.12f seconds\n", ( time() - $start ) / 100;
I ran the transforms for 10^7 iterations, though frankly all the metrics were still a little volatile for my taste. Breakeven is just the generation cost divided by the per-iteration saving, roughly 0.2186s / (471ns - 395ns), which puts it at about 2.9 million iterations, with a 5% overall speed-up at about 4.2 million. YMMV.
#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
use strict;
use warnings;
use Time::HiRes qw[ time ];
use Bit::Vector;

my @lookup; $#lookup = 0x3ffff;
$lookup[ $_ ] = [ ( $_ & 0x3f000 ) >> 12, ( $_ & 0xfc0 ) >> 6, $_ & 0x3f ]
    for 0 .. 0x3ffff;

my( @nxt, @mid, @bot );
$nxt[ $_ ] = ( $_ & 0x3f000 ) >> 12,
$mid[ $_ ] = ( $_ & 0xfc0   ) >> 6,
$bot[ $_ ] = $_ & 0x3f
    for 0 .. 0x3ffff;

sub stuff{
#    print "@_";
}

our $ITERS //= 10e6;
my $n = 0x80061861;

my $start = time;
for ( 1 .. $ITERS ) {
    stuff(
        ( $n & 0xfffc0000 ) >> 18,
        ( $n & 0x0003f000 ) >> 12,
        ( $n & 0x00000fc0 ) >> 6,
        ( $n & 0x0000003f )
    );
}
printf " Shift&and took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for ( 1 .. $ITERS ) {
    stuff( $n >> 18, @{ $lookup[ $n & 0x3ffff ] } );
}
printf " Lookup took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for ( 1 .. $ITERS ) {
    my $i = $n & 0x3ffff;
    stuff( $n >> 18, $nxt[ $i ], $mid[ $i ], $bot[ $i ] );
}
printf " Lookup2 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for ( 1 .. $ITERS ) {
    stuff( unpack 'vccc', join '', map { pack 'b*', $_ } unpack 'a14a6a6a6', unpack 'B32', pack 'N', $n );
}
printf " (un)pack took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$n = "$n";
my $vector;
$start = time;
for (1 .. $ITERS ) {
    $vector = Bit::Vector->new_Hex(32, $n);
    stuff($vector->Chunk_Read(14, 18),
          $vector->Chunk_Read( 6, 12),
          $vector->Chunk_Read( 6,  6),
          $vector->Chunk_Read( 6,  0)
    );
}
printf "Bit::Vector took: %.12f seconds\n", ( time() - $start ) / $ITERS;

__END__
C:\test>perl bv.pl
Shift&and took: 0.000000551174 seconds
Lookup took: 0.000000272709 seconds
Lookup2 took: 0.000000335823 seconds
(un)pack took: 0.000003393702 seconds
Bit::Vector took: 0.000005372898 seconds
Your benchmark is not very realistic. As $n doesn't vary, you are completely ignoring the effect of the CPU cache misses that the lookup table may introduce.
If you use a random $n, you will see that the table approach actually becomes quite a bit slower than the simple one.
I have tried encoding the table in other ways, but have not been able to find one that is good enough:
#! perl -slw
use strict;
use Time::HiRes qw[ time ];

my @lookup; $#lookup = 0x3ffff;
$lookup[ $_ ] = [ ( $_ & 0x3f000 ) >> 12, ( $_ & 0xfc0 ) >> 6, $_ & 0x3f ]
    for 0 .. 0x3ffff;

my( @nxt, @mid, @bot );
$nxt[ $_ ] = ( $_ & 0x3f000 ) >> 12,
$mid[ $_ ] = ( $_ & 0xfc0   ) >> 6,
$bot[ $_ ] = $_ & 0x3f
    for 0 .. 0x3ffff;

my (@lookup3);
$#lookup3 = 0x3ffff;
$lookup3[$_ << 6] = [$_ >> 6, $_ & 0x3f] for 0 .. 0xfff;

my $lookup4 = 'x' x (3 * (1 << 18));
$lookup4 = '';
$lookup4 .= pack CCC => $_ >> 12, ($_ >> 6) & 0x3f, $_ & 0x3f for 0 .. 0x3ffff;

my $lookup6 = 'x' x (2 * (1 << 12));
$lookup6 = '';
$lookup6 .= pack CC => $_ >> 6, $_ & 0x3f for 0 .. 0xfff;

print "tables generated";

our $ITERS //= 10e6;
my @n = map int(rand(1 << 18)), 1 .. $ITERS;
print "sample data generated";

sub stuff{
#    print "@_";
}

my $start = time;
for my $n (@n) {
    stuff(
        ( $n              ) >> 18,
        ( $n & 0x0003f000 ) >> 12,
        ( $n & 0x00000fc0 ) >> 6,
        ( $n & 0x0000003f )
    );
}
printf "Shift&and took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for my $n (@n) {
    stuff( $n >> 18, @{ $lookup[ $n & 0x3ffff ] } );
}
printf " Lookup took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for my $n (@n) {
    stuff( $n >> 18, @{ $lookup3[$n & 0x3ffc0] }, $n & 0x3f );
}
printf " Lookup3 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for my $n (@n) {
    stuff( $n >> 18, unpack CCC => substr($lookup4, 3 * ($n & 0x3ffff), 3));
}
printf " Lookup4 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for my $n (@n) {
    stuff( $n >> 18, unpack 'x'.(3 * ($n & 0x3ffff)).'CCC' => $lookup4);
}
printf " Lookup5 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for my $n (@n) {
    stuff( $n >> 18, unpack(CC => substr($lookup6, ($n & 0x3ffc0) >> 5, 3)), $n & 0x3f);
}
printf " Lookup6 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for my $n (@n) {
    stuff( $n >> 18, unpack('x'.(($n & 0x3ffc0) >> 5).'CC', $lookup6), $n & 0x3f);
}
printf " Lookup7 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

__END__
Shift&and took: 0.000000783860 seconds
Lookup took: 0.000001267049 seconds
Lookup3 took: 0.000001018672 seconds
Lookup4 took: 0.000001903985 seconds
Lookup5 took: 0.000002110766 seconds
Lookup6 took: 0.000001607903 seconds
Lookup7 took: 0.000001791258 seconds
I think you can shave off a lot in XS (or Inline::C) when you switch to prebound variables, à la bind_columns () in DBI or Text::CSV_XS, as much of the XS overhead is stack-work. That said, it is then still a function call, so the stack will be involved. If you pre-bind both the in- AND the out-parameters, you don't need to push variables onto the stack and pop return values off the stack. That should speed things up quite a bit.
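A rough, untested sketch of what that pre-binding could look like with Inline::C (the names bind_outputs and split_bound are made up purely for illustration):

use 5.016;
use warnings;
use Inline C => <<'END_C';
static SV *out[4];

void bind_outputs (SV *top14, SV *nxt6, SV *mid6, SV *bot6)
{
    /* remember the caller's scalars (sub args are passed as aliases) */
    out[0] = SvREFCNT_inc (top14);
    out[1] = SvREFCNT_inc (nxt6);
    out[2] = SvREFCNT_inc (mid6);
    out[3] = SvREFCNT_inc (bot6);
}

void split_bound (unsigned long n)
{
    /* write straight into the bound scalars: nothing is pushed back
       onto the perl stack */
    sv_setuv (out[0], (n & 0xfffc0000UL) >> 18);
    sv_setuv (out[1], (n & 0x0003f000UL) >> 12);
    sv_setuv (out[2], (n & 0x00000fc0UL) >>  6);
    sv_setuv (out[3],  n & 0x0000003fUL);
}
END_C

my ($top14, $nxt6, $mid6, $bot6);
bind_outputs ($top14, $nxt6, $mid6, $bot6);
split_bound (0x80061861);
say "$top14 $nxt6 $mid6 $bot6";    # 8193 33 33 33

Whether that actually beats the plain perl shift-and-mask would still have to be benchmarked, of course.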
The next step could be looking into op-hooks, so there are no real perl-level function calls anymore, but looking at what you already have, I'm wondering if it would be worth the effort (other than to learn from it).
The lookup solutions still have array-index ops behind the scenes. I am stunned by the 20% gain you get, and wonder whether that would differ per perl version and/or architecture.
Anyway, the solution that is fastest on the machine the problem is actually solved on is what matters most, even if another method would be four times faster on z/OS with perl4 :)
Enjoy, Have FUN! H.Merijn
Re: Efficient bit-twiddling in Perl.
by kcott (Archbishop) on Mar 01, 2013 at 08:56 UTC
G'day BrowserUk,
I tried this a few different ways. Shifting first and then ANDing just the six bits seems to be faster than ANDing then shifting. About half of the time seemed to be taken by the assignments, so if you can avoid some or all of those you get additional improvements. Here are the various tests:
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;
use Benchmark qw{cmpthese :hireswallclock};

my $n = 0x80061861;

say "and_rshift: $n => ", join(' : ', and_rshift());
say "lshift_rshift: $n => ", join(' : ', lshift_rshift());
say "mixed: $n => ", join(' : ', mixed());
say "mixed_assign: $n => ", join(' : ', mixed_assign());
say "just_shift: $n => ", join(' : ', just_shift());

cmpthese(-10, {
    and_rshift    => \&and_rshift,
    lshift_rshift => \&lshift_rshift,
    mixed         => \&mixed,
    mixed_assign  => \&mixed_assign,
    just_shift    => \&just_shift,
});

sub and_rshift {
    my $top14 = ( $n & 0xfffc0000 ) >> 18;
    my $nxt6  = ( $n & 0x0003f000 ) >> 12;
    my $mid6  = ( $n & 0x00000fc0 ) >> 6;
    my $bot6  = ( $n & 0x0000003f );
    return ($top14, $nxt6, $mid6, $bot6);
}

sub lshift_rshift {
    my $top14 = $n >> 18;
    my $nxt6  = $n << 46 >> 58;
    my $mid6  = $n << 52 >> 58;
    my $bot6  = $n << 58 >> 58;
    return ($top14, $nxt6, $mid6, $bot6);
}

sub mixed {
    return ($n >> 18, $n >> 12 & 0x3f, $n >> 6 & 0x3f, $n & 0x3f);
}

sub mixed_assign {
    my $top14 = $n >> 18;
    my $nxt6  = $n >> 12 & 0x3f;
    my $mid6  = $n >> 6 & 0x3f;
    my $bot6  = $n & 0x3f;
    return ($top14, $nxt6, $mid6, $bot6);
}

sub just_shift {
    return ($n >> 18, $n << 46 >> 58, $n << 52 >> 58, $n << 58 >> 58);
}
Output:
$ pm_bit_twiddle.pl
and_rshift: 2147883105 => 8193 : 33 : 33 : 33
lshift_rshift: 2147883105 => 8193 : 33 : 33 : 33
mixed: 2147883105 => 8193 : 33 : 33 : 33
mixed_assign: 2147883105 => 8193 : 33 : 33 : 33
just_shift: 2147883105 => 8193 : 33 : 33 : 33
                   Rate and_rshift lshift_rshift mixed_assign just_shift mixed
and_rshift    1942328/s         --           -7%         -11%       -52%  -53%
lshift_rshift 2088397/s         8%            --          -4%       -48%  -50%
mixed_assign  2182671/s        12%            5%           --       -46%  -48%
just_shift    4037158/s       108%           93%          85%         --   -3%
mixed         4163497/s       114%           99%          91%         3%    --
and_rshift: 2147883105 => 8193 : 33 : 33 : 33
lshift_rshift: 2147883105 => 8193 : 33 : 33 : 33
mixed: 2147883105 => 8193 : 33 : 33 : 33
mixed_assign: 2147883105 => 8193 : 33 : 33 : 33
just_shift: 2147883105 => 8193 : 33 : 33 : 33
                Rate just_shift mixed mixed_assign and_rshift lshift_rshift
just_shift     247/s         --   -2%         -43%       -48%          -52%
mixed          252/s         2%    --         -42%       -47%          -52%
mixed_assign   434/s        75%   72%           --        -9%          -16%
and_rshift     480/s        94%   90%          10%         --           -8%
lshift_rshift  520/s       110%  106%          20%         8%            --
Enjoy, Have FUN! H.Merijn
C:\test>1021064
Shift&and took: 0.000000611130 seconds
justshift took: 0.000000651048 seconds
Lookup took: 0.000000488745 seconds
Lookup2 took: 0.000000706896 seconds
(un)pack took: 0.000007464467 seconds
I was a bit surprised by that, as both shift and AND are designated as single-clock operations on my CPU. I guess the difference is in the perl operations that overlay them.
I follow your drift regarding the effect that assignment and stack handling can have on a benchmark of this nature; but the values are re-used several times through the rest of the body of the inner loop, so the assignment is a constant factor that I've chosen to exclude.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Efficient bit-twiddling in Perl.
by flexvault (Monsignor) on Mar 01, 2013 at 11:44 UTC
BrowserUk,
This is more about the source of your input (32-bit) stream. If it is just (0..2**32) and then repeats, then it looks like you already have the solution.
But if the input stream is being generated from an analog or digital source and the encoding could have repetition, then using Memoize or caching the results in a hash might give you some additional performance.
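A minimal sketch of the hash-cache idea (split_cached is just an illustrative name; whether it pays off depends entirely on how often the inputs repeat):

my %cache;

sub split_cached {
    my $n = shift;
    # compute the four fields once per distinct input word, then reuse
    $cache{$n} //= [ $n >> 18, ($n >> 12) & 0x3f, ($n >> 6) & 0x3f, $n & 0x3f ];
    return @{ $cache{$n} };
}

my ($top14, $nxt6, $mid6, $bot6) = split_cached(0x80061861);   # 8193 33 33 33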
I once saw an IBM demo that compressed one minute of full-color video of a football game onto a 360K floppy. IBM's explanation was that so much of the data was repetitious that only the differences from frame to frame needed to be saved on the floppy.
Just a thought...Ed
"Well done is better than well said." - Benjamin Franklin