in reply to Re: Re: Circular buffer instead of shift/push
in thread Circular buffer instead of shift/push

Note that modulus can be very cheap sometimes (namely in 2 powers, using AND)

If you have time, it will be interesting to know if it actually makes a difference.

  • Comment on Re: Re: Re: Circular buffer instead of shift/push

Replies are listed 'Best First'.
Re: Re: Re: Re: Circular buffer instead of shift/push
by petral (Curate) on Jan 12, 2001 at 08:54 UTC
    Sounds good. Just take the next higher power of 2 and then when
    you want to get the data take -$sz .. -1 . I'll try it.
Re: Re: Re: Re: Circular buffer instead of shift/push
by petral (Curate) on Jan 13, 2001 at 07:40 UTC
    Ok, I give up.
    I certainly agree that replacing 'memory access with cheap operator'
    is a good idea and obviously would be faster.
    Here's some snips of benchmarks comparing the two:
    @a=(1,2,3,0); $x=0; $m=4; timethese(-10,{ ixwheel=> q{ $x = $a[$x] } modby2x1=> q{ $x = (++$x) & $m} }); ixwheel: 434505.60/s modby2x1: 391856.16/s timethese(-10,{ ixwheel=> q{ $x=$a[$x] } modby2x2=> q{ $x++; $x &= $m }}); ixwheel: 433011.28/s modby2x2: 539474.07/s
    Well, ok. For some reason the 2 statement version is much faster
    than the 1 statement one (($x+1) was even slower than (++$x)!).
    But now look what happens when they're used as an index:
    timethese(-10,{ ixwheelix=> q{ $b[ $x=$a[$x] ] = $x } modby2x2ix=> q{ $b[$x++] = $x; $x &= $m }}); ixwheelix: 262030.67/s modby2x2ix: 220259.36/s
    Again, this seemed to be the fastest of the variations.

    But then, I'm using perl 5.05003 on a sparc solaris, YMMV.

    ???????????,

    p
      Have you tested that code before benchmarking cause I can't the bitwise ands to produce more that a string of zeros. Anyway heres some more tests

      use Benchmark; my $m=4; my @a=(1,2,3,0); my $x=0; timethese(-10, { wheel => sub {$x = $a[$x]}, equaland => sub {++$x==$m and $x=0}, mod => sub {$x = ($x+1) % $m;}, trinary => sub {$x = ++$x==$m ? 0 : $x } } ); __END__ equaland: 23033.30/s trinary: 15301.14/s wheel: 12993.17/s mod: 12857.56/s
      So I didn't have your wheel on top of my list, unless I'm doing my benchmarking wrong again.
        'So I didn't have your wheel on top of my list'
        -- If you mean the output order, I had spelled it "ixwheel",
        (also, I may have patched Benchmark to do 'sort keys').

        For the more general response to your help, see below and slightly to the left.
        I replied at the same level as your reply in order to reply to both at once and so as not to march off the page completely.

        thank you,
        p
      As this is a very small test case, I'm not sure that it is meaningful. The idea of using an operator was to save some use of memory. In your tight loops, everything fits nicely in the L1 cache and so memory access might just be as fast.

      You have a bug in your code. Instead of 'and'ing with $m you need to use $m-1.

      Also, you as you used '$m' instead of '4' it might just be that Perl does not optimize it. You can either try to replace it with the number or add 'use constant'.

      Your analysis is interesting though.

      Cheers.

Re: Re: Re: Re: Circular buffer instead of shift/push
by petral (Curate) on Jan 16, 2001 at 08:17 UTC
    (Yes, I didn't decrement $m when I simplified for posting.)

    Ok, you're both right. Both suggestions are faster than my dizzy wheel.
    Bitwise-and for power-of-two sizes is clearly fastest for indexing. And
    incrementing and testing for equal is clearly the most general and only
    slightly more complicated than the "straightforward" (slow)
    "@a == $sz and shift @a; push @a,$val;"

    But..., there's still this puzzlement:
    my $m=4; my $x=$m; $m--; my @a=(1..$m,0); timethese(-10, { wheel => sub {$x = $a[$x]}, equaland => sub {++$x==$m and $x=0}, mod2and => sub {$x++; $x &= $m;} } ); Benchmark: running equaland, mod2and, wheel, each for at least 10 CPU +seconds... equaland: 192120.20/s mod2and: 182722.70/s wheel: 170109.49/s
    As expected. (Using, 4192 as array size yields only slightly slower results
    and an SMP linux box yields identical proportions.)

    Yet...
    my $m=4; my $x=$m; $m--; my @n=(1..$m,0); my @a=(); timethese(-10, { wheelix => sub {$a[ $x = $n[$x] ] = $x }, equalandix => sub {$a[ ++$x==$m and $x=0 ] = $x }, mod2andix => sub {$a[ ++$x, $x &= $m];} } ); equalandix: 119200.90/s mod2andix: 142736.03/s wheelix: 130290.06/s
    ...moves equaland from first to third(!?!).

    It must be something like what fundflow suggests. In such a tight loop,
    the code for preinc and bitand is small enough to stay in cache -- and the
    code for aelem is only loaded once -- and somehow this cuts off right at
    the slightly more complicated preinc, eq and and.
    Or something...


    p
      Ok, revision thanks to chipmunk's online proofing in cb:
      "petral: in your new benchmark, mod2andix is missing the assignment to the array element"

      Back to my original point, only including equalandix:
      'Yet ...'
      my $m=4; my $x=$m; $m--; my @n=(1..$m,0); my @a=(); timethese(-10, { wheelix => sub {$a[ $x = $n[$x] ] = $x }, equalandix => sub {$a[ ++$x==$m and $x=0 ] = $x }, mod2andix => sub {$a[ ++$x, $x &= $m ] = $x;} } ); equalandix: 113742.08/s mod2andix: 116579.70/s wheelix: 126698.60/s
      ... reverses the order(!?!).
      ...and my observations above -- that fundflow is probably right -- apply even more,


      p
      And, alas, another(!) correction. repson points out that with $x=m; $m--; ++$x == $m never happens!
      But..., this didn't change the timings:
      my $m=3; my $x=$m; my @a=(1..$m,0); equaland: 191864.03/s mod2and: 178059.54/s wheel: 169709.59/s my $m=3; my $x=$m; my @n=(1..$m,0); my @a=(); equalandix: 111104.23/s mod2andix: 118804.00/s wheelix: 127501.75/s


      p