in reply to Re: Circular buffer instead of shift/push
in thread Circular buffer instead of shift/push

Thank you! Your right, of course, about unloading the array.
(I guess I was just using the index array again because I had it.)
Note that it would actually be:
@all = @a < $sz ? @a : @a[ $tl+1 .. $top, 0 .. $tl ];
since $tl ("tail") is left pointing at the last item written.

About replacing the memory access with a modulo operation:
Maybe. But modulo ain't cheap and at least for small $sz's,
the memory access may not be one (caches). The whole point of
the exercise was to avoid the slowdown from the
shift @a if @a == $sz;
test in the canonical loop. Modulo would seem to be in the same
class. On the Solaris I'm working on at the moment, the indexed
index is much faster than modulo, at least up to arrays of 3000.
In fact, the modulo is only marginally faster than shift/push.

Oddly, the direct retrieval seems to take about as long as retrieving
through the index array(!?!), but it certainly is clearer,
thanks again,


p

Replies are listed 'Best First'.
Re: Re: Re: Circular buffer instead of shift/push
by fundflow (Chaplain) on Jan 12, 2001 at 08:42 UTC
    Note that modulus can be very cheap sometimes (namely in 2 powers, using AND)

    If you have time, it will be interesting to know if it actually makes a difference.

      Sounds good. Just take the next higher power of 2 and then when
      you want to get the data take -$sz .. -1 . I'll try it.
      Ok, I give up.
      I certainly agree that replacing 'memory access with cheap operator'
      is a good idea and obviously would be faster.
      Here's some snips of benchmarks comparing the two:
      @a=(1,2,3,0); $x=0; $m=4; timethese(-10,{ ixwheel=> q{ $x = $a[$x] } modby2x1=> q{ $x = (++$x) & $m} }); ixwheel: 434505.60/s modby2x1: 391856.16/s timethese(-10,{ ixwheel=> q{ $x=$a[$x] } modby2x2=> q{ $x++; $x &= $m }}); ixwheel: 433011.28/s modby2x2: 539474.07/s
      Well, ok. For some reason the 2 statement version is much faster
      than the 1 statement one (($x+1) was even slower than (++$x)!).
      But now look what happens when they're used as an index:
      timethese(-10,{ ixwheelix=> q{ $b[ $x=$a[$x] ] = $x } modby2x2ix=> q{ $b[$x++] = $x; $x &= $m }}); ixwheelix: 262030.67/s modby2x2ix: 220259.36/s
      Again, this seemed to be the fastest of the variations.

      But then, I'm using perl 5.05003 on a sparc solaris, YMMV.

      ???????????,

      p
        Have you tested that code before benchmarking cause I can't the bitwise ands to produce more that a string of zeros. Anyway heres some more tests

        use Benchmark; my $m=4; my @a=(1,2,3,0); my $x=0; timethese(-10, { wheel => sub {$x = $a[$x]}, equaland => sub {++$x==$m and $x=0}, mod => sub {$x = ($x+1) % $m;}, trinary => sub {$x = ++$x==$m ? 0 : $x } } ); __END__ equaland: 23033.30/s trinary: 15301.14/s wheel: 12993.17/s mod: 12857.56/s
        So I didn't have your wheel on top of my list, unless I'm doing my benchmarking wrong again.
        As this is a very small test case, I'm not sure that it is meaningful. The idea of using an operator was to save some use of memory. In your tight loops, everything fits nicely in the L1 cache and so memory access might just be as fast.

        You have a bug in your code. Instead of 'and'ing with $m you need to use $m-1.

        Also, you as you used '$m' instead of '4' it might just be that Perl does not optimize it. You can either try to replace it with the number or add 'use constant'.

        Your analysis is interesting though.

        Cheers.

      (Yes, I didn't decrement $m when I simplified for posting.)

      Ok, you're both right. Both suggestions are faster than my dizzy wheel.
      Bitwise-and for power-of-two sizes is clearly fastest for indexing. And
      incrementing and testing for equal is clearly the most general and only
      slightly more complicated than the "straightforward" (slow)
      "@a == $sz and shift @a; push @a,$val;"

      But..., there's still this puzzlement:
      my $m=4; my $x=$m; $m--; my @a=(1..$m,0); timethese(-10, { wheel => sub {$x = $a[$x]}, equaland => sub {++$x==$m and $x=0}, mod2and => sub {$x++; $x &= $m;} } ); Benchmark: running equaland, mod2and, wheel, each for at least 10 CPU +seconds... equaland: 192120.20/s mod2and: 182722.70/s wheel: 170109.49/s
      As expected. (Using, 4192 as array size yields only slightly slower results
      and an SMP linux box yields identical proportions.)

      Yet...
      my $m=4; my $x=$m; $m--; my @n=(1..$m,0); my @a=(); timethese(-10, { wheelix => sub {$a[ $x = $n[$x] ] = $x }, equalandix => sub {$a[ ++$x==$m and $x=0 ] = $x }, mod2andix => sub {$a[ ++$x, $x &= $m];} } ); equalandix: 119200.90/s mod2andix: 142736.03/s wheelix: 130290.06/s
      ...moves equaland from first to third(!?!).

      It must be something like what fundflow suggests. In such a tight loop,
      the code for preinc and bitand is small enough to stay in cache -- and the
      code for aelem is only loaded once -- and somehow this cuts off right at
      the slightly more complicated preinc, eq and and.
      Or something...


      p
        Ok, revision thanks to chipmunk's online proofing in cb:
        "petral: in your new benchmark, mod2andix is missing the assignment to the array element"

        Back to my original point, only including equalandix:
        'Yet ...'
        my $m=4; my $x=$m; $m--; my @n=(1..$m,0); my @a=(); timethese(-10, { wheelix => sub {$a[ $x = $n[$x] ] = $x }, equalandix => sub {$a[ ++$x==$m and $x=0 ] = $x }, mod2andix => sub {$a[ ++$x, $x &= $m ] = $x;} } ); equalandix: 113742.08/s mod2andix: 116579.70/s wheelix: 126698.60/s
        ... reverses the order(!?!).
        ...and my observations above -- that fundflow is probably right -- apply even more,


        p
        And, alas, another(!) correction. repson points out that with $x=m; $m--; ++$x == $m never happens!
        But..., this didn't change the timings:
        my $m=3; my $x=$m; my @a=(1..$m,0); equaland: 191864.03/s mod2and: 178059.54/s wheel: 169709.59/s my $m=3; my $x=$m; my @n=(1..$m,0); my @a=(); equalandix: 111104.23/s mod2andix: 118804.00/s wheelix: 127501.75/s


        p