in reply to Circular buffer instead of shift/push

Nice.

I would change the main part to: (using your terms)

$a[ $tl++ % $sz ] = $val
or
$a[ $tl=($tl+1)% $sz ] = $val
This replaces a memory access with a cheap operator.
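For illustration, a minimal self-contained sketch of that modulo-indexed insert (using the same $a/$tl/$sz names; the ring_push helper is hypothetical, not from the node above):

use strict;
my $sz = 8;        # buffer capacity
my @a;             # the circular buffer
my $tl = 0;        # running write index ("tail")

sub ring_push {    # hypothetical helper, for illustration only
    my ($val) = @_;
    $a[ $tl++ % $sz ] = $val;   # index wraps via modulo; no shift needed
}
ring_push($_) for 1 .. 20;      # only the last $sz values remain in @a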

The last line can be written as:

@all=@a[ $tl..($sz-1), 0..($tl-1) ];
(assuming the array is full; otherwise add your check)

Re: Re: Circular buffer instead of shift/push
by petral (Curate) on Jan 12, 2001 at 08:06 UTC
    Thank you! You're right, of course, about unloading the array.
    (I guess I was just using the index array again because I had it.)
    Note that it would actually be:
    @all = @a < $sz ? @a : @a[ $tl+1 .. $top, 0 .. $tl ];
    since $tl ("tail") is left pointing at the last item written.

    About replacing the memory access with a modulo operation:
    Maybe. But modulo ain't cheap either, and at least for small $sz's
    the memory access may not really be one (it's likely sitting in cache). The whole point of
    the exercise was to avoid the slowdown from the
    shift @a if @a == $sz;
    test in the canonical loop. Modulo would seem to be in the same
    class. On the Solaris I'm working on at the moment, the indexed
    index is much faster than modulo, at least up to arrays of 3000.
    In fact, the modulo is only marginally faster than shift/push.
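    For reference, the "indexed index" in question looks roughly like this (a sketch reconstructed from the benchmark code further down, where @n is a precomputed next-index table):

    my $sz = 4;
    my @n  = (1 .. $sz - 1, 0);   # next-index table: 0->1, 1->2, 2->3, 3->0
    my @a;
    my $tl = 0;
    for my $val (1 .. 10) {
        $tl = $n[$tl];            # wrap via a table lookup instead of modulo or shift
        $a[$tl] = $val;           # overwrite the oldest slot
    }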

    Oddly, the direct retrieval seems to take about as long as retrieving
    through the index array (!?!), but it certainly is clearer.
    Thanks again,


    p
      Note that modulus can be very cheap sometimes (namely for powers of 2, using a bitwise AND).

      If you have time, it would be interesting to know if it actually makes a difference.
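      A quick sketch of that trick (assuming the size is a power of 2, so the mask is the size minus one):

      my $sz   = 8;                # must be a power of 2
      my $mask = $sz - 1;          # binary 0111
      my @a;
      my $i = 0;
      for my $val (1 .. 20) {
          $a[$i] = $val;
          $i = ($i + 1) & $mask;   # same result as ($i + 1) % $sz, but a single AND
      }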

        Sounds good. Just take the next higher power of 2 and then when
        you want to get the data take -$sz .. -1 . I'll try it.
        Ok, I give up.
        I certainly agree that replacing 'memory access with cheap operator'
        is a good idea and obviously would be faster.
        Here are some snips of benchmarks comparing the two:
        @a = (1,2,3,0);  $x = 0;  $m = 4;
        timethese(-10, {
            ixwheel  => q{ $x = $a[$x] },
            modby2x1 => q{ $x = (++$x) & $m },
        });
            ixwheel:  434505.60/s
            modby2x1: 391856.16/s

        timethese(-10, {
            ixwheel  => q{ $x = $a[$x] },
            modby2x2 => q{ $x++; $x &= $m },
        });
            ixwheel:  433011.28/s
            modby2x2: 539474.07/s
        Well, ok. For some reason the two-statement version is much faster
        than the one-statement one (($x+1) was even slower than (++$x)!).
        But now look what happens when they're used as an index:
        timethese(-10, {
            ixwheelix  => q{ $b[ $x = $a[$x] ] = $x },
            modby2x2ix => q{ $b[$x++] = $x; $x &= $m },
        });
            ixwheelix:  262030.67/s
            modby2x2ix: 220259.36/s
        Again, this seemed to be the fastest of the variations.

        But then, I'm using perl 5.05003 on a sparc solaris, YMMV.

        ???????????,

        p
        (Yes, I didn't decrement $m when I simplified for posting.)

        Ok, you're both right. Both suggestions are faster than my dizzy wheel.
        Bitwise-and for power-of-two sizes is clearly fastest for indexing. And
        incrementing and testing for equality is clearly the most general and only
        slightly more complicated than the "straightforward" (slow)
        "@a == $sz and shift @a; push @a,$val;"

        But..., there's still this puzzlement:
        my $m = 4;  my $x = $m;  $m--;  my @a = (1 .. $m, 0);
        timethese(-10, {
            wheel    => sub { $x = $a[$x] },
            equaland => sub { ++$x == $m and $x = 0 },
            mod2and  => sub { $x++; $x &= $m; },
        });
        Benchmark: running equaland, mod2and, wheel, each for at least 10 CPU seconds...
            equaland: 192120.20/s
            mod2and:  182722.70/s
            wheel:    170109.49/s
        As expected. (Using 4192 as the array size yields only slightly slower results
        and an SMP linux box yields identical proportions.)

        Yet...
        my $m = 4;  my $x = $m;  $m--;  my @n = (1 .. $m, 0);  my @a = ();
        timethese(-10, {
            wheelix    => sub { $a[ $x = $n[$x] ] = $x },
            equalandix => sub { $a[ ++$x == $m and $x = 0 ] = $x },
            mod2andix  => sub { $a[ ++$x, $x &= $m ]; },
        });
            equalandix: 119200.90/s
            mod2andix:  142736.03/s
            wheelix:    130290.06/s
        ...moves equaland from first to third(!?!).

        It must be something like what fundflow suggests. In such a tight loop,
        the code for preinc and bitand is small enough to stay in cache -- and the
        code for aelem is only loaded once -- and somehow this cuts off right at
        the slightly more complicated preinc, eq and and.
        Or something...


        p