in reply to Duff's Device

When you code in machine language, there are two things which come readily apparent:

In CompSci 100, you'll learn that there are three components to a loop: an initial condition, a body of work, and a test against the work. In CompEng 100, you'll learn that there is really only one construct at the lowest level: if a simple condition is met, jump elsewhere. All the fussy academics of loop structure are just abstractions to which you can adhere for safety.

Almost every hardware instruction processor suffers a performance hit when they're told in one instruction that they won't in fact be running the NEXT instruction next. Today's processors do a lot of thinking ahead, and predicting what registers or memory locations they'll have to start fetching, even before they've reached the instruction. All that thinking ahead is moot if the condition requires a jump.

Duff's Device is just another way of writing a loop, using the barest features of the processor: jump to new locations, but minimize how often that happens.

It's meaningless in Perl, except as an anecdote, because the virtual machine has so many branches at the machine code level no matter what code you write, and no additional expensive penalty for jumping around in Perl code.

--
[ e d @ h a l l e y . c c ]

Replies are listed 'Best First'.
Re: Duff's Device
by Abigail-II (Bishop) on Feb 24, 2004 at 14:24 UTC
    It's meaningless in Perl, except as an anecdote, because the virtual machine has so many branches at the machine code level no matter what code you write, and no additional expensive penalty for jumping around in Perl code.
    It's not just the jumping you save on, it's also the reduced testing. Duff's device does two things: it does loop unrolling, and it uses a neat/tricky way of dealing with the border case of a loop unrolling technique. Loop unrolling pays because of not having to evaluate a condition. Here's a benchmark showing duff's device to be significant faster than a plain loop (yes, the work done is contrived, but that's not the point):
    #!/usr/bin/perl use strict; use warnings; use Benchmark qw /timethese cmpthese/; our ($x, $y); our $N = shift || 100; cmpthese -1 => { noduff => '$x = 0; my $n = $N; while ($n --) {$x ++}', duff => '$y = 0; my $n = int (($N + 7) / 8); goto "LABEL" . ($N % 8); { LABEL0: $y ++; LABEL7: $y ++; LABEL6: $y ++; LABEL5: $y ++; LABEL4: $y ++; LABEL3: $y ++; LABEL2: $y ++; LABEL1: $y ++; redo if -- $n > 0 }', }; die "Unequal \$N == $N; \$x == $x; \$y == $y.\n" unless $N == $x && $N + == $y; __END__ Rate noduff duff noduff 39384/s -- -42% duff 68267/s 73% --

    Abigail

      Right. The "edge case" is what fraction of the larger loop needs to be run, to emulate a smaller loop.

      As an analogy, look at running tracks. One track is a loop that is 0.25 miles long. Another track is 1.00 miles in a straight line. If you want to run 1.00 miles, this is really easy to figure out where you should start and stop on either track. If you want to run 1.25 miles, it's easy on the loop track, but you have to figure out where to start or stop on the line track.

      In CPU terms, leaving the track at an odd place requires more thought, more often, than entering the track at an odd place. So you calculate the fractional component once and jump into the race in the middle.

      --
      [ e d @ h a l l e y . c c ]

        I fail to understand what you are saying. Any speed gains of Duff's device aren't regarding the calculation of fractional components, or where to enter or leave the track at the odd place. That's only done once.

        It's more like running tracks where each time you pass the finish line you have to stop, and decrease the lap counter. Running 1 mile on the smaller track means having to stop 4 times to decrease the counter, while running on the longer track means stopping once. That's loop unrolling. The trick of Duff's device is doing that instead of running on big track for N miles, and a couple of runs on the small track to make up for the fractional distance, you start somewhere on the big track.

        It's the loop unrolling that saves time, it's Duff's trick of mixing switch/while that saves on tracks. Or rather, it saves on (duplicate) code.

        Abigail

      So the important point here is that we only have to evaluate if (--$n > 0) one eighth of the time?


      ---
      demerphq

        First they ignore you, then they laugh at you, then they fight you, then you win.
        -- Gandhi


        So the important point here is that we only have to evaluate if (--$n > 0) one eighth of the time?
        To quote myself:
        Duff's device does two things: it does loop unrolling, and it uses a neat/tricky way of dealing with the border case of a loop unrolling technique.
        So, I'd say it's an important thing.

        Abigail