in reply to does perl have branch prediction

There is a consistent but marginal difference when summing an unsorted array compared to summing the same array once sorted:

    @a = map int( rand 256 ), 1 .. 10e6;;
    $t=time; $n=0; $_ > 128 and $n+=$_ for @a; print time-$t, " $n";;
    1.42373704910278 952115465
    @a = sort{ $a <=> $b } @a;;
    $t=time; $n=0; $_ > 128 and $n+=$_ for @a; print time-$t, " $n";;
    1.31046295166016 952115465

However, the saving is not sufficient to make the sort worthwhile, given its very high cost:

    @a = map int( rand 256 ), 1 .. 10e6;;
    $t=time; $n=0; $_ > 128 and $n+=$_ for @a; print time-$t, " $n";;
    1.36710000038147 952320992
    $t=time; @a = sort{ $a <=> $b } @a; $n=0; $_ > 128 and $n+=$_ for @a; print time-$t, " $n";;
    5.25063991546631 952320992
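For a more repeatable comparison, the same measurement could be sketched with the core Benchmark module (this is a sketch, not the benchmark from the thread; the array size is reduced from 10e6 so it runs quickly, and absolute timings will vary by machine and perl build):

    use strict;
    use warnings;
    use Benchmark qw( cmpthese );

    # Same conditional sum as above, over unsorted and sorted copies.
    my @unsorted = map { int rand 256 } 1 .. 1_000_000;
    my @sorted   = sort { $a <=> $b } @unsorted;

    cmpthese( 10, {
        unsorted => sub { my $n = 0; $_ > 128 and $n += $_ for @unsorted; $n },
        sorted   => sub { my $n = 0; $_ > 128 and $n += $_ for @sorted;   $n },
    } );

cmpthese prints a rate table, so any difference between the two shows up directly as a percentage rather than needing two separate timed runs.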

Perl doesn't attempt to optimise that complex conditional beyond short-circuiting early.

I'd be very surprised if pre-compiled C++ reordered the clauses, unless it is very obvious at compile time that fun_Z() will predominantly produce a false return. It certainly won't reorder them at runtime; and unless all the functions in the conditional are very simple and inlined, it is doubtful whether branch prediction at the chip level will have sufficient scope to do so.
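Neither compiler nor chip will reorder the clauses for you, but the programmer can. A sketch (cheap_mostly_false() and expensive() are hypothetical stand-ins for the fun_Z()-style clauses, not code from the thread) showing that placing a cheap, mostly-false test first lets && short-circuit past the expensive call most of the time:

    use strict;
    use warnings;
    use Benchmark qw( cmpthese );

    # Hypothetical stand-ins: cheap_mostly_false() is false ~99% of the
    # time; expensive() always returns true but does real work.
    sub cheap_mostly_false { rand() < 0.01 }
    sub expensive          { my $x = 0; $x += sqrt $_ for 1 .. 200; $x > 0 }

    cmpthese( -1, {
        cheap_first     => sub { cheap_mostly_false() && expensive() for 1 .. 1000 },
        expensive_first => sub { expensive() && cheap_mostly_false() for 1 .. 1000 },
    } );

Note this only changes how often each clause is evaluated; it is legitimate only when the clauses are free of side effects, since && guarantees the right-hand side is skipped whenever the left is false.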

Java might re-order the clauses at runtime via the JIT compiler, if the complex conditional were embedded in a loop.

But it would do so once, during the first iteration of the loop; and if during that pass fun_Z() returned its atypical response -- i.e. if it returned false when it would mostly return true -- then it could result in an overall pessimisation rather than an optimisation, by seeding the branch prediction the wrong way.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: does perl have branch prediction
by david2008 (Scribe) on Jan 29, 2014 at 06:44 UTC
    How can we explain the small time difference between the sorted and unsorted array summing?
      How can we explain the small time difference between the sorted and unsorted array summing?

      It is almost certainly down to the cpu instruction cache.

      1. With the sorted array, on average:

        for the first half of the array, the instructions required in the loop will already be in the cache.

        Then the required instructions will need to be loaded into the cache once, when the break point is reached, and they will remain there for the second half of the array.

      2. With the unsorted array,

        the instruction cache might need to be flushed and repopulated on every other iteration of the loop.

      Please note, that is just my best guess. I can't think of any easy -- or even feasible, given the hardware and software available to me -- way of attempting to verify my hypothesis.



      It's all to do with the workflow at the hardware level. I can't remember specifically all of the stages of processing, but suffice it to say that there are multiple stages. When a program comes to a branch (such as an if), the CPU doesn't pause the pipeline for however many cycles just to wait for the result of the if to pop out the end; instead it uses branch prediction to choose a path to go down (really useful if you're running in a loop, not so useful otherwise) and shovels the next instruction into the first stage of processing. When the branch is finally resolved, it determines whether the results from that speculatively executed instruction are kept or discarded.

      If you look at the if you are doing and the dataset, it becomes obvious that a sorted dataset optimises the CPU's branch prediction, resulting in a minor performance difference (one that grows as the dataset grows). However, as pointed out, sorting the dataset costs far more than the performance gained: sorting each item in the array takes many more cycles of processing than you will actually save.

      david2008:

      Various sort algorithms have different performance based on the order of the incoming data, so you don't need branch prediction to explain it.

      ...roboticus

      When your only tool is a hammer, all problems look like your thumb.