in reply to Weird performance issue with Strawberries and Inline::C

C3 avoids a jump instruction by always performing an addition. For a super-tight loop like this, those jumps can make a big difference. I'm not sure that I would expect a C compiler to realize that C4 could be rewritten as C3.
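
To make the distinction concrete, here is a minimal sketch of the two styles. The function names and the character-counting task are my own hypothetical stand-ins, not the OP's actual C3 and C4 code; only the shape of the loop body matters.

```c
#include <stddef.h>

/* Hypothetical stand-in for the "C4" style: the increment sits behind a
   branch, so the compiler may emit a conditional jump per iteration. */
size_t count_with_branch(const char *s, size_t len, char target)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] == target)
            n++;
    }
    return n;
}

/* Hypothetical stand-in for the "C3" style: always add. In C a comparison
   evaluates to exactly 0 or 1, so this adds 1 only on a match. */
size_t count_with_add(const char *s, size_t len, char target)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        n += (s[i] == target);
    return n;
}
```

Both functions return the same count for any input. Whether the first one compiles to a jump depends on the compiler version and flags.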

I had a longer answer, but ran it past Claude.AI and it knew a lot more about the topic than I did and my answer would have been wrong. Of course this could be inaccurate/wrong too, but it looks more accurate than what I was going to say. At the risk of offending people, here is a portion of its answer:

On x86-64, there isn't a direct conditional arithmetic instruction like you describe, but there are several approaches that can eliminate branches:
  1. CMOVcc (Conditional Move) - The most common approach
    ; Assuming condition result is in a flag
    mov eax, i
    lea edx, [rax + 1]   ; edx = i + 1
    cmovnz eax, edx      ; if condition != 0, eax = edx
    mov i, eax
    or more directly:
    lea edx, [i + 1]
    test condition, condition
    cmovnz i, edx        ; conditionally move the incremented value
  2. SETcc (Set Byte on Condition) - Convert condition to 0/1
    xor eax, eax
    test condition, condition
    setnz al             ; al = (condition != 0) ? 1 : 0
    add i, eax           ; i += 0 or 1
    This is often the most efficient for your exact use case - it turns the condition into a 0 or 1, then unconditionally adds it.
  3. ADC/SBB (Add/Subtract with Carry) - If condition is in carry flag
    ; If your condition sets the carry flag:
    adc i, 0             ; i += carry flag

The SETcc + ADD pattern is typically what compilers generate for branchless if (cond) i++, and it's usually faster than a branch for unpredictable conditions.
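
For reference, the three idioms above can be written in C roughly as follows. This is only a sketch: the function names are hypothetical, and whether a given compiler actually lowers each form to CMOVcc, SETcc + ADD, or CMP + ADC depends on its version and optimization flags.

```c
#include <stdint.h>

/* Select form: a compiler may lower this to CMOVcc. */
uint32_t inc_select(uint32_t i, int cond)
{
    return cond ? i + 1 : i;
}

/* 0/1 form: (cond != 0) is exactly 0 or 1, which may lower to
   TEST + SETNZ + ADD. */
uint32_t inc_setcc(uint32_t i, int cond)
{
    return i + (uint32_t)(cond != 0);
}

/* Carry form: on x86 an unsigned "a < b" compare sets the carry flag,
   so a compiler may fold this into CMP + ADC. */
uint32_t inc_if_below(uint32_t i, uint32_t a, uint32_t b)
{
    return i + (uint32_t)(a < b);
}
```

All three are equivalent to "if (cond) i++" at the source level; only the generated code can differ.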

Re^2: Weird performance issue with Strawberries and Inline::C
by tonyc (Hermit) on Oct 12, 2025 at 21:14 UTC

    Strawberry Perl 5.32 used gcc 8.3.0, while 5.42.0.1 uses gcc 13.2.0, which probably accounts for the difference between versions.

    From looking at godbolt, gcc 8.3.0 optimizes that branch into the conditional expression, while 13.2.0 doesn't.

      Going back through my own builds of perl on Windows, I can see that the rot goes back as far as gcc-10.
      I don't know if gcc-9 was afflicted with this same issue as I don't have a perl that was built with gcc-9.

      Incidentally, things are much better with perl-5.42.0 built using Microsoft's Visual Studio 2022:
      v5.42.0
      String length: 100000
            Rate   c4   c3
      c4 13558/s   --  -8%
      c3 14740/s   9%   --
      (Well done, them ;-)

      BTW, the OP will probably be able to use gcc-8.3.0 to build the script, and reap the benefits of the better optimization capabilities provided by gcc-8.3.0.
      I inserted the following just prior to the "use Inline C => << 'END_OF_C';" in the OP's script.
      use Inline C => Config =>
          # Force recompilation
          FORCE_BUILD => 1,
          # Set CC to the path to gcc.exe version 8.3.0
          CC => 'C:/sp/_64/sp-5.32.0/c/bin/gcc.exe',
          # View build output
          BUILD_NOISY => 1,
      ;
      That worked fine for me on my build of perl-5.42.0, using gcc-15.1.0:
      v5.42.0
      String length: 100000
            Rate   c3   c4
      c3 14460/s   --  -0%
      c4 14531/s   0%   --
      Without that modification, the output was:
      v5.42.0
      String length: 100000
            Rate   c4   c3
      c4  3528/s   -- -76%
      c3 14777/s 319%   --
      UPDATE:
      I tried that same hack of using gcc.exe version 8.3.0 with current blead (built using gcc-15.2.0) and it failed with:
      try2_pl_62b5.c: loadable library and perl binaries are mismatched (got first handshake key 0000000012e00080, needed 0000000012d00080)
      Looks like us hackers have now been deprived of yet another liberty.
      It's a bit more fickle than I thought. The hacked script works fine for my own build of perl-5.42.0, but not for Strawberry's build of perl-5.42.0. (There are small differences between those two builds of 5.42.0, but it would make better sense to me if it was the other way around. Anyway .... whilst I find this to be tantalizingly interesting, it's not massively important.)
      (If it's going to crash, I would prefer that they let it just do that - rather than forbid something simply because the practice is deemed to be dubious.)

      Cheers,
      Rob
      Wow that's a cool website! Never seen that before.
Re^2: Weird performance issue with Strawberries and Inline::C
by ikegami (Patriarch) on Oct 13, 2025 at 00:35 UTC

    You explained that c4 is slower than c3, but it wasn't in 5.32. The question isn't why c4 is slow; it's why it's slow now and not before.

      I was just making the point that C4 is naturally expected to be slower, to emphasize that you should write it like C3 if you don't want to rely on compiler optimization voodoo. It's nice when compiler optimizations work, but better not to rely on them. And yes, I didn't put in the effort to track down which optimization was lost or whether Perl's tooling was responsible.