pbeckingham has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to benchmark string concatenation. I am trying to compare string interpolation to string concatenation when they are used to do the same thing. My suspicion is that non-interpolated concatenation beats interpolation, and I have been told that that is true. So I wrote this to test the claim:

#! /usr/bin/perl -w
use strict;
use Benchmark qw(:all);

my $sample = 'abcdefghijklmnopqrstuvwxyz';

sub interpolated ()
{
  my $s = '';
  for (0 .. 99) {$s = "$s$sample"}
}

sub noninterpolated ()
{
  my $s = '';
  for (0 .. 99) {$s .= $sample}
}

cmpthese (-1, {interp => "interpolated()", nonint => "noninterpolated()"});

Here are the results I get:

          Rate nonint interp
nonint 13917/s     --    -0%
interp 13964/s     0%     --

Now these results show practically identical numbers, and I believe that they should be more disparate and that my tests are inadequate. So my question is, what can I do to improve my tests? Should I leave the iteration to the cmpthese routine? Am I testing in a way that gets optimized down to identical code?

Thank you all.

Replies are listed 'Best First'.
Re: String Concatenation Performance
by chromatic (Archbishop) on Mar 20, 2004 at 17:05 UTC

    Per my reading of the opcodes, you'll spend a lot of time looking for something that's not there. There are two extra ops in the interpolation case, but they're optimized away.

    $ perl -MO=Concise
    my $s = 'foo';
    my $string = "${s}bar";
    c  <@> leave[1 ref] vKP/REFC ->(end)
    1     <0> enter ->2
    2     <;> nextstate(main 1 -:1) v ->3
    5     <2> sassign vKS/2 ->6
    3        <$> const(PV "foo") s ->4
    4        <0> padsv[$s:1,3] sRM*/LVINTRO ->5
    6     <;> nextstate(main 2 -:2) v ->7
    b     <2> sassign vKS/2 ->c
    -        <1> ex-stringify sK/1 ->a
    -           <0> ex-pushmark s ->7
    9           <2> concat[t3] sK/2 ->a
    7              <0> padsv[$s:1,3] s ->8
    8              <$> const(PV "bar") s ->9
    a        <0> padsv[$string:2,3] sRM*/LVINTRO ->b
    - syntax OK

    $ perl -MO=Concise
    my $s = 'foo';
    my $string = $s . 'bar';
    c  <@> leave[1 ref] vKP/REFC ->(end)
    1     <0> enter ->2
    2     <;> nextstate(main 1 -:1) v ->3
    5     <2> sassign vKS/2 ->6
    3        <$> const(PV "foo") s ->4
    4        <0> padsv[$s:1,3] sRM*/LVINTRO ->5
    6     <;> nextstate(main 2 -:2) v ->7
    b     <2> sassign vKS/2 ->c
    9        <2> concat[t3] sK/2 ->a
    7           <0> padsv[$s:1,3] s ->8
    8           <$> const(PV "bar") s ->9
    a        <0> padsv[$string:2,3] sRM*/LVINTRO ->b
    - syntax OK

      Thank you - this is excellent. You have confirmed my suspicion that my tests were not good. Now I need to find a test that tricks the optimizer. Given that the ex-stringify sK/1 ->a op was removed, I guess using constant strings is not the way to test.

      I always understood that 'abc' was better than "abc", because of the interpolation, but it seems that the optimizer sees through this, given the constant strings, so all I am doing is being kinder to the optimizer. Premature optimization.

      The following now uses a string read from <>, and therefore not a constant:

      % perl -MO=Concise
      my $s=<>;
      my $t="$s";
      b  <@> leave[1 ref] vKP/REFC ->(end)
      1     <0> enter ->2
      2     <;> nextstate(main 1 -:1) v ->3
      -     <1> null vKS/2 ->6
      3        <0> padsv[$s:1,3] sRM*/LVINTRO ->4
      5        <1> readline[t3] sKS/1 ->6
      4           <#> gv[*ARGV] s ->5
      6     <;> nextstate(main 2 -:2) v ->7
      a     <2> sassign vKS/2 ->b
      8        <@> stringify[t5] sK/1 ->9
      -           <0> ex-pushmark s ->7
      7           <0> padsv[$s:1,3] s ->8
      9        <0> padsv[$t:2,3] sRM*/LVINTRO ->a

      I'll recreate tests based on this. Thank you all.

        "The following now uses a string read from <>, and therefore not a constant:"

        Your assumptions about what happens, and why, are incorrect.

        $ perl -MO=Concise -e'my $s=""; my $t="$s";'
        b  <@> leave[$s:1,3] vKP/REFC ->(end)
        1     <0> enter ->2
        2     <;> nextstate(main 1 -e:1) v ->3
        5     <2> sassign vKS/2 ->6
        3        <$> const(PV "") s ->4
        4        <0> padsv[$s:1,3] sRM*/LVINTRO ->5
        6     <;> nextstate(main 2 -e:1) v ->7
        a     <2> sassign vKS/2 ->b
        8        <@> stringify[t3] sK/1 ->9
        -           <0> ex-pushmark s ->7
        7           <0> padsv[$s:1,3] s ->8
        9        <0> padsv[$t:2,3] sRM*/LVINTRO ->a
        -e syntax OK

        The "stringify" op is still active.

        Makeshifts last the longest.
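
The op-tree comparisons in this sub-thread can also be repeated from inside a script rather than at the shell prompt. The following is a minimal sketch, assuming the B::Concise module that ships with perl; the helper name op_tree is hypothetical, and the exact op names printed depend on the perl version in use, so the output may not match the 2004 listings above line for line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use B::Concise ();

# Render the op tree of a code reference into a string.
# op_tree is a hypothetical helper name, not part of B::Concise.
sub op_tree {
    my ($code) = @_;
    my $buf = '';
    B::Concise::walk_output(\$buf);    # send the dump to $buf, not STDOUT
    my $walker = B::Concise::compile('-basic', $code);
    $walker->();
    return $buf;
}

# The same two statements chromatic compared, wrapped in anonymous subs.
my $interp = op_tree(sub { my $s = 'foo'; my $string = "${s}bar"; });
my $concat = op_tree(sub { my $s = 'foo'; my $string = $s . 'bar'; });

print "--- interpolation ---\n$interp\n";
print "--- concatenation ---\n$concat\n";
```

Reading the two dumps side by side shows whether a stringify op survives in the interpolated form on your own perl, which is the question the posts above turn on.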

Re: String Concatenation Performance
by TomDLux (Vicar) on Mar 20, 2004 at 15:49 UTC

    I get a 3% difference, even taking the iterations up to a million times:

    perl t
              Rate interp nonint
    interp 12585/s     --    -3%
    nonint 12928/s     3%     --

    How does it vary with longer string interpolations?

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

      When I bump the loops to a million reps I get:

      (warning: too few iterations for a reliable count)
      (warning: too few iterations for a reliable count)
               Rate nonint interp
      nonint 1.39/s     --    -1%
      interp 1.41/s     1%     --

        I humbly suggest that you only thought you were getting a million reps. Since the output says you were getting 1.4 reps per second, on average, the two million reps ( nonint + interp ) should take 1428571 seconds, which is over 16 days. So, unless you've invented a time machine, to get the report back to us so quickly ....

        When I say a million reps, I mean replacing the '-1' iteration count with a specific value:

        cmpthese (1_000_000, {interp => "interpolated()", nonint => "noninterpolated()"});

        --
        TTTATCGGTCGTTATATAGATGTTTGCA
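
The exchange above appears to mix two different counts: the 0 .. 99 loop inside each sub, and the first argument to cmpthese. A minimal sketch that keeps them apart is shown here, assuming only the Benchmark module from the original post. The subs take the inner loop count as a parameter (an assumption, not in the original), return the string they build so the two variants can be checked against each other before timing, and are passed to cmpthese as code references instead of strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $sample = join '', 'a' .. 'z';

# Each sub returns the string it built so the results can be compared.
sub interpolated {
    my ($n) = @_;
    my $s = '';
    $s = "$s$sample" for 1 .. $n;
    return $s;
}

sub noninterpolated {
    my ($n) = @_;
    my $s = '';
    $s .= $sample for 1 .. $n;
    return $s;
}

# Sanity check: a benchmark only means something if both do the same work.
die "results differ" unless interpolated(100) eq noninterpolated(100);

# A negative count asks Benchmark to run each entry for at least that
# many CPU seconds; a positive count is an exact number of runs.
cmpthese(-1, {
    interp => sub { interpolated(100) },
    nonint => sub { noninterpolated(100) },
});

# The estimate from the reply above: two million runs at 1.4 runs/s.
printf "%.1f days\n", 2_000_000 / 1.4 / 86_400;
```

Raising the argument passed to the subs makes each run longer, while changing the first argument to cmpthese changes how many runs are timed; the reported rate is always runs per second, not concatenations per second.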

Re: String Concatenation Performance
by Aristotle (Chancellor) on Mar 20, 2004 at 18:26 UTC
    Somehow, I can't imagine your code is doing nothing other than concatenating strings in a tight loop. Even if it is, you're already benchmarking close to 1,400,000 concats/sec. Are you sure you should care about the relative performance of interpolation and concatenation, and not be looking for another algorithm or writing C code instead?

    Makeshifts last the longest.

      Oh I agree completely - there is not much to be gained here, but in general, I still want to know the degree to which interpolation is slower. My problem is all about isolating test cases that show this. For example, the following are all going to perform slightly differently, and I just want to understand:

      print "$a$b$c\n";
      print $a, $b, $c, "\n";

      or how about:

      my $s = $a . $b . $c . "\n";
      my $s = "$a$b$c\n";

        Your first pair can be significantly different, depending on the relative performance of concatenation vs your I/O system. However, your second pair should always be identical, statistically speaking. They always compile down to identical opcodes, as chromatic pointed out. There's no point in comparing them to see which one is faster, because you're pretty much guaranteed not to get a statistically significant difference, and if you ever do, you didn't. :-)
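
For the first pair, a rough sketch of a comparison is given below. It assumes the Benchmark and File::Spec modules that ship with perl; the variables $x, $y, $z and the helper printed are hypothetical stand-ins for the $a, $b, $c of the post. Output is sent to the null device, so the numbers say little about a real terminal, file, or socket, which is exactly where the I/O system would start to matter. Note also that the two forms only write the same bytes while $, and $\ are left at their defaults.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Spec;

# Hypothetical sample values standing in for $a, $b and $c in the post.
my ($x, $y, $z) = ('foo', 'bar', 'baz');

my %variant = (
    interp => sub { my ($fh) = @_; print {$fh} "$x$y$z\n" },
    list   => sub { my ($fh) = @_; print {$fh} $x, $y, $z, "\n" },
);

# Run one variant against an in-memory filehandle and return what it wrote.
sub printed {
    my ($writer) = @_;
    my $buf = '';
    open my $fh, '>', \$buf or die "cannot open in-memory handle: $!";
    $writer->($fh);
    close $fh or die "cannot close in-memory handle: $!";
    return $buf;
}

# Check that both variants produce the same output before timing them.
die "outputs differ" unless printed($variant{interp}) eq printed($variant{list});

# Time both variants against the null device.
open my $sink, '>', File::Spec->devnull or die "cannot open null device: $!";
cmpthese(-1, {
    interp => sub { $variant{interp}->($sink) },
    list   => sub { $variant{list}->($sink) },
});
close $sink or die "cannot close null device: $!";
```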