Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing some code that manipulates strings and that should be fast. Several functions look like this:

$string_out = foo($string_in);

While benchmarking my code to speed it up, I've noticed that quite some time was wasted in string copies. Using references does improve speed.

Here is what I tried:

use strict; use warnings; use Benchmark qw(:all); use constant COUNT => 200000; my($str, $tmp, $len); sub in1 ($) { $tmp = length($_[0]); } sub in2 ($) { $tmp = length(${$_[0]}); } sub out1 () { return($str); } sub out2 () { return(\$str); } sub fun1 ($) { my $new = $_[0] . "x"; return($new); } sub fun2 ($) { my $new = ${ $_[0] } . "x"; return(\$new); } foreach $len (1, 10, 100) { $str = "A" x ($len * 1000); timethese(10, { "in1x$len" => sub { for (1 .. COUNT) { in1($str) } }, "in2x$len" => sub { for (1 .. COUNT) { in2(\$str) } }, "out1x$len" => sub { for (1 .. COUNT) { $tmp = out1() } }, "out2x$len" => sub { for (1 .. COUNT) { $tmp = out2() } }, "fun1x$len" => sub { for (1 .. COUNT) { $tmp = fun1($str) } }, "fun2x$len" => sub { for (1 .. COUNT) { $tmp = fun2(\$str) } }, }); }
Here are the results:

Benchmark: timing 10 iterations of fun1x1, fun2x1, in1x1, in2x1, out1x +1, out2x1... fun1x1: 4 wallclock secs ( 4.31 usr + 0.01 sys = 4.32 CPU) @ 2 +.31/s (n=10) fun2x1: 5 wallclock secs ( 4.92 usr + 0.00 sys = 4.92 CPU) @ 2 +.03/s (n=10) in1x1: 2 wallclock secs ( 1.08 usr + 0.00 sys = 1.08 CPU) @ 9 +.26/s (n=10) in2x1: 1 wallclock secs ( 1.48 usr + 0.00 sys = 1.48 CPU) @ 6 +.76/s (n=10) out1x1: 3 wallclock secs ( 2.40 usr + 0.00 sys = 2.40 CPU) @ 4 +.17/s (n=10) out2x1: 1 wallclock secs ( 1.58 usr + 0.00 sys = 1.58 CPU) @ 6 +.33/s (n=10) Benchmark: timing 10 iterations of fun1x10, fun2x10, in1x10, in2x10, o +ut1x10, out2x10... fun1x10: 16 wallclock secs (15.72 usr + 0.04 sys = 15.76 CPU) @ 0 +.63/s (n=10) fun2x10: 12 wallclock secs (12.33 usr + 0.01 sys = 12.34 CPU) @ 0 +.81/s (n=10) in1x10: 1 wallclock secs ( 1.08 usr + 0.00 sys = 1.08 CPU) @ 9 +.26/s (n=10) in2x10: 2 wallclock secs ( 1.47 usr + 0.00 sys = 1.47 CPU) @ 6 +.80/s (n=10) out1x10: 6 wallclock secs ( 6.62 usr + 0.00 sys = 6.62 CPU) @ 1 +.51/s (n=10) out2x10: 2 wallclock secs ( 1.59 usr + 0.00 sys = 1.59 CPU) @ 6 +.29/s (n=10) Benchmark: timing 10 iterations of fun1x100, fun2x100, in1x100, in2x10 +0, out1x100, out2x100... fun1x100: 119 wallclock secs (118.82 usr + 0.03 sys = 118.85 CPU) @ + 0.08/s (n=10) fun2x100: 89 wallclock secs (87.76 usr + 0.05 sys = 87.81 CPU) @ 0 +.11/s (n=10) in1x100: 1 wallclock secs ( 1.10 usr + 0.00 sys = 1.10 CPU) @ 9 +.09/s (n=10) in2x100: 1 wallclock secs ( 1.49 usr + 0.01 sys = 1.50 CPU) @ 6 +.67/s (n=10) out1x100: 43 wallclock secs (43.27 usr + 0.04 sys = 43.31 CPU) @ 0 +.23/s (n=10) out2x100: 2 wallclock secs ( 1.57 usr + 0.00 sys = 1.57 CPU) @ 6 +.37/s (n=10)

As you can see, passing a string by reference seems to slow down (in2 is slower than in1) while returning it by reference (out2 versus out1) gives a big boost, especially with big strings. When combining both, most of the time is wasted in the string modification but the version by reference (fun2) is significantly faster than the direct one (fun1).

I can choose the API I want for my code but I also use other modules and they do not seem to allow ways to avoid string copies. The modules I use are Encode, MIME::Base64 and Compress::Zlib. The first two only work on strings while the last one does accept a reference as input but does not allow to get a reference to the output.

Hence my questions:

Replies are listed 'Best First'.
Re: How to avoid string copies in function calls?
by davido (Cardinal) on Nov 09, 2011 at 07:33 UTC

    "Are there other ways to avoid string copies besides using references?"

    There are the ugly ones, like storing the string in a global variable and not passing it around at all; just acting upon the global instance of it.

    The reference approach is probably a better solution.

    "Does it make sense to submit enhancement requests to these standard modules to have the possibility to get the result by reference?"

    It makes more sense to submit patches that implement your request in a way that doesn't forcibly export anything additional, and doesn't cause the modules to fail their existing test suite. Additionally, you would want to include a full set of tests for the patch you submitted, along with a POD patch. In other words, you're providing the added functionality, not wrecking code that uses the module (not exporting anything new by default), not wrecking the module itself (it still passes its test suite), and providing your own proof that your patches work, and full documentation.

    If that doesn't get anyone's attention, subclass the module in question, or for non-OO modules, write a wrapper module that adds your functionality while passing through all the old functionality too. Upload to CPAN.


    Dave

Re: How to avoid string copies in function calls?
by ikegami (Patriarch) on Nov 09, 2011 at 07:46 UTC

    Your tests are bad. The only copying is happening outside of the functions you are testing. Specifically, they occurring when you assign the string to $tmp in $tmp = out1() and in $tmp = fun1($str).

    Right now, you have

    my $str = out1(); my $ref = out2(); my $str = fun1($str); my $ref = fun2(\$str);

    If you wanted to fix the test, you'd use

    my $str = out1(); my $str = ${ out2() }; my $str = fun1($str); my $str = ${ fun2(\$str) };

    or

    my $ref = \out1(); my $ref = out2(); my $ref = \fun1($str); my $ref = fun2(\$str);

    As you can see, passing a string by reference seems to slow down (in2 is slower than in1)

    Perl always passes by reference. in2 passes *a* reference (by reference). This reference has to be created, dereferenced and destroyed. This is all work that in2 does on top of what in1 does, so of course it will be slower.

    Update: The last paragraph was saying "out1" and "out2" when I meant "in1" and "in2". Fixed.

      Thanks for your comments but further tests do show a difference. Here is what I've used:
      "out1ax$len" => sub { for (1 .. COUNT) { $tmp = out1() } }, "out1bx$len" => sub { for (1 .. COUNT) { $tmp = \out1() } }, "out2ax$len" => sub { for (1 .. COUNT) { $tmp = ${ out2() } } }, "out2bx$len" => sub { for (1 .. COUNT) { $tmp = out2() } },
      And here is what it gives:
       out1ax100: 44 wallclock secs (43.20 usr +  0.01 sys = 43.21 CPU) @  0.23/s (n=10)
       out1bx100: 42 wallclock secs (42.03 usr +  0.01 sys = 42.04 CPU) @  0.24/s (n=10)
       out2ax100: 41 wallclock secs (40.59 usr +  0.00 sys = 40.59 CPU) @  0.25/s (n=10)
       out2bx100:  2 wallclock secs ( 1.61 usr +  0.00 sys =  1.61 CPU) @  6.21/s (n=10)
      
      So out2b (real return by reference) is much faster than out1b (take the reference of the result). So string copy does take place when Perl handles the return value of the function. Or did I miss something?

        Perfect. This is what I was expecting to see. Yes, non-lvalue functions copy their result.

        $ perl -MDevel::Peek -e'Dump sub { my $x = "abc"; Dump $x; $x }->()' SV = PV(0x8bfe730) at 0x8c1b2b0 REFCNT = 1 FLAGS = (PADMY,POK,pPOK) PV = 0x8c16088 "abc"\0 <-- Note the address CUR = 3 LEN = 12 SV = PV(0x8bfe780) at 0x8c00768 REFCNT = 1 FLAGS = (TEMP,POK,pPOK) PV = 0x8c1e2e0 "abc"\0 <-- A copy CUR = 3 LEN = 12

        That's why returning a string ($ref = \out1()) is as expensive as assigning a string ($str = ${ out2() }).

        So why is both returning a string and assigning it not ($str = out1()) twice as expensive?

        When assigning a string slated to be destroyed* such as the automatic copy of a return value, the assignment will steal the buffer instead of copying it.

        * — (FLAGS & TEMP) && REFCNT == 1, I think.

Re: How to avoid string copies in function calls?
by Anonymous Monk on Nov 09, 2011 at 07:31 UTC

    Are there other ways to avoid string copies besides using references?

    No. Well there are aliases ( ex for my $alias ( $foo, @_ ){ } ) but they're essentially references

    Does it make sense to submit enhancement requests to these standard modules to have the possibility to get the result by reference?

    Sure, as long as you don't offer your benchmark with the suggestion, because it is not Coping with Scoping