I'm writing some code that manipulates strings and that should be fast. Several functions look like this:
$string_out = foo($string_in);
While benchmarking my code to speed it up, I noticed that a lot of time is spent copying strings. Using references does improve speed.
Here is what I tried:
    use strict;
    use warnings;
    use Benchmark qw(:all);
    use constant COUNT => 200000;

    my($str, $tmp, $len);

    sub in1 ($) { $tmp = length($_[0]); }
    sub in2 ($) { $tmp = length(${$_[0]}); }
    sub out1 () { return($str); }
    sub out2 () { return(\$str); }
    sub fun1 ($) { my $new = $_[0] . "x"; return($new); }
    sub fun2 ($) { my $new = ${ $_[0] } . "x"; return(\$new); }

    foreach $len (1, 10, 100) {
        $str = "A" x ($len * 1000);
        timethese(10, {
            "in1x$len"  => sub { for (1 .. COUNT) { in1($str) } },
            "in2x$len"  => sub { for (1 .. COUNT) { in2(\$str) } },
            "out1x$len" => sub { for (1 .. COUNT) { $tmp = out1() } },
            "out2x$len" => sub { for (1 .. COUNT) { $tmp = out2() } },
            "fun1x$len" => sub { for (1 .. COUNT) { $tmp = fun1($str) } },
            "fun2x$len" => sub { for (1 .. COUNT) { $tmp = fun2(\$str) } },
        });
    }

Here are the results:

    Benchmark: timing 10 iterations of fun1x1, fun2x1, in1x1, in2x1, out1x1, out2x1...
        fun1x1:   4 wallclock secs (  4.31 usr +  0.01 sys =   4.32 CPU) @  2.31/s (n=10)
        fun2x1:   5 wallclock secs (  4.92 usr +  0.00 sys =   4.92 CPU) @  2.03/s (n=10)
         in1x1:   2 wallclock secs (  1.08 usr +  0.00 sys =   1.08 CPU) @  9.26/s (n=10)
         in2x1:   1 wallclock secs (  1.48 usr +  0.00 sys =   1.48 CPU) @  6.76/s (n=10)
        out1x1:   3 wallclock secs (  2.40 usr +  0.00 sys =   2.40 CPU) @  4.17/s (n=10)
        out2x1:   1 wallclock secs (  1.58 usr +  0.00 sys =   1.58 CPU) @  6.33/s (n=10)
    Benchmark: timing 10 iterations of fun1x10, fun2x10, in1x10, in2x10, out1x10, out2x10...
       fun1x10:  16 wallclock secs ( 15.72 usr +  0.04 sys =  15.76 CPU) @  0.63/s (n=10)
       fun2x10:  12 wallclock secs ( 12.33 usr +  0.01 sys =  12.34 CPU) @  0.81/s (n=10)
        in1x10:   1 wallclock secs (  1.08 usr +  0.00 sys =   1.08 CPU) @  9.26/s (n=10)
        in2x10:   2 wallclock secs (  1.47 usr +  0.00 sys =   1.47 CPU) @  6.80/s (n=10)
       out1x10:   6 wallclock secs (  6.62 usr +  0.00 sys =   6.62 CPU) @  1.51/s (n=10)
       out2x10:   2 wallclock secs (  1.59 usr +  0.00 sys =   1.59 CPU) @  6.29/s (n=10)
    Benchmark: timing 10 iterations of fun1x100, fun2x100, in1x100, in2x100, out1x100, out2x100...
      fun1x100: 119 wallclock secs (118.82 usr +  0.03 sys = 118.85 CPU) @  0.08/s (n=10)
      fun2x100:  89 wallclock secs ( 87.76 usr +  0.05 sys =  87.81 CPU) @  0.11/s (n=10)
       in1x100:   1 wallclock secs (  1.10 usr +  0.00 sys =   1.10 CPU) @  9.09/s (n=10)
       in2x100:   1 wallclock secs (  1.49 usr +  0.01 sys =   1.50 CPU) @  6.67/s (n=10)
      out1x100:  43 wallclock secs ( 43.27 usr +  0.04 sys =  43.31 CPU) @  0.23/s (n=10)
      out2x100:   2 wallclock secs (  1.57 usr +  0.00 sys =   1.57 CPU) @  6.37/s (n=10)
As you can see, passing a string by reference seems to slow things down (in2 is slower than in1), while returning it by reference (out2 versus out1) gives a big boost, especially with big strings. When both are combined, most of the time is spent in the string modification itself. For the 1,000-character string the by-reference version (fun2) is actually slightly slower than the direct one (fun1), but for the 10,000- and 100,000-character strings it is significantly faster.
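One possible explanation for in1 beating in2, offered as a sketch rather than a verified diagnosis: the elements of @_ are aliases to the caller's variables, so reading $_[0] does not copy the string, and the explicit reference only adds a dereference. The sub names below are made up for illustration. Copying an argument into a lexical is where a duplicate would typically be made, although recent perls may defer that copy.

```perl
use strict;
use warnings;

# Reads the argument through the @_ alias; no lexical copy is made.
sub length_via_alias {
    return length($_[0]);
}

# Assigns the argument to a lexical first. This is the step that
# typically duplicates the string (newer perls may defer the copy).
sub length_via_copy {
    my ($copy) = @_;
    return length($copy);
}

# Makes the aliasing visible: writing to $_[0] changes the caller's variable.
sub append_in_place {
    $_[0] .= "x";
    return;
}

my $str = "A" x 1000;
append_in_place($str);
print length($str), "\n";              # 1001
print length_via_alias($str), "\n";    # 1001
print length_via_copy($str), "\n";     # 1001
```

If this reading is right, passing a reference as input only pays off when the sub would otherwise copy its argument into a lexical.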
I can choose the API I want for my own code, but I also use other modules, and they do not seem to offer ways to avoid string copies. The modules I use are Encode, MIME::Base64 and Compress::Zlib. The first two only work on strings, while the last one accepts a reference as input but provides no way to get a reference to the output.
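To make the constraint concrete, here is a minimal sketch of a by-reference wrapper around MIME::Base64. The name encode_base64_ref is hypothetical and not part of the module. The module call itself still takes and returns a plain string, so the wrapper can at best avoid the extra copy when handing the result back to the caller, which is the out1 versus out2 case measured above.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical wrapper (not provided by MIME::Base64): takes a reference
# to the input string and returns a reference to the encoded result.
sub encode_base64_ref {
    my ($in_ref) = @_;
    # The module only works on plain strings, so dereference here.
    my $out = encode_base64($$in_ref);
    # Return a reference so the caller does not copy the result again.
    return \$out;
}

my $data    = "A" x 100_000;
my $enc_ref = encode_base64_ref(\$data);
print length($$enc_ref), "\n";
```

Whether the assignment inside the wrapper itself copies the encoded string depends on perl's internals, so a wrapper like this would need to be benchmarked before relying on it.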
Hence my questions:
In reply to How to avoid string copies in function calls? by Anonymous Monk