Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

append to a string or make an array? (memory and time efficiency)

by swkronenfeld (Hermit)
on Jul 25, 2003 at 03:22 UTC ( #277781=perlquestion: print w/replies, xml ) Need Help??

swkronenfeld has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing an application that, when the user requests something, I prepare some output for them, and return it to them.

I wrote (or rather, adapted an existing) server skeleton, which accepts new connections, and creates input and output buffers for each new connection. Currently the output buffer for each user is a single string.

In my server, I get the users input, and send it on to another file to prepare the appropriate output. That file generates the output and returns it to the server, which places it in the outgoing buffer. All pretty standard, right?

So my question: the output for each request is generated in chunks, sometimes as large as a few kilos per chunk. Each request tends to have at least 2 chunks generated, and as many as 5. Currently, to add the new chunk to the existing output, I do

$outBuffer .= $newChunk

Is this copying of strings incredibly inefficient, as it seems to me that it might be? Would I be better off pushing each chunk of output onto an outgoing array. I assume that an array would be more memory inefficient, but speed is more important. Are there other issues/solutions I haven't thought of?

(It's a pretty significant change to my program, so I tried running the benchmark module on a small example I contrived. Alas, my company doesn't have perl configured correctly to run a lot of modules, benchmark being one of those)

  • Comment on append to a string or make an array? (memory and time efficiency)

Replies are listed 'Best First'.
Re: append to a string or make an array? (memory and time efficiency)
by BrowserUk (Patriarch) on Jul 25, 2003 at 03:57 UTC

    I'm not sure that this is exactly a great benchmark, but it seems fairly clear that appending to a string is a generally a bit quicker than pushing to an array.

    P:\test>test Rate push1k append1k push1k 4816/s -- -26% append1k 6472/s 34% -- Rate push8k append8k push8k 982/s -- -9% append8k 1078/s 10% -- Rate push64k append64k push64k 43.4/s -- -33% append64k 65.3/s 50% --

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

      I think your benchmark is flawed. I don't think your append code is doing what you think it's doing. Witness:
      my $buffer .= "x" x 1024 for 0 .. 5; print length $buffer;

      $buffer is uninitialized by the time it reached the print. Moving the declaration away from the assignment fixes this.

      Furthermore, both fragments don't produce the same result. The 'push' variant keeps everything in an array - it ought to have final join, IMO.

      Here's a somewhat different benchmark. It also shows the append method to be faster, and much faster than I expected it to be.

      #!/usr/bin/perl use strict; use warnings; use Benchmark qw /cmpthese/; our $str; our ($a, $b); for our $t (5, 10, 100) { for my $n (1, 8, 64) { $str = "x" x ($n * 1024); cmpthese -3, { "append${n}k_$t" => '$::a = ""; $::a .= $::str for 1 .. $: +:t;', "push${n}k_$t" => 'my @s; push @s => $::str for 1 .. $: +:t; $::b = join "" => @s', }; die "Benchmark is wrong\n" unless length ($a) == $t * length $ +str && length ($b) == $t * length $ +str; } } __END__ Benchmark: running append1k_5, push1k_5 for at least 3 CPU seconds... Rate push1k_5 append1k_5 push1k_5 91330/s -- -58% append1k_5 219880/s 141% -- Benchmark: running append8k_5, push8k_5 for at least 3 CPU seconds... Rate push8k_5 append8k_5 push8k_5 33834/s -- -50% append8k_5 67846/s 101% -- Benchmark: running append64k_5, push64k_5 for at least 3 CPU seconds.. +. Rate push64k_5 append64k_5 push64k_5 460/s -- -92% append64k_5 5705/s 1139% -- Benchmark: running append1k_10, push1k_10 for at least 3 CPU seconds.. +. Rate push1k_10 append1k_10 push1k_10 53627/s -- -60% append1k_10 135692/s 153% -- Benchmark: running append8k_10, push8k_10 for at least 3 CPU seconds.. +. Rate push8k_10 append8k_10 push8k_10 16317/s -- -67% append8k_10 48795/s 199% -- Benchmark: running append64k_10, push64k_10 for at least 3 CPU seconds +... Rate push64k_10 append64k_10 push64k_10 221/s -- -85% append64k_10 1510/s 583% -- Benchmark: running append1k_100, push1k_100 for at least 3 CPU seconds +... Rate push1k_100 append1k_100 push1k_100 5730/s -- -60% append1k_100 14175/s 147% -- Benchmark: running append8k_100, push8k_100 for at least 3 CPU seconds +... Rate push8k_100 append8k_100 push8k_100 185/s -- -84% append8k_100 1181/s 537% -- Benchmark: running append64k_100, push64k_100 for at least 3 CPU secon +ds... Rate push64k_100 append64k_100 push64k_100 22.5/s -- -76% append64k_100 94.7/s 320% --

      Abigail

        Good catch++.

        I'm not sure I agree with you about the final join though. The OP just needs to keep the 1 to 5 "few k" chunks somewhere until he has the set. At that point he is going to transmit them.

        He can just as easily write them as 5 seperate strings as 1, and avoid the need to concatenate them?

        I think the most efficient way would be to pre-extend to the array to max_chanks, and then pass and store references to the chunks avoiding any need to copy or concatenate. In isolation from the code that generates the chunks, passes them to the buffering code, and finally transmits them, the whole thing is an academic exercise.

        Even without my gaff, the benchmark was at best iffy.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

      I would guess any difference is about copying - the more you can do to prevent perl from having to copy strings around the better off you're likely to have. My nearly published ESPPlus::Storage is an example of that, in as many places as possible it avoids making copies and also is sure to pass references around instead of plain strings.

Re: append to a string or make an array? (memory and time efficiency)
by swkronenfeld (Hermit) on Jul 25, 2003 at 03:35 UTC
    I realized that I am perhaps being a bit vague?

    The application is a server. The user is connecting via TCP/IP, much like Telnet. Continuing the telnet analogy, some commands, like cat, produce all their output at once, while other commands, like find, produce their output a little at a time. Either way, the user doesn't have to press any more keys to get all the output, and I don't mind if they receive it all at once (as if the output of the find command was buffered, and sent in its entirety when the command completed)

    Have I clarified things or made it worse?

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://277781]
Approved by fglock
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chanting in the Monastery: (4)
As of 2022-09-29 17:17 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    I prefer my indexes to start at:




    Results (125 votes). Check out past polls.

    Notices?