in reply to Is there something faster than string concatenation?

The main reason Java string concatenation is slow is that Java strings are immutable. That is, every concatenation creates a new string and copies the contents of both input strings.

Perl's strings aren't immutable, so if you do:

$some_long_string .= $some_shorter_string;
only the contents of $some_shorter_string need to be copied; the buffer behind $some_long_string is extended in place.
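A quick way to see the difference is a micro-benchmark (a sketch using the core Benchmark module; the sub names are mine). The second sub rebuilds the whole string on every append via join, mimicking the copy-both-inputs behavior attributed to Java above, while the first appends in place with .=:

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

my $chunk = 'x' x 100;

cmpthese( 200, {
    # append in place: only $chunk's bytes are copied each time
    append => sub {
        my $s = '';
        $s .= $chunk for 1 .. 200;
    },
    # rebuild from scratch: everything accumulated so far is re-copied
    rebuild => sub {
        my $s = '';
        $s = join '', $s, $chunk for 1 .. 200;
    },
} );
```

On my understanding, append should win by a wide margin, and the gap grows with the string length.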

String filehandles carry all kinds of overhead and are more of a convenience - I'm surprised the non-OO version is that quick. Object-oriented method calls are slow in Perl relative to plain function calls anyway.

The join version is pretty slow though. I wouldn't have expected that.
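For anyone wanting to reproduce the comparison, here's a sketch of the three approaches being discussed - direct concatenation, push-then-join, and printing to a string filehandle (an in-memory open on a scalar ref, available since 5.8). The setup data and names are mine:

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

my @parts = ('some text, ') x 1_000;

cmpthese( 500, {
    # plain .= concatenation
    concat => sub {
        my $s = '';
        $s .= $_ for @parts;
    },
    # accumulate in an array, join once at the end
    push_join => sub {
        my @buf;
        push @buf, $_ for @parts;
        my $s = join '', @buf;
    },
    # print to an in-memory filehandle backed by a scalar
    string_fh => sub {
        my $s = '';
        open my $fh, '>', \$s or die $!;
        print {$fh} $_ for @parts;
        close $fh;
    },
} );
```

All three build the identical string; only the bookkeeping differs.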

Re^2: Is there something faster than string concatenation?
by rdj (Novice) on Dec 03, 2007 at 02:28 UTC

    OK, that much makes sense. But I'd expect to run into a contiguous-memory problem. I mean, if this were C, I could realloc the buffer holding the first string; if I'm lucky it wouldn't have to move, and then I could just copy in the second string.

    Over the course of doing this, every once in a while your realloc is going to need to move/copy. It seems like that should bite you over the long run.

    But perhaps the open scalar ref basically just fronts for string concatenation? The usual approach for a memory-based buffer in other languages is to grow the buffer larger than needed for the current copy, so future copies won't risk moving the buffer.

    Maybe my test is a little too perfect. Nothing else is competing for memory, so the string easily grows without bumping into anything? Of course, the only time this matters is in a tight loop iterating over a large dataset, where I shouldn't really be doing any extraneous allocations or anything anyway.

      Over the course of doing this, every once in a while your realloc is going to need to move/copy. It seems like that should bite you over the long run.
      Probably. If that happens, the array push/join mechanism may be the fastest. I don't know how string filehandles work exactly, but I bet they do indeed just front for concatenation.

      As you can see, though, usually string concat is the fastest solution. I wouldn't worry about it too much unless you're dealing with *really* huge strings, and by that point you may want to move to on-disk files anyway, since you're probably taking up a fairly significant chunk of system memory.

      The size of the string buffer is doubled when it needs to be increased. So even if it has to be copied every time it is resized, that averages out to the equivalent of just one extra copying of the final string. So it isn't a big penalty.
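      The doubling argument can be checked with a toy simulation (this sketches the general amortization argument, not Perl's actual string allocator - see the update below). Growing one byte at a time and charging a full copy at every doubling, the total bytes copied stay under twice the final length:

```perl
use strict;
use warnings;

# Simulate doubling growth: whenever capacity is exceeded, double it
# and charge a copy of everything accumulated so far.
my ( $len, $cap, $copied ) = ( 0, 1, 0 );
for ( 1 .. 1_000_000 ) {
    if ( $len + 1 > $cap ) {
        $copied += $len;    # realloc moved: copy existing bytes
        $cap    *= 2;
    }
    $len++;
}
printf "final length: %d, bytes copied: %d (%.2fx)\n",
    $len, $copied, $copied / $len;
# prints: final length: 1000000, bytes copied: 1048575 (1.05x)
```

The copies total one byte less than the final capacity, and the capacity is at most double the length, so the overhead is bounded by one extra copy of the final string.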

      Update: What is true of arrays need not be true of strings, it appears. Thanks, ikegami.

      - tye        

        While trying to post a snippet that would demonstrate the doubling of the buffer size, I found that I can only account for an addition of at most 3 bytes more than needed.

        use Devel::Size qw( size );
        my $s = '';
        for (0..1024*1024) {
            printf("%2d %3d\n", length($s), size($s));
            $s .= 'a';
        }
        ...
        1048563 1048588
        1048564 1048592
        1048565 1048592
        1048566 1048592
        1048567 1048592
        1048568 1048596
        1048569 1048596
        1048570 1048596
        1048571 1048596
        1048572 1048600
        1048573 1048600
        1048574 1048600
        1048575 1048600
        1048576 1048604

        The allocation is rounded up to the nearest multiple of 4 bytes.
        Similar results when appending more than one byte.
        Devel::Peek concurs.
        ActivePerl 5.8.8 on WinXP.
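        For anyone wanting to check with Devel::Peek directly, a minimal sketch: Dump prints the SV's internals to STDERR, where CUR is the length in use and LEN is the number of bytes actually allocated for the buffer.

```perl
use strict;
use warnings;
use Devel::Peek qw( Dump );

my $s = '';
$s .= 'a' for 1 .. 10;

# Writes the SV details to STDERR; compare the CUR and LEN lines
# to see how much slack the allocator left beyond the 10 bytes used.
Dump($s);
```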