in reply to Re: Is there something faster than string concatenation?
in thread Is there something faster than string concatenation?

Ok, that much makes sense. But I'd think I'd be running into a contiguous memory block problem. I mean, if this were C, I could realloc the buffer that has the first string in it, and if I'm lucky it wouldn't have to move, then I can just copy in the second string.

Over the course of doing this, every once in a while your realloc is going to need to move/copy. It seems like that should bite you over the long run.

But perhaps the open scalar ref basically just fronts for string concatenation? The usual approach for like a memory-based buffer from other languages is to grow the buffer larger than I need it for this one copy, so future copies won't have to risk the buffer moving.

Maybe my test is a little too perfect. Nothing else is competing for memory, so the string easily grows without bumping into anything? Of course, the only time this matters is in a tight loop iterating over a large dataset, where I shouldn't really be doing any extraneous allocations or anything anyway.

  • Comment on Re^2: Is there something faster than string concatenation?

Replies are listed 'Best First'.
Re^3: Is there something faster than string concatenation?
by Joost (Canon) on Dec 03, 2007 at 02:49 UTC
    Over the course of doing this, every once in a while your realloc is going to need to move/copy. It seems like that should bite you over the long run.
    Probably. If that happens, the array push/join mechanism may be the fastest. I don't know how string filehandles work exactly, but I bet they do indeed just front for concatenation.

    As you can see, though, usually string concat is the fastest solution. I wouldn't worry about it too much unless you're dealing with *really* huge strings, and by that time may want to move to on-disk files anyway, since you're probably taking up a fairly significant chunk of system memory.

Re^3: Is there something faster than string concatenation? (hermit crabs)
by tye (Sage) on Dec 03, 2007 at 07:07 UTC

    The size of the string buffer is doubled when it needs to be increased. So even if it has to be copied every time it is resized, that averages out to the equivalent of just one extra copying of the final string. So it isn't a big penalty.

    Update: What is true of arrays need not be true of strings, it appears. Thanks, ikegami.

    - tye        

      While trying to post a snippet that would demonstrate the doubling of the buffer size, I found that I can only account for an addition of at most 3 bytes more than needed.

      use Devel::Size qw( size ); my $s = ''; for (0..1024*1024) { printf("%2d %3d\n", length($s), size($s)); $s .= 'a'; }
      ... 1048563 1048588 1048564 1048592 1048565 1048592 1048566 1048592 1048567 1048592 1048568 1048596 1048569 1048596 1048570 1048596 1048571 1048596 1048572 1048600 1048573 1048600 1048574 1048600 1048575 1048600 1048576 1048604

      It allocates the nearest divisible by 4 amount.
      Similar results when appending more than one byte.
      Devel::Peek concurs.
      ActivePerl 5.8.8 on WinXP.

        I see the same result on Linux. However, stepping through the C code in the debugger indicates it passes repeatedly through Perl_pp_concat (pp_hot.c) and Perl_sv_grow (sv.c) and in Perl_sv_grow() it calls realloc here:

        if (newlen > SvLEN(sv)) { /* need more room? */ newlen = PERL_STRLEN_ROUNDUP(newlen); if (SvLEN(sv) && s) { #ifdef MYMALLOC const STRLEN l = malloced_size((void*)SvPVX_const(sv)); if (newlen <= l) { SvLEN_set(sv, l); return s; } else #endif s = saferealloc(s, newlen);
        So it seems the performance of Perl string concatenation depends on the performance of the underlying C library realloc function. Now, I thought most reallocs would double the string size (for the reasons pointed out earlier by tye) so I'm a bit confused about the Devel::Size results. Haven't resorted to studying the glibc realloc code yet.