in reply to Re^2: Is there something faster than string concatenation?
in thread Is there something faster than string concatenation?

The size of the string buffer is doubled when it needs to be increased. So even if it has to be copied every time it is resized, that averages out to the equivalent of just one extra copying of the final string. So it isn't a big penalty.

Update: What is true of arrays need not be true of strings, it appears. Thanks, ikegami.

- tye        

  • Comment on Re^3: Is there something faster than string concatenation? (hermit crabs)

Replies are listed 'Best First'.
Re^4: Is there something faster than string concatenation? (hermit crabs)
by ikegami (Patriarch) on Dec 03, 2007 at 07:37 UTC

    While trying to post a snippet that would demonstrate the doubling of the buffer size, I found that I can only account for an addition of at most 3 bytes more than needed.

    use Devel::Size qw( size ); my $s = ''; for (0..1024*1024) { printf("%2d %3d\n", length($s), size($s)); $s .= 'a'; }
    ... 1048563 1048588 1048564 1048592 1048565 1048592 1048566 1048592 1048567 1048592 1048568 1048596 1048569 1048596 1048570 1048596 1048571 1048596 1048572 1048600 1048573 1048600 1048574 1048600 1048575 1048600 1048576 1048604

    It allocates the nearest divisible by 4 amount.
    Similar results when appending more than one byte.
    Devel::Peek concurs.
    ActivePerl 5.8.8 on WinXP.

      I see the same result on Linux. However, stepping through the C code in the debugger indicates it passes repeatedly through Perl_pp_concat (pp_hot.c) and Perl_sv_grow (sv.c) and in Perl_sv_grow() it calls realloc here:

      if (newlen > SvLEN(sv)) { /* need more room? */ newlen = PERL_STRLEN_ROUNDUP(newlen); if (SvLEN(sv) && s) { #ifdef MYMALLOC const STRLEN l = malloced_size((void*)SvPVX_const(sv)); if (newlen <= l) { SvLEN_set(sv, l); return s; } else #endif s = saferealloc(s, newlen);
      So it seems the performance of Perl string concatenation depends on the performance of the underlying C library realloc function. Now, I thought most reallocs would double the string size (for the reasons pointed out earlier by tye) so I'm a bit confused about the Devel::Size results. Haven't resorted to studying the glibc realloc code yet.

        If realloc() doubles (at least) the size when a size increase is required, then the calling code would not know about it and so Devel::Size wouldn't know about it either.

        I've not heard of such a feature of realloc() described that way. But I have seen malloc() libraries that build arenas that only hold buffers of size 2**n with a bitmap to note which buffers are in-use. And such a scheme would have that impact. Certainly Win32 does not use such a scheme (it has a pretty naive implementation of malloc() which can certainly be a source of problems that need to be worked around). And I don't think Win32 Perl is usually built to use Perl's private malloc().

        So building up a string by repeated concat in Win32 Perl is probably going to copy the source string over and over. While some Perls with better malloc() implementations will only rarely have to copy the source string as I described above.

        You could use repeated trials, noting how often the string's address changes in order to make a good guess whether such a malloc() is being used. To get the string's address you can use something very similar to:

        my $fmt= "LIJ"; while( 1 ) { last if length( pack substr($fmt,0,1), 0 ) == length( pack "p", "" + ); $fmt= substr( $fmt, 1 ); die "No unpack format for integers the same size as pointers" if ! $fmt; } $fmt= substr( $fmt, 0, 1 ); my $addr= unpack $fmt, pack "p", $string;

        Thanks for mentioning that, eyepopslikeamosquito.

        - tye