in reply to Re^4: Is there something faster than string concatenation? (hermit crabs)
in thread Is there something faster than string concatenation?

I see the same result on Linux. However, stepping through the C code in the debugger indicates it passes repeatedly through Perl_pp_concat (pp_hot.c) and Perl_sv_grow (sv.c) and in Perl_sv_grow() it calls realloc here:

if (newlen > SvLEN(sv)) { /* need more room? */ newlen = PERL_STRLEN_ROUNDUP(newlen); if (SvLEN(sv) && s) { #ifdef MYMALLOC const STRLEN l = malloced_size((void*)SvPVX_const(sv)); if (newlen <= l) { SvLEN_set(sv, l); return s; } else #endif s = saferealloc(s, newlen);
So it seems the performance of Perl string concatenation depends on the performance of the underlying C library realloc function. Now, I thought most reallocs would double the string size (for the reasons pointed out earlier by tye) so I'm a bit confused about the Devel::Size results. Haven't resorted to studying the glibc realloc code yet.

  • Comment on Re^5: Is there something faster than string concatenation? (hermit crabs)
  • Download Code

Replies are listed 'Best First'.
Re^6: Is there something faster than string concatenation? (quantum shells)
by tye (Sage) on Dec 03, 2007 at 19:21 UTC

    If realloc() doubles (at least) the size when a size increase is required, then the calling code would not know about it and so Devel::Size wouldn't know about it either.

    I've not heard of such a feature of realloc() described that way. But I have seen malloc() libraries that build arenas that only hold buffers of size 2**n with a bitmap to note which buffers are in-use. And such a scheme would have that impact. Certainly Win32 does not use such a scheme (it has a pretty naive implementation of malloc() which can certainly be a source of problems that need to be worked around). And I don't think Win32 Perl is usually built to use Perl's private malloc().

    So building up a string by repeated concat in Win32 Perl is probably going to copy the source string over and over. While some Perls with better malloc() implementations will only rarely have to copy the source string as I described above.

    You could use repeated trials, noting how often the string's address changes in order to make a good guess whether such a malloc() is being used. To get the string's address you can use something very similar to:

    my $fmt= "LIJ"; while( 1 ) { last if length( pack substr($fmt,0,1), 0 ) == length( pack "p", "" + ); $fmt= substr( $fmt, 1 ); die "No unpack format for integers the same size as pointers" if ! $fmt; } $fmt= substr( $fmt, 0, 1 ); my $addr= unpack $fmt, pack "p", $string;

    Thanks for mentioning that, eyepopslikeamosquito.

    - tye        

Re^6: Is there something faster than string concatenation? (hermit crabs)
by Joost (Canon) on Dec 03, 2007 at 12:59 UTC