in reply to non schlemiel painter string concat

Putting Perl aside for a moment, there are approximately four ways to grow a string:

  1. Use a fixed-size buffer, and fail (or truncate) once it fills.
  2. On every append, reallocate a buffer of exactly the new size and copy the string over (the Schlemiel-the-Painter approach).
  3. When the string outgrows its buffer, reallocate in fixed-size increments.
  4. Over-allocate: grab more space than is currently needed, and only reallocate and copy when the string outgrows the buffer, growing geometrically each time.

The last one is what Perl does. By allocating extra space, the "allocate and copy" inefficiencies are mitigated. In brief searching I haven't found a definitive source on how Perl approaches this, but my recollection is this:

  1. For strings under 256 bytes, allocate a 256-byte buffer.
  2. When a string grows beyond its buffer, allocate a 1kb buffer and copy the string into it.
  3. When the string exceeds 1kb, allocate 2kb and copy everything over.
  4. Repeat, doubling the buffer size each time.
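
In code, the scheme above looks something like this. This is a sketch of my recollection, not perl's actual source, and the 256-byte and 1kb constants in particular are from memory:

  sub new_capacity {
      my ($needed) = @_;
      return 256 if $needed <= 256;      # step 1: the minimum buffer
      my $cap = 1024;                    # step 2: first jump to 1kb
      $cap *= 2 while $cap < $needed;    # steps 3 and 4: keep doubling
      return $cap;                       # e.g. new_capacity(1500) gives 2048
  }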

Moving a contiguous block of memory is a pretty quick operation on most processors. The key is to avoid doing it too often; the other key is to avoid wasting memory by allocating far more extra space than will ever be used. Just where Perl sets the transition points, I don't recall exactly.
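
You can actually watch perl leave itself this headroom. Devel::Peek, a core module, dumps an SV's internals, including CUR (bytes in use) and LEN (bytes allocated); the exact LEN values you'll see depend on your perl version and malloc, so treat the numbers as illustrative:

  use Devel::Peek;

  my $s = 'x' x 100;
  Dump($s);    # compare CUR (used) against LEN (allocated)
  $s .= 'x' x 1000;
  Dump($s);    # LEN overshoots CUR; the slack is growth headroom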

Each reallocation gets more expensive as the string keeps doubling, but ideally it doesn't happen on every append. That's why concatenation is usually described as running in amortized constant time: any given append may be expensive, but the bulk of them are cheap, to the point that it's as good as constant time in the aggregate. The arithmetic works out because the copying forms a geometric series: growing a string to N bytes copies at most N + N/2 + N/4 + ... bytes in total, which is less than 2N, a constant amount of copying per byte appended. The same principle applies to hash insertions: amortized constant time, even though an individual insertion (one that forces a rehash) may be relatively expensive.
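
A cheap sanity check: time a pile of one-character appends at a few sizes. If every append copied the whole string, ten times the appends would cost roughly a hundred times the wall time; with amortized constant time appends, the totals should scale close to linearly. Exact numbers will of course vary by machine and perl build:

  use Time::HiRes qw(time);

  for my $n (100_000, 1_000_000, 10_000_000) {
      my $s  = '';
      my $t0 = time;
      $s .= 'x' for 1 .. $n;
      printf "%10d appends: %.3fs\n", $n, time - $t0;
  }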

Ruby probably uses the same strategy as Perl; for general-purpose strings there really isn't a better one. The generalized form of the algorithm, geometric over-allocation, also underlies Perl's arrays, C++'s std::vector container, C++'s std::string type, and so on. It keeps popping up because it's so useful. Threshold points and other tuning details vary from implementation to implementation, but the underlying algorithm is mostly the same.
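
The same Devel::Peek trick shows the array case: an AV's FILL field is the last index in use, while MAX is the last slot allocated, and MAX generally runs ahead of FILL for exactly the reasons above:

  use Devel::Peek;

  my @a;
  push @a, $_ for 1 .. 5;
  Dump(\@a);   # FILL is the last used index, MAX the last allocated slot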


Dave