in reply to non schlemiel painter string concat

Putting Perl aside for a moment, there are approximately four ways to grow a string:

  1. Use a fixed-size buffer, and fail (or truncate) once it fills.
  2. On every append, reallocate a buffer of exactly the new size and copy the string over (the Schlemiel-the-Painter approach).
  3. When the string outgrows its buffer, reallocate in fixed-size increments.
  4. Over-allocate: grab more space than is currently needed, and only reallocate and copy when the string outgrows the buffer, growing geometrically each time.

The last one is what Perl does. By allocating extra space, the "allocate and copy" inefficiencies are mitigated. In brief searching I haven't found a definitive source on how Perl approaches this, but my recollection is this:

  1. For strings under 256 bytes, allocate a 256-byte buffer.
  2. When a string grows beyond its buffer, allocate a 1kb buffer and copy the string into it.
  3. When the string exceeds 1kb, allocate 2kb and copy everything over.
  4. Repeat, doubling the buffer size each time.
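
In code, the scheme above looks something like this. This is a sketch of my recollection, not perl's actual source, and the 256-byte and 1kb constants in particular are from memory:

  sub new_capacity {
      my ($needed) = @_;
      return 256 if $needed <= 256;      # step 1: the minimum buffer
      my $cap = 1024;                    # step 2: first jump to 1kb
      $cap *= 2 while $cap < $needed;    # steps 3 and 4: keep doubling
      return $cap;                       # e.g. new_capacity(1500) gives 2048
  }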

Moving a contiguous block of memory is a pretty quick operation on most processors. The key is to avoid doing it too often; the other key is to avoid wasting memory by allocating far more extra space than will ever be used. Just where Perl sets the transition points, I don't recall exactly.
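
You can actually watch perl leave itself this headroom. Devel::Peek, a core module, dumps an SV's internals, including CUR (bytes in use) and LEN (bytes allocated); the exact LEN values you'll see depend on your perl version and malloc, so treat the numbers as illustrative:

  use Devel::Peek;

  my $s = 'x' x 100;
  Dump($s);    # compare CUR (used) against LEN (allocated)
  $s .= 'x' x 1000;
  Dump($s);    # LEN overshoots CUR; the slack is growth headroom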

Each reallocation gets more expensive as the string keeps doubling, but ideally it doesn't happen on every append. That's why concatenation is usually described as running in amortized constant time: any given append may be expensive, but the bulk of them are cheap, to the point that it's as good as constant time in the aggregate. The arithmetic works out because the copying forms a geometric series: growing a string to N bytes copies at most N + N/2 + N/4 + ... bytes in total, which is less than 2N, a constant amount of copying per byte appended. The same principle applies to hash insertions: amortized constant time, even though an individual insertion (one that forces a rehash) may be relatively expensive.
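
A cheap sanity check: time a pile of one-character appends at a few sizes. If every append copied the whole string, ten times the appends would cost roughly a hundred times the wall time; with amortized constant time appends, the totals should scale close to linearly. Exact numbers will of course vary by machine and perl build:

  use Time::HiRes qw(time);

  for my $n (100_000, 1_000_000, 10_000_000) {
      my $s  = '';
      my $t0 = time;
      $s .= 'x' for 1 .. $n;
      printf "%10d appends: %.3fs\n", $n, time - $t0;
  }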

Ruby probably uses the same strategy as Perl; for general-purpose strings there really isn't a better one. The generalized form of the algorithm, geometric over-allocation, also underlies Perl's arrays, C++'s std::vector container, C++'s std::string type, and so on. It keeps popping up because it's so useful. Threshold points and other tuning details vary from implementation to implementation, but the underlying algorithm is mostly the same.
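
The same Devel::Peek trick shows the array case: an AV's FILL field is the last index in use, while MAX is the last slot allocated, and MAX generally runs ahead of FILL for exactly the reasons above:

  use Devel::Peek;

  my @a;
  push @a, $_ for 1 .. 5;
  Dump(\@a);   # FILL is the last used index, MAX the last allocated slot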


Dave