Putting Perl aside for a moment, there are approximately four ways to grow a string:

The last one is what Perl does. By allocating extra space, the "allocate and copy" inefficiencies are mitigated. In brief searching I haven't found the definitive source on how Perl approaches this, but my recollection is this:

  1. For strings under 256 bytes, allocate a 256 byte buffer.
  2. When a string grows to exceed buffer space, jump to 1kb and copy into new buffer.
  3. When a string grows to exceed the 1kb, allocate 2kb, and copy everything over.
  4. Repeat, doubling each time.

Moving contiguous blocks of memory around is a pretty quick operation for most processors. The key is to avoid doing it too often. The other key is to not waste too much memory by allocating way too much extra. Just where Perl sets the transition points I don't recall exactly.

The process does grow in time complexity as the string's growth continues doubling. But it doesn't (ideally) happen on each iteration of growth. I believe that people suggest that concatenation occurs in amortized constant time. In other words, any given iteration may be expensive, but the bulk of the iterations will be cheap, to the point that it's as good as constant time in the aggregate. The same principle applies to hash insertions; constant amortized time, even though an individual insertion may be relatively expensive.

Ruby probably uses the same strategy as Perl. There really isn't any better strategy. The generalized form of the algorithm also applies to Perl's arrays, C++'s std::vector container, C++'s std::string data type, and so on. It just keeps popping up because it's so useful. Details of threshold points and other optimizations may change, but the underlying algorithm is mostly the same.


Dave


In reply to Re: non schlemiel painter string concat by davido
in thread non schlemiel painter string concat by Anonymous Monk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.