To see the source of 1/5 thing, look at the source to av.c. The newmax variable tracks Perl's buffering decisions quite precisely. That doesn't explain the consequences though. To understand the consequences, consider what happens just after you've had to reallocate. At that point all elements were reallocated once, 5/6 of them twice, 25/36 of them three times and so on. The total number of reallocations is therefore (1 + 5/6 + (5/6)**2 + ...). Multiply and divide by (1-5/6) then simplify to see that on average, each element has been reallocated 6 times. (Right before a reallocation that average drops to 5.)

The point about amortized constant is critical. The convergence to that behaviour is quite fast. For instance if you break the 400,000 array into building 400 arrays of 1000 elements that you then push together, you'll find the same amortized constant coming into play both in building 1000 element arrays and in building the 400,000 element array. So instead of eliminating that overhead, you're now paying twice the overhead! (By separating arrays you've kept Perl from deciding to allocate big enough chunks.)

If you want to reduce that overhead, you can change the constant fairly easily by pre-extending the array at specified intervals. Assign by index and keep track of the size of the array. Any time you're about to pass the maximum size of the array, assign undef to a spot that is twice as far. That'll change the constant from 5 to 2, and the average reallocations from ranging from 5-6 down to ranging from 2-3. (At the cost of a constant amount of logic per iteration. I don't know which constant will be bigger - this might be a loss.)

Overall I'm going to guess, though, that with 400,000 offsets your biggest source of overhead is that you have 400,000 Perl scalars. If performance is a problem it might save effort to have the data packed into a long string. You might even use C to build said string.


In reply to Re^7: Producing a list of offsets efficiently by tilly
in thread Producing a list of offsets efficiently by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.