I tried to phrase my reply to have as little controversy interpretable as possible. I really wanted to hear your thoughts and wisdom.

I wasn't attempting to refute what you said, only show that O(1) doesn't tell the whole story. "amortized average O(1)" probably does, but only if you disregard the localised impact of reallocations which is what I was looking to avoid. The example I am working with is a 25 MB file producing 400,000 offsets. Building 400,000 element array with individual pushes is not insignificant, and I'm hoping to support larger. My current thinking is that building a smaller intermediate array amd then adding it to the main array, or building an array of arrays would avoid the larger reallocation/copies.

The latter may also provide some additional benefits. By building an AoA where the each element of the top level array is an array of (say) 1000 offsets from the base rather than absolute positions, when the string is modifed by insertion or deletions, adjusting the offsets within one sub array and the absolute positions the base addresses of the others, would be quicker than adjusting all the absolute positions. It would also require less ram to store offsets rather than absolute positions.

To extract the greatest benefit from this, knowing at what points the reallocations will occur and chosing the size of the nested arrays to avoid stepping over the next boundary is important. I thought that the reallocations might go in powers of two--hence my choice of 130900 + 200 stepping over the 2**17 boudary--but this does not appear to be the case.

Could you explain the "1/5 th" thing for me in a little more detail? Or point me at the appropraite place to read about it?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.

In reply to Re^6: Producing a list of offsets efficiently by BrowserUk
in thread Producing a list of offsets efficiently by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.