I tried to phrase my reply so that it left as little room as possible for being read as controversial. I really wanted to hear your thoughts and wisdom.
I wasn't attempting to refute what you said, only to show that O(1) doesn't tell the whole story. "Amortized average O(1)" probably does, but only if you disregard the localised impact of reallocations, which is what I was looking to avoid. The example I am working with is a 25 MB file producing 400,000 offsets. Building a 400,000-element array with individual pushes is not insignificant, and I'm hoping to support larger files. My current thinking is that building a smaller intermediate array and then adding it to the main array in one push, or building an array of arrays, would avoid the larger reallocations/copies. A sketch of the first option follows.
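To make the first option concrete, here is a minimal sketch of chunked pushing (the 1000-element chunk size, reading from STDIN, and the use of index() to find newline offsets are my assumptions, not necessarily how the real offsets are produced):

    use strict;
    use warnings;

    # Collect offsets into a small buffer and push them onto the main
    # array in blocks, so @offsets grows in a few hundred large steps
    # rather than 400,000 individual ones.
    my $data = do { local $/; <STDIN> };    # e.g. the 25 MB file

    my @offsets;
    my @chunk;
    my $pos = -1;
    while ( ( $pos = index( $data, "\n", $pos + 1 ) ) >= 0 ) {
        push @chunk, $pos;
        if ( @chunk == 1000 ) {
            push @offsets, @chunk;          # one list push instead of 1000 scalar pushes
            @chunk = ();
        }
    }
    push @offsets, @chunk if @chunk;        # flush any remainder
    printf "collected %d offsets\n", scalar @offsets;

Whether this actually reduces the number of reallocations of @offsets depends on how Perl grows the array for a single list push versus repeated scalar pushes, which is exactly what I'm trying to pin down below.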
The latter may also provide some additional benefits. If I build an AoA where each element of the top-level array holds a base position plus an array of (say) 1000 offsets relative to that base rather than absolute positions, then when the string is modified by insertions or deletions, I only have to adjust the relative offsets within one sub-array and the base positions of the others, which would be quicker than adjusting all the absolute positions. Storing small relative offsets rather than absolute positions would also require less RAM. A sketch of that layout follows.
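A minimal sketch of that layout, assuming each bucket is a pair [ $base, \@relative_offsets ] (the structure and names here are illustrative, not settled):

    # Adjust the AoA after inserting $len bytes at absolute position $at.
    # Buckets entirely after the insertion only have their base shifted;
    # the bucket containing the insertion has only the affected relative
    # offsets shifted; earlier buckets end up untouched.
    sub adjust_for_insertion {
        my ( $buckets, $at, $len ) = @_;
        for my $bucket ( @$buckets ) {
            my ( $base, $rel ) = @$bucket;
            if ( $at <= $base ) {
                $bucket->[0] += $len;       # whole bucket moves: one update
            }
            else {
                for my $off ( @$rel ) {     # $off aliases the element
                    $off += $len if $base + $off >= $at;
                }
            }
        }
    }

Deletions would be the mirror image (subtracting $len), with the extra wrinkle of handling offsets that fall inside the deleted range.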
To extract the greatest benefit from this, knowing at what points the reallocations will occur, and choosing the size of the nested arrays to avoid stepping over the next boundary, is important. I thought that the reallocations might go in powers of two--hence my choice of 130900 + 200, stepping over the 2**17 boundary--but this does not appear to be the case.
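One way to see where the reallocations actually land, rather than guessing at powers of two, is to watch the array's reported size as it grows. A rough probe, assuming Devel::Size is installed and that its size() reflects the allocated slots rather than just the used ones:

    use strict;
    use warnings;
    use Devel::Size qw( size );

    # Push one element at a time and report each point at which the
    # array's underlying allocation changes size.
    my @a;
    my $last = size( \@a );
    for my $i ( 1 .. 200_000 ) {
        push @a, $i;
        my $now = size( \@a );
        if ( $now != $last ) {
            printf "grew at %6d elements: %d -> %d bytes\n", $i, $last, $now;
            $last = $now;
        }
    }

If the jumps don't line up with powers of two, that would explain why the 130900 + 200 test didn't behave as I expected.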
Could you explain the "1/5th" thing for me in a little more detail? Or point me at the appropriate place to read about it?