The good thing about the doubling strategy is that it keeps the number of reallocations to a minimum. The bad things are that the amount of data copied at each reallocation keeps growing with the array, and that the over-allocated (wasted) space can approach the size of the data itself.
I think I would use a different strategy. I would modify Tie::Array to (hold your breath:) use an array! Not for the individual elements, but as an array of dynamically grown, fixed-maximum-size 'buckets'. With array indices, working out which bucket holds any individual element of the logical array is simple math.
Preallocate the first bucket to, say, a 16k minimum and use the doubling strategy to reallocate it up to a maximum of, say, 64k or 128k. Then allocate a second bucket and follow the same strategy. You can also preallocate the bucket array itself to some reasonable starting size. A sketch of the idea follows.
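Here is a minimal, untested sketch of what I mean, assuming the 44-byte fixed-length records from the thread and the bucket sizes worked out further down. Tie::BucketArray and everything inside it are names I just made up for illustration, not an existing module.

```perl
package Tie::BucketArray;
use strict;
use warnings;
use Tie::Array;
our @ISA = 'Tie::Array';

# Geometry from the discussion: 44-byte records (assumed from the
# thread), buckets that start at 11 pages and double up to 44 pages.
use constant REC_SIZE   => 44;
use constant INIT_SIZE  => 11 * 4096;            # 45056 bytes
use constant MAX_SIZE   => 44 * 4096;            # 180224 bytes
use constant PER_BUCKET => MAX_SIZE / REC_SIZE;  # 4096 records per bucket

sub TIEARRAY  { bless { buckets => [], size => 0 }, shift }
sub FETCHSIZE { $_[0]{size} }
sub STORESIZE { $_[0]{size} = $_[1] }

# The 'simple math': which bucket, and the byte offset within it.
sub _locate {
    my( $self, $i ) = @_;
    return int( $i / PER_BUCKET ), ( $i % PER_BUCKET ) * REC_SIZE;
}

# Grow one bucket by doubling from INIT_SIZE up to MAX_SIZE.
sub _grow {
    my( $self, $b, $need ) = @_;
    my $bucket = \$self->{buckets}[ $b ];
    $$bucket = "\0" x INIT_SIZE unless defined $$bucket;
    my $len = length $$bucket;
    $len = $len * 2 > MAX_SIZE ? MAX_SIZE : $len * 2 while $len < $need;
    $$bucket .= "\0" x ( $len - length $$bucket ) if $len > length $$bucket;
}

sub STORE {
    my( $self, $i, $value ) = @_;
    my( $b, $off ) = $self->_locate( $i );
    $self->_grow( $b, $off + REC_SIZE );
    substr( $self->{buckets}[ $b ], $off, REC_SIZE )
        = pack 'a' . REC_SIZE, $value;
    $self->{size} = $i + 1 if $i >= $self->{size};
}

sub FETCH {
    my( $self, $i ) = @_;
    my( $b, $off ) = $self->_locate( $i );
    return undef unless defined $self->{buckets}[ $b ];
    return substr $self->{buckets}[ $b ], $off, REC_SIZE;
}

1;
```

The caller just does `tie my @records, 'Tie::BucketArray';` and then uses @records normally; the tie interface hides all the bucket plumbing.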
This strategy minimises the total memory allocated once the array is fully populated, by capping the wasted space at one bucket size. It also limits the amount of memory that must be copied at each reallocation to half that same value, and it minimises the 'perl array' overhead.
The values of 16k and 64k are just examples. Choosing the right initial and maximum sizes for your buckets will depend very much on your rate of growth etc. One interesting number is 180224, which is 44 * 4096. Using this as your maximum bucket size means there is no wastage within a full bucket and the math stays simple: each full bucket holds exactly 4096 of your 44-byte records and occupies exactly 44 OS pages. You could use the doubling strategy starting with an initial allocation of 11 * 4096 = 45056 bytes, and double until you reach 44 * 4096 = 180224. This keeps the maximum wastage at completion reasonable, and an integral number of your 44-byte records fits at every stage (1024, then 2048, then 4096).
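For anyone who wants to check the arithmetic, the geometry works out like this (these are just the worked-example figures, not magic numbers):

```perl
# Bucket geometry for 44-byte records.
print 44 * 4096,    "\n";   # 180224: one full bucket
print 180224 / 44,  "\n";   # 4096 records fit exactly, no waste
print 180224 / 4096, "\n";  # exactly 44 OS pages
print 45056 / 44,   "\n";   # 1024 records in the initial 11-page bucket
print 90112 / 44,   "\n";   # 2048 records after one doubling
```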
Using 44 * 4096 as the maximum bucket size to store 100 MB of data would need a bucket array of 582 elements (buckets). Preallocating a 582-element array of null scalars uses around 132k, giving an overhead of just 0.13%, which doesn't seem too bad.
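The overhead figure is easy to verify, taking the ~132k cost of the 582-element array above as given:

```perl
use POSIX 'ceil';
my $data     = 100 * 1024 * 1024;          # 100 MB of record data
my $buckets  = ceil( $data / 180224 );     # buckets needed
my $overhead = 132 * 1024 / $data * 100;   # bucket-array cost as a percentage
printf "%d buckets, %.2f%% overhead\n", $buckets, $overhead;
# prints: 582 buckets, 0.13% overhead
```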
The reason for using 4096 is that Win32 tends to round memory commits up to the nearest page size, which is 4096 bytes on Intel. However, it is quite likely that perl adds some overhead bytes to each allocation, beyond those of the user data structures, in order to manage its internal heaps. It is also quite likely that, if your perl calls the compiler runtime to allocate its heaps, the runtime adds a few bytes of its own to each allocation.
Overall, working out the actual overhead of any given allocation is pretty much impossible, given the documented overhead of perl's scalars and array chains plus the undocumented overhead of perl's and the runtime's heap management, so using the 4k OS page size as the basic unit seems as good a way to go as any. On other OSes and platforms this would probably be different.