in reply to Re^2: Completeness of Devel::Size answers
in thread Completeness of Devel::Size answers

Mind you, I am assuming that all the increase in VM size is down to the hash

If you could inspect the memory, you would probably find that a substantial portion of the total memory allocated to the process is actually free for re-use once the hash has been built.

This is because as the hash grows, filling towards the maximum capacity of its current size, it reaches a point where Perl decides it is time to double the number of buckets. In effect, a new hash is allocated with twice the capacity of the old one, and the contents of the old one are copied into the new before any new keys are added. Once the copy is complete, the memory for the old copy is freed back to the process pool (but not to the system) for reuse. So whilst it may take (say) 600MB to hold the completely built hash, there is a brief point in time where two copies of the hash-so-far are required.
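The doubling is easy to observe directly. A minimal sketch, assuming a perl recent enough to provide Hash::Util::bucket_ratio() (5.26+); on older perls a hash evaluated in scalar context yields the same "used/total" string:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Hash::Util qw(bucket_ratio);

my %h;
my $buckets = 0;
for my $i (1 .. 10_000) {
    $h{"key$i"} = $i;
    # bucket_ratio() returns e.g. "3/8": used buckets / total buckets
    my ($used, $total) = bucket_ratio(%h) =~ m{(\d+)/(\d+)};
    if ($total != $buckets) {
        printf "%6d keys: %6d/%6d buckets\n", $i, $used, $total;
        $buckets = $total;
    }
}
```

Each printed line marks a point where the bucket array was reallocated at twice the previous size.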

If you have the ability to monitor the process memory usage as the hash is built, you'll see the graph of the memory allocated versus time looks something like this:

              _________________
              |
              |
              |
              |
              |
              |
              |
              |
      ________|
      |
      |
      |
  ____|
  |
  |
__|
|

Where both the vertical jumps (memory allocated) and the horizontal runs (time between reallocations) double. It is an effective strategy for performance, but it can mean that a substantial portion of the memory allocated to the last-but-one incarnation of the hash doesn't get re-used (by the hash). However, that memory is available for use by the rest of your program.

One thing that may allow you to reduce the overall memory consumption of your program is to ensure that the big hash gets built first. For example, you suggest that you've already loaded/generated 100MB of data in memory before you build this hash. If you delayed the creation/loading of that data until after the hash is constructed, you may find that there is enough memory (allocated during the construction of the hash, but freed back to the pool before it is complete) to accommodate most or all of that 100MB without the need to request more from the OS.
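A hypothetical sketch of that reordering; build_big_hash() and load_other_data() are placeholder names standing in for your own routines, not real APIs:

```perl
use strict;
use warnings;

# Stand-in for the real multi-million-key build; the transient
# bucket-doubling peak in memory use occurs inside here.
sub build_big_hash {
    my %h;
    $h{"key$_"} = $_ for 1 .. 1_000;
    return \%h;
}

# Stand-in for the ~100MB of other data.
sub load_other_data {
    return [ (0) x 1_000 ];
}

# Build the big hash *first*...
my $big = build_big_hash();

# ...then load the other data, which may fit into the pool space freed
# by the hash's final doubling, instead of growing the process further.
my $other = load_other_data();

printf "hash keys: %d, other items: %d\n",
    scalar keys %$big, scalar @$other;
```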

If you're banging your head against the limits of your machine, that may just be enough to stop you going into swapping. Worth a try at least if your program logic can accommodate that change.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^4: Completeness of Devel::Size answers
by gone2015 (Deacon) on Sep 23, 2008 at 18:12 UTC
    If you have the ability to monitor the process memory usage as the hash is built, you'll see the graph of the memory allocated versus time looks something like...

    I've dug a bit deeper. I got the hash building to stop each time the number of buckets increased. This is what I saw:

        : Entries:   Used/Buckts : Expect :  Heap  :  Aux   :
        :--------:-------/-------:--------:--------:--------:
        :       0:      0/     0 :   0.0M :   0.9M :        :
        :      10:      9/    10 :   0.0M :    "   :        :
        :      20:      E/    20 :   0.0M :    "   :        :
        :      40:     1B/    40 :   0.0M :    "   :        :
        :      70:     38/    80 :   0.0M :    "   :        :
        :     130:     67/   100 :   0.0M :    "   :        :
        :     260:     CD/   200 :   0.0M :    "   :        :
        :     550:    1A8/   400 :   0.1M :   1.0M :        :
        :    1100:    368/   800 :   0.1M :   1.2M :        :
        :    2100:    66D/  1000 :   0.2M :   1.3M :        :
        :    4100:    C9B/  2000 :   0.4M :   1.7M :        :
        :    8200:   1913/  4000 :   0.8M :   2.2M :   0.3M :
        :   16400:   327F/  8000 :   1.7M :   3.4M :   0.8M :
        :   32800:   64D5/ 10000 :   3.3M :   5.6M :   1.8M :
        :   65600:   CA94/ 20000 :   6.7M :  10.1M :   3.8M :
        :  131100:  1940F/ 40000 :  13.4M :  19.1M :   7.8M :
        :  262200:  325A8/ 80000 :  26.8M :  37.2M :  15.8M :
        :  524300:  64A98/100000 :  53.5M :  73.3M :  31.8M :
        : 1048600:  C9691/200000 : 107.0M : 145.6M :  63.8M :
        : 2097200: 192DE0/400000 :  -0.0M : 290.0M : 127.8M :
        : 4194400: 326095/800000 :  -0.0M : 578.9M : 255.8M :
        : 5000000: 3977F6/800000 :  -0.0M : 683.7M : 255.8M :
    
    Which shows the effect you described: the hash being rebuilt with more buckets as it grows, and the heap growing with it. (The Used/Buckts columns are given in hex.)

    This table also shows the expected size of the hash, according to Devel::Size::total_size() on a separate run -- where it says -0.0M I don't have a value, because life is too short.

    The figures for actual memory use are given by the System Monitor. The Heap is present when Perl starts. The Aux appears as the hash grows -- FWIW, I observe that it's at the far end of VM space, below the Stack.

    NB: the actual memory use is taken when my test is run with the Devel::Size::total_size() calls commented OUT -- so the figures are not affected by its overheads.

    So, assuming that the hash is created on the heap, on the face of it, either Devel::Size::total_size() is under-reporting, or the construction of the hash is leaving free space in the heap. Anyway, the discrepancy is ~38 bytes per entry or ~19 bytes per bucket. (I assume that Devel::Size::total_size() does take into account the buckets overhead ?)

    I tried reducing the key size from 33 to 17. I got the same discrepancy.

    The other puzzle is what the Aux is. It appears to be something to do with the rebuilding of the hash. The size is independent of the key size. It works out at 32 bytes per bucket of the new size, or 64 bytes per bucket of the old size! In any case, it's a significant chunk of memory (virtual or otherwise) :-(
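    For what it's worth, the 32-bytes-per-new-bucket figure can be checked against the table above (bucket counts in the table are hex, Aux is in MB); the arithmetic comes out close to the observed values:

```perl
use strict;
use warnings;

# Rows taken from the table above: (new bucket count, observed Aux in MB).
my @rows = ( [ 0x20000, 3.8 ], [ 0x400000, 127.8 ], [ 0x800000, 255.8 ] );

for my $row (@rows) {
    my ($buckets, $aux_mb) = @$row;
    printf "%8d buckets x 32 bytes = %6.1fMB (observed %5.1fMB)\n",
        $buckets, 32 * $buckets / 2**20, $aux_mb;
}
```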

      (I assume that Devel::Size::total_size() does take into account the buckets overhead ?)

      I can confirm it does. Or rather, it certainly used to, back circa v0.58, when I understood its internals well enough to supply a trivial patch. It seems unlikely that it would have suddenly stopped doing so, but I pulled v0.71 earlier today and I no longer understand what the hell is going on inside there.

      For the rest of your post, I am unqualified to comment as I know little about *nix memory management. I have no idea what Aux is, nor can I even imagine what it could be. Someone with *nix experience will be needed to interpret your numbers and explain what is going on.

      I do wonder if this is related to zentara's recent discovery that *nix can release some memory back to the OS under some circumstances?

