Re^2: Completeness of Devel::Size answers
by gone2015 (Deacon) on Sep 23, 2008 at 00:35 UTC
OK... If I run the code with Devel::Size::size it says the 1 million entry hash requires 94.8 M. If I run with the calls to Devel::Size::size commented out, the VM still grows from ~100M to ~270M -- nearly twice what's expected :-(
For the full 5 million, VM grows to 1.0G, so the hash appears to need 900M -- which is a bit more than 188 bytes per entry... whereas Devel::Size::size suggested ~100?
Mind you, I am assuming that all the increase in VM size is down to the hash -- which I think it is, because if I comment out the assignment to the hash, the VM doesn't change size.
Thanks.
| [reply] [d/l] [select] |
Mind you, I am assuming that all the increase in VM size is down to the hash
If you could inspect the memory, you would probably find that a substantial portion of the total memory allocated to the process is actually free for re-use once the hash has been built.
This is because, as the hash grows and fills towards the capacity of its current allocation, it reaches a point where Perl decides it is time to double the number of buckets. In effect, a new hash is allocated that has twice the capacity of the old one, and the contents of the old one are copied into the new before any new keys are added. Once the copy is complete, the memory for the old copy is freed back to the process pool (but not to the system) for reuse. So whilst it may take (say) 600MB to hold the completely built hash, there is a brief point in time where two copies of the hash-so-far are required.
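A minimal sketch of how to watch this happening, assuming a Perl recent enough to have Hash::Util's num_buckets()/used_buckets() (they didn't exist when this was written; on older perls a hash in scalar context gave a used/total figure instead):

#!/usr/bin/perl
use strict;
use warnings;
use Hash::Util qw(num_buckets used_buckets);    # core module; these two need a newish perl

my %h;
my $last = 0;
for my $i (1 .. 100_000) {
    $h{"key_$i"} = $i;
    my $buckets = num_buckets(%h);
    next if $buckets == $last;                  # only report when perl doubles the bucket array
    printf "%7d keys: %7d used / %7d buckets\n",
        scalar(keys %h), used_buckets(%h), $buckets;
    $last = $buckets;
}

Each report should come at roughly twice the key count of the previous one, mirroring the doubling described above.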
If you have the ability to monitor the process memory usage as the hash is built, you'll see the graph of the memory allocated versus time looks something like this:
                    _________________
                    |
                    |
                    |
                    |
                    |
                    |
                    |
            ________|
            |
            |
            |
        ____|
        |
      __|
      |
Where both the vertical jumps (memory allocated) and the horizontal runs (time between reallocs) double. It is an effective strategy for performance, but it can mean that a substantial portion of the memory allocated to the last-but-one incarnation of the hash doesn't get re-used (by the hash). However, that memory is available for use by the rest of your program.
One thing that may allow you to reduce the overall memory consumption of your program is to ensure that the big hash gets built first. For example, you suggest that you've already loaded/generated 100MB of data in memory before you build this hash. If you delayed the creation/loading of that data until after the hash is constructed, you may find that there is enough memory (allocated during the construction of the hash, but freed back to the pool before it is complete) to accommodate most or all of that 100MB without the need to request more from the OS.
If you're banging your head against the limits of your machine, that may just be enough to stop you going into swapping. Worth a try at least if your program logic can accommodate that change.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
:  Entries :   Used/Buckets :  Expect :    Heap :     Aux :
:----------:--------/-------:---------:---------:---------:
:        0 :       0/     0 :    0.0M :    0.9M :         :
:       10 :       9/    10 :    0.0M :      "  :         :
:       20 :       E/    20 :    0.0M :      "  :         :
:       40 :      1B/    40 :    0.0M :      "  :         :
:       70 :      38/    80 :    0.0M :      "  :         :
:      130 :      67/   100 :    0.0M :      "  :         :
:      260 :      CD/   200 :    0.0M :      "  :         :
:      550 :     1A8/   400 :    0.1M :    1.0M :         :
:     1100 :     368/   800 :    0.1M :    1.2M :         :
:     2100 :     66D/  1000 :    0.2M :    1.3M :         :
:     4100 :     C9B/  2000 :    0.4M :    1.7M :         :
:     8200 :    1913/  4000 :    0.8M :    2.2M :    0.3M :
:    16400 :    327F/  8000 :    1.7M :    3.4M :    0.8M :
:    32800 :    64D5/ 10000 :    3.3M :    5.6M :    1.8M :
:    65600 :    CA94/ 20000 :    6.7M :   10.1M :    3.8M :
:   131100 :   1940F/ 40000 :   13.4M :   19.1M :    7.8M :
:   262200 :   325A8/ 80000 :   26.8M :   37.2M :   15.8M :
:   524300 :   64A98/100000 :   53.5M :   73.3M :   31.8M :
:  1048600 :   C9691/200000 :  107.0M :  145.6M :   63.8M :
:  2097200 :  192DE0/400000 :   -0.0M :  290.0M :  127.8M :
:  4194400 :  326095/800000 :   -0.0M :  578.9M :  255.8M :
:  5000000 :  3977F6/800000 :   -0.0M :  683.7M :  255.8M :
This shows the effect you described: the hash being rebuilt with more buckets as it grows, and the heap growing with it. (The Used/Buckets column is given in hex.)
This table also shows the expected size of the hash, according to Devel::Size::total_size() on a separate run -- where it says -0.0M I don't have a value, because life is too short.
The figures for actual memory use are given by the System Monitor. The Heap is present when Perl starts. The Aux appears as the hash grows -- FWIW, I observe that it's at the far end of VM space, below the Stack.
NB: the actual memory use is taken when my test is run with the Devel::Size::total_size() calls commented OUT -- so the figures are not affected by its overheads.
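For concreteness, a minimal sketch of the kind of test being described (the key format is a guess -- the only stated detail is that the keys were 33 characters -- and the actual-memory runs had the total_size() call commented out):

#!/usr/bin/perl
use strict;
use warnings;
use Devel::Size qw(total_size);

my @checkpoints = (10, 20, 40, 70, 130, 260, 550, 1_100, 2_100, 4_100,
                   8_200, 16_400, 32_800, 65_600, 131_100, 262_200,
                   524_300, 1_048_600, 2_097_200, 4_194_400, 5_000_000);

my %h;
my $n = 0;
for my $limit (@checkpoints) {
    while ($n < $limit) {
        $n++;
        $h{ sprintf '%033d', $n } = $n;     # 33-character key (exact format is a guess)
    }
    printf "%9d entries: expect %7.1fM\n",
        scalar(keys %h), total_size(\%h) / (1024 * 1024);
    # ... and read the process's VM size from the System Monitor at this point
}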
So, assuming that the hash is created on the heap, on the face of it, either Devel::Size::total_size() is under-reporting, or the construction of the hash is leaving free space in the heap. Anyway, the discrepancy is ~38 bytes per entry or ~19 bytes per bucket. (I assume that Devel::Size::total_size() does take into account the bucket overhead?)
I tried reducing the key size from 33 to 17. I got the same discrepancy.
The other puzzle is what the Aux is. It appears to be something to do with the rebuilding of the hash. The size is independent of the key size: it's 32 bytes per bucket at the new bucket count, or 64 bytes per bucket at the old count! In any case, it's a significant chunk of memory (virtual or otherwise) :-(
| [reply] [d/l] [select] |
Re^2: Completeness of Devel::Size answers
by jand (Friar) on Sep 23, 2008 at 00:09 UTC
You can detect circles without all the bookkeeping overhead, but you wouldn't be able to calculate the total size of a structure containing circular references (you may end up counting some elements multiple times, and you can't be sure you traversed the whole structure before you detected a circle). | [reply] |
You can detect circles without all the bookkeeping overhead,
Interesting. How?
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] |
To detect cycles, you need only keep a hash as large as your search is deep. That even allows you to detect cycles "immediately", but it doesn't tell you when you have the same structure referenced from two different places in your directed, acyclic graph:

my @a = ( \%huge, \%huge );
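For illustration (this is not tye's code, just a sketch of the idea): a walker whose %seen hash only ever holds the nodes on the current path, so it grows with the depth of the search rather than the size of the structure -- it spots genuine cycles, but not structures reached twice by different routes:

use strict;
use warnings;
use Scalar::Util qw(refaddr reftype);

sub has_cycle {
    my ($node, $on_path) = @_;
    $on_path ||= {};
    return 0 unless ref $node;

    my $addr = refaddr $node;
    return 1 if $on_path->{$addr};              # revisiting an ancestor: a cycle
    $on_path->{$addr} = 1;                      # on the current path

    my $type = reftype $node;
    my @kids = $type eq 'ARRAY' ? @$node
             : $type eq 'HASH'  ? values %$node
             : $type eq 'REF' || $type eq 'SCALAR' ? ($$node)
             : ();
    my $cyclic = grep { has_cycle($_, $on_path) } @kids;

    delete $on_path->{$addr};                   # leaving this node: the path shrinks again
    return $cyclic ? 1 : 0;
}

my %huge = (x => 1);
my @a = ( \%huge, \%huge );                     # shared but acyclic: reported as no cycle
my @loop; push @loop, \@loop;                   # a genuine cycle
print has_cycle(\@a), "\n";                     # 0
print has_cycle(\@loop), "\n";                  # 1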
| [reply] [d/l] |
FWIW, I was stunned when I came across the following:
To detect cycles you can use two pointers -- say, p and q. You step p through the data in the usual way. You step q through the data twice for each step of p, until the end is met. If p equals q before the end, you have a loop.
The tricky bit in a recursive descent tree walk is to walk two pointers at once! You need to build an iterator for q which contains at least a stack for recursion, and, for Perl, some way of iterating through a hash which isn't going to trip up p. (Of course, you have to guarantee that p and q will visit the nodes in the same order, especially where there is a loop!)
Apart from that minor difficulty, this trick only tells you there is a cycle. It doesn't tell you where it is.
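For illustration, the classic linked-list form of the trick (rather than the recursive-descent version being discussed), as a minimal Perl sketch -- p is the slow pointer, q the fast one:

use strict;
use warnings;

sub has_loop {
    my ($head) = @_;
    my ($p, $q) = ($head, $head);
    while ($q && $q->{next}) {
        $p = $p->{next};                # p: one step
        $q = $q->{next}{next};          # q: two steps
        return 1 if $q && $p == $q;     # q caught p: there is a loop
    }
    return 0;                           # q fell off the end: no loop
}

# build a five-node list, then bend the tail back to the second node
my @node = map { +{ value => $_, next => undef } } 1 .. 5;
$node[$_]{next} = $node[$_ + 1] for 0 .. 3;
print has_loop($node[0]), "\n";         # 0
$node[4]{next} = $node[1];
print has_loop($node[0]), "\n";         # 1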
"It's a good sort of brake. But it hasn't worked yet."
| [reply] |
Re^2: Completeness of Devel::Size answers
by gone2015 (Deacon) on Sep 23, 2008 at 10:13 UTC
The problem is that Devel::Size itself uses a substantial amount of memory. Internally, it builds a hash of all the SVs (scalars) it encounters whilst traversing the structure being sized. It has to do this in order to detect circular references that would otherwise cause it never to return.
This strikes me as pessimistic. With two passes, the first could collect all references and avoid following repeats -- avoiding loops and avoiding counting things twice. The second pass could generate the size count.
Actually, AFAICS, the only reason you need two passes is that with ref:SCALAR you cannot otherwise tell whether the SCALAR is internal or external, and, if internal, whether it's already been counted. My guess is that ref:SCALAR is relatively rare...
... perhaps the trick is to proceed optimistically, constructing the size during pass 1. At the end, if there are any ref:SCALAR entries, run a half-pass to rescan the structure, looking for any referred-to internal SCALARs, up to where they are first referred to.
... I can see that for some structures one and a half passes would be more expensive than the current approach. However, at least the penalty is paid where things are difficult. The current stuff falls over if the data is large, however simple.
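To make the bookkeeping concrete, a Perl-level sketch (this is not how Devel::Size is implemented, and it counts nodes rather than bytes): a single pass that remembers only things that are actually the target of a reference -- a much smaller set than every SV -- so that repeats, whether shared substructures or cycles, are followed only once:

use strict;
use warnings;
use Scalar::Util qw(refaddr reftype);

sub walk {
    my ($node, $seen) = @_;
    return 0 unless ref $node;
    return 0 if $seen->{ refaddr $node }++;    # already visited: don't follow it again

    my $type = reftype $node;
    my @kids = $type eq 'ARRAY' ? @$node
             : $type eq 'HASH'  ? values %$node
             : $type eq 'REF' || $type eq 'SCALAR' ? ($$node)
             : ();
    my $count = 1;                             # count this node
    $count += walk($_, $seen) for @kids;
    return $count;
}

my %leaf = (a => 1);
my @top  = (\%leaf, \%leaf, [1, 2, 3]);
push @top, \@top;                              # a cycle, for good measure
print walk(\@top, {}), "\n";                   # 3: @top, %leaf (once), the inner array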
Rats. Something in the back of my mind is telling me that string value SVs may implicitly refer to the string... But then there would be a ref-count > 1... which could be picked up in Pass 1...
| [reply] |