in reply to Re: problems with garbage collection
in thread problems with garbage collection

But when clean up occurs, every scalar has to have its reference count decremented to 0 before it can be released

Actually, during "global destruction", most reference counts are not decremented, and lots of allocated memory isn't free()d either. Perl makes sure to call any DESTROY()s that need to be called, but most ordinary data structures it just leaves allocated, letting the act of the process exit()ing efficiently de-allocate everything in one fell swoop rather than making thousands of calls to free() and decrementing every reference count of every item until they all reach zero.
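For illustration, here is a quick toy script of mine (not code from the thread) that shows the split: the blessed object's DESTROY still fires during clean-up, while the million ordinary hash entries are normally just abandoned for the exit() to reclaim:

    use strict;
    use warnings;

    package Noisy;
    sub new     { bless {}, shift }
    sub DESTROY { warn "Noisy object destroyed during clean-up\n" }

    package main;
    our $obj = Noisy->new();    # will get a DESTROY call at clean-up time
    our %big;
    $big{$_} = 1 for 1 .. 1_000_000;    # lots of ordinary data
    print "exiting...\n";
    # Implicit exit here: perl still calls DESTROY for $obj, but normally
    # does not walk %big decrementing a million reference counts one by one.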

But your explanation might be quite correct for the case being discussed. It certainly seems to fit.

Perhaps the performance could be improved by ensuring that the large hash is not destroyed until the global destruction phase has begun. Though, given the vague code provided so far, I don't see why the hash wouldn't live that long. The OP should probably get to work investigating or just providing more details (like producing a minimal test case that reproduces the problem).
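If it turns out the hash really is torn down early (say, it only lives inside a subroutine), one cheap experiment is to park a reference to it in a package variable so its reference count cannot reach zero before global destruction. A rough sketch, with made-up names (build_index() and load_fingerprints() are not from the OP's code):

    use strict;
    use warnings;

    our $KEEP_ALIVE;    # package variable, so it lives until global destruction

    sub build_index {
        my %fingerprints = load_fingerprints();
        $KEEP_ALIVE = \%fingerprints;    # refcount can no longer hit zero at scope exit
        return \%fingerprints;
    }

    sub load_fingerprints { return ( a => 1, b => 2 ) }    # stand-in for the real loader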

You can also avoid Perl's clean-up phase entirely via POSIX:

use POSIX '_exit'; _exit(0);

instead of using exit.
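In a forked-worker set-up (which seems to be what is being discussed), that looks roughly like this; the child's actual work is just a placeholder here:

    use strict;
    use warnings;
    use POSIX qw(_exit);

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # ... child does its share of the work here ...
        _exit(0);    # leave at once: no END blocks, no destructors,
                     # no per-element teardown of the big shared data
    }
    waitpid($pid, 0);    # parent reaps the child

One thing to keep in mind: _exit() also skips flushing buffered output, so the child should flush or close its output files itself before calling it.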

BTW, if that simply solves the problem, I do hope the OP will still provide more information so we can reproduce the problem and properly understand it.

- tye        

Replies are listed 'Best First'.
Re^3: problems with garbage collection (_exit)
by Anonymous Monk on Jul 14, 2010 at 18:16 UTC
    Hello, thank you all for the help. The POSIX::_exit(0) solution definitely helps and makes this part of the software MUCH faster, and it doesn't start swapping like it did before. I guess I would never have found that solution myself. I still have to test how far we can go with the current hardware before we max out, and whether I can find some things to optimize the RAM usage.

    Sorry, I don't know how to make a small test case. Maybe it is helpful if I describe the software a bit. I currently have to make some old software run in parallel to reduce the absolute time needed to work through a certain number of datasets and to make better use of the current hardware.

    We have lots of datasets in several databases. Now we get a new version of the datasets in CSV files and want to compare them with the old ones (via fingerprint) to find changes. If something has changed, we write export files for other software and update the values in the databases.

    For this, the parent process loads the fingerprint values (and other data) into a huge hash, and then we fork as many processes as we have processors (roughly the pattern sketched below). Because of how copy-on-write works out in Perl, each process ends up with its own local copy even when it is only reading the original hash, so the amount of RAM needed explodes... :(
    The current hardware is a 16-core system with 24GB of RAM and 8GB of swap (don't kill me, I am not the sysadmin behind the swap size decision ;)).
    The largest examples we have go up to about 3.5 million datasets; before the _exit() solution we already ran into problems at about 1.5 million datasets.
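    A rough reconstruction of that set-up, with all the names invented here just to make the shape concrete:

        use strict;
        use warnings;
        use POSIX qw(_exit);

        my %fingerprints = load_fingerprints();    # the huge shared hash
        my $workers      = 16;                     # one per core

        for my $n (1 .. $workers) {
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                process_chunk($n, \%fingerprints);    # child only reads the hash
                _exit(0);    # skip perl's clean-up so exiting stays cheap
            }
        }
        1 while wait() != -1;    # parent waits for all the workers

        sub load_fingerprints { return ( 1 .. 20 ) }          # stand-in test data
        sub process_chunk     { my ($n, $fp) = @_; return }   # stand-in for the real work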

    I now have to find out how high we can go and whether there are still ways to optimize the software a bit, so we can use it even for the largest number of datasets. But that's some work for tomorrow :)