Hm. All I can do at this point is confirm your findings.
Constructing an @AoH of 5000 copies of %ENV in as memory-efficient a way as I know how reports ~9MB in memory and ~9MB on disk, but constructing it uses 40MB of RAM. I see no sign of leaks from Storable.
    c:\test>perl -MDevel::Size=total_size -MStorable=dclone,store -E"$#AoH=5e3; $AoH[$_] = dclone \%ENV for 0..$#AoH-1; <>; print total_size( \@AoH ); store \@AoH, 'junk.dat'; <>"
    9555355

    c:\test>dir junk.dat
    10/04/2010  13:20         9,090,025 junk.dat
And loading it back into memory still reports ~9MB for the structure, but it requires 40MB to construct it:
    c:\test>perl -MDevel::Size=total_size -MStorable=dclone,retrieve -E"$AoH = retrieve 'junk.dat';<>;print total_size( $AoH );<>"
    9842331
I don't know why that is, but I have a cogent speculation: it is the same problem that used to afflict Devel::Size (and still afflicts the "official" version as far as I'm aware!), and still afflicts Data::Dumper and other similar modules.
Namely, the use of a hash internally to track which scalars have already been seen, so that when the data is restored, duplicated references to the same sub-structures are re-created as such, rather than as identical but separate copies. Whilst these tracking hashes are required for the module(s) to work correctly, they frequently grow to many times the size of the structures being serialised/deserialised.
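For a trivial illustration of why that tracking is needed (this only shows the observable behaviour, not how Storable implements it internally):

    use strict;
    use warnings;
    use Storable qw(dclone);
    use Scalar::Util qw(refaddr);

    # Two hash values referencing the *same* anonymous array.
    my $shared = [ 1, 2, 3 ];
    my $data   = { first => $shared, second => $shared };

    # dclone has to notice that it has already seen $shared, so that the
    # clone contains one sub-array referenced twice, not two separate copies.
    my $copy = dclone $data;

    print refaddr( $copy->{first} ) == refaddr( $copy->{second} )
        ? "references still shared\n"
        : "references duplicated\n";

That prints "references still shared"; without the bookkeeping, the clone would silently contain two independent arrays.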
The code I implemented in my version of Devel::Size to perform this tracking function uses a fraction of the memory and is much faster. It could be used by Storable (and the other modules mentioned above) to great effect. But it would be a brave (and possibly foolhardy) man who attempted this as a patch, because the changes would be pervasive and the testing requirements onerous. It would require the (up-front) blessing (and assistance) of the module owners to have any chance of succeeding.
I could break out the tracking code from Devel::Size and make it available as some kind of library, if anyone were going to use it.
The only quick solution I can offer is that instead of storing/retrieving the entire AoH as a single Storable entity, you store and retrieve it as a single file containing a series of smaller Storable objects.
As each Storable image would be a fraction of the total size, its tracking hash would also be much smaller. And by saving and loading via a series of calls to Storable, the memory for the internal tracking hash would be re-used at each iteration, reducing the overall memory requirement still further.
If this approach appeals to you, I could put together some code to demonstrate it?
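Something along these lines, perhaps. This is only a rough, untested sketch: it uses Storable's store_fd/fd_retrieve to append multiple frozen images to a single file and read them back, and the chunk size of 500 is an arbitrary placeholder:

    use strict;
    use warnings;
    use Storable qw(store_fd fd_retrieve);

    my $CHUNK = 500;    # arbitrary; tune to whatever keeps memory flat

    # Build the same @AoH as above: 5000 copies of %ENV.
    my @AoH = map { { %ENV } } 1 .. 5000;

    # Save: append a series of small Storable images to one file.
    # Each store_fd call builds (and then frees) its own small tracking hash.
    open my $out, '>:raw', 'junk.dat' or die $!;
    for ( my $i = 0; $i <= $#AoH; $i += $CHUNK ) {
        my $hi = $i + $CHUNK - 1;
        $hi = $#AoH if $hi > $#AoH;
        store_fd( [ @AoH[ $i .. $hi ] ], $out ) or die 'store_fd failed';
    }
    close $out;

    # Load: read the images back one at a time and reassemble the array.
    open my $in, '<:raw', 'junk.dat' or die $!;
    my @loaded;
    until ( eof $in ) {
        my $chunk = fd_retrieve( $in );
        push @loaded, @$chunk;
    }
    close $in;

    print scalar( @loaded ), " hashes restored\n";

The one caveat is that references shared *across* chunk boundaries would no longer be preserved, but for an AoH of independent hashes like this one, that doesn't apply.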