Hm. All I can do at this point is confirm your findings.

Constructing an @AoH of 5000 copies of %ENV in as memory-efficient a way as I know how reports ~9MB in memory and ~9MB on disk, but constructing it uses 40MB of RAM. I see no sign of leaks from Storable.

c:\test>perl -MDevel::Size=total_size -MStorable=dclone,store -E"$#AoH=5e3; $AoH[$_] = dclone \%ENV for 0..$#AoH-1; <>; print total_size( \@AoH ); store \@AoH, 'junk.dat'; <>"
9555355
c:\test>dir junk.dat
10/04/2010  13:20         9,090,025 junk.dat

And loading it back into memory still reports ~9MB for the structure, but it requires 40MB to construct it:

c:\test>perl -MDevel::Size=total_size -MStorable=dclone,retrieve -E"$AoH = retrieve 'junk.dat';<>;print total_size( $AoH );<>"
9842331

I don't know why that is, but I have a cogent speculation. It is the same problem that used to afflict Devel::Size (and still does afflict the "official" version as far as I'm aware!), and still afflicts Data::Dumper and other similar modules.

Namely, the use of a hash internally to track which scalars have already been seen, so that when the data is restored, duplicated references to the same sub-structures are re-created as such, rather than as identical but separate copies. Whilst these tracking hashes are required for the module(s) to work correctly, they frequently grow to many times the size of the structures being serialised/deserialised.
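
To illustrate the general idea only (this is a pure-Perl sketch, not Storable's or Devel::Size's actual code; the function name count_unique_refs is made up for the example), a "seen" hash keyed on reference address looks something like this:

```perl
use strict;
use warnings;
use Scalar::Util qw( refaddr reftype );

# Hypothetical helper: count the distinct hashes and arrays reachable
# from $root, visiting each one only once however often it is referenced.
sub count_unique_refs {
    my( $root ) = @_;
    my %seen;                  # one entry per container visited
    my @todo = ( $root );
    while( @todo ) {
        my $item = pop @todo;
        next unless ref $item;
        next if $seen{ refaddr( $item ) }++;   # already visited
        my $type = reftype( $item );
        if(    $type eq 'HASH'  ) { push @todo, values %$item }
        elsif( $type eq 'ARRAY' ) { push @todo, @$item }
    }
    return scalar keys %seen;
}
```

Note that %seen gains a key for every container in the structure, so for a structure made of many small pieces the bookkeeping can rival the data itself.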

The code I implemented in my version of Devel::Size to perform this tracking function uses a fraction of the memory and is much faster. It could be used by Storable (and the other modules mentioned above) to great effect. But it would be a brave (and possibly foolhardy) man to do this as a patch, because the changes would be pervasive, and the testing requirements paramount. It would require the (up-front) blessings (and assistance) of the module owners to have a chance at succeeding.

I could break out the tracking code from Devel::Size and make it available as some kind of library. If anyone was going to use it.

The only quick solution I can offer is that instead of storing/retrieving the entire AoH as a single Storable entity, you store and retrieve it as a single file containing a series of smaller Storable objects.

As the overall size of each Storable creation would be a fraction of the total size, the size of the tracking hash would also be much smaller. And by saving and loading in a series of calls to Storable, the memory for the internal tracking hash would be re-used at each iteration, thereby further reducing the overall memory requirement hugely.

If this approach appeals to you, I could put together some code to demonstrate it?
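
In the meantime, a rough sketch of what I have in mind. The names store_chunked and retrieve_chunked, the default chunk size of 100 and the 4-byte length prefix are all just choices made for this example; only nfreeze and thaw come from Storable. I have not measured how much memory this saves on your data.

```perl
use strict;
use warnings;
use Storable qw( nfreeze thaw );

# Hypothetical helper: write @$aoh to $file as a series of independently
# frozen chunks, each preceded by its length as a 4-byte big-endian integer.
sub store_chunked {
    my( $aoh, $file, $chunk_size ) = @_;
    $chunk_size ||= 100;
    open my $fh, '>:raw', $file or die "open '$file': $!";
    for( my $i = 0; $i < @$aoh; $i += $chunk_size ) {
        my $last = $i + $chunk_size - 1;
        $last = $#$aoh if $last > $#$aoh;
        # Each call to nfreeze only has to track this one small chunk.
        my $frozen = nfreeze( [ @{ $aoh }[ $i .. $last ] ] );
        print {$fh} pack( 'N', length $frozen ), $frozen
            or die "write '$file': $!";
    }
    close $fh or die "close '$file': $!";
    return;
}

# Hypothetical helper: read the chunks back and rebuild one flat array.
sub retrieve_chunked {
    my( $file ) = @_;
    open my $fh, '<:raw', $file or die "open '$file': $!";
    my @aoh;
    while( 1 ) {
        my $got = read( $fh, my $header, 4 );
        die "read '$file': $!" unless defined $got;
        last if $got == 0;                       # clean end of file
        die "truncated header in '$file'" unless $got == 4;
        my $len = unpack 'N', $header;
        $got = read( $fh, my $frozen, $len );
        die "truncated chunk in '$file'"
            unless defined $got and $got == $len;
        push @aoh, @{ thaw( $frozen ) };
    }
    close $fh;
    return \@aoh;
}
```

Usage would be along the lines of store_chunked( \@AoH, 'junk.dat', 100 ); and later my $AoH = retrieve_chunked( 'junk.dat' );. One caveat: because each chunk is frozen separately, a sub-structure that is referenced from two different chunks comes back as two separate copies. For an AoH of independent hashes that should not matter.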


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"I'd rather go naked than blow up my ass"

In reply to Re^5: Issue with cloning and large structure processing by BrowserUk
in thread Issue with cloning and large structure processing by scathlock
