So I've been spending time working on yet another online bookmark repository, and mine can take snapshots of pages so that you can view a cached copy later if the original page is taken down or altered.

The kicker comes when I'm exporting and importing the user's data - I use XML::Writer to output the data, and I export the bookmarks and cache objects into an XML file. The cached objects are encoded using MIME::Base64. Exporting works like a charm...

The problem is when importing data. Perl runs out of memory! I have an XML::Parser object created with appropriate handlers that branch depending on the element type. The export format contains three main elements - "post", "object" and "relationship". (A relationship is a link between two objects). So on import, relationships and posts import fine but Perl has the memory issues when reading and decoding the Base64 encoded objects.

The XML::Parser fires events into my "Char" handler, where I append the characters in the current element to a scalar...

Like so:
$parser->setHandlers(Char => sub { my $expat = shift; $ocontent = $ocontent . shift; }, ...);
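For context, here is a hedged sketch of how a handler set like that might be wired up end to end, with the buffer reset per element so it never holds more than one object. The element name "object", the `$in_object` flag, and the `@objects` array are illustrative, not taken from the original code:

```perl
use strict;
use warnings;
use XML::Parser;

my $ocontent  = '';
my $in_object = 0;
my @objects;

my $parser = XML::Parser->new(Handlers => {
    Start => sub {
        my ($expat, $element) = @_;
        if ($element eq 'object') {
            $in_object = 1;
            $ocontent  = '';           # reset the buffer for each object
        }
    },
    Char => sub {
        my ($expat, $chars) = @_;
        $ocontent .= $chars if $in_object;   # append in place with .=
    },
    End => sub {
        my ($expat, $element) = @_;
        if ($element eq 'object') {
            push @objects, $ocontent;        # hand off the finished buffer
            $in_object = 0;
        }
    },
});

$parser->parse('<export><object>aGVsbG8=</object></export>');
print scalar(@objects), " object(s), first is: $objects[0]\n";
```

Resetting `$ocontent` in the Start handler keeps the scalar from accumulating across objects, which is one easy way for a buffer like this to grow without bound.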

The $ocontent scalar is declared in the scope of the function that instantiates the parser and starts the parse. So I build up $ocontent with the encoded data, then finally call decode on it and import the result into the user's database. That works well for somewhere around two hundred objects, at which point Perl freaks out trying to map and unmap memory (according to strace) and stalls at full CPU usage. When I comment out the "$ocontent = $ocontent . shift;" line, I don't run out of memory. I also tried setting $ocontent just once, in case the Base64 decode method or the MySQL DBI methods were causing the problem - in that test I used the same chunk of data for every object, and I did not run out of memory.
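One detail worth noting about that line: `$ocontent = $ocontent . shift` may build a fresh string and copy the accumulated contents on every Char event (and Char events tend to arrive in many small pieces), whereas `$ocontent .= shift` appends to the existing buffer in place. A minimal sketch of the two forms, with made-up chunks standing in for Char events:

```perl
use strict;
use warnings;

my $copy_append = '';
my $in_place    = '';
for my $chunk ('aGVs', 'bG8=') {           # Char events arrive in small pieces
    $copy_append = $copy_append . $chunk;  # may allocate and copy each time
    $in_place   .= $chunk;                 # grows the existing buffer in place
}
print "$copy_append $in_place\n";          # both hold "aGVsbG8="
```

The results are identical; the difference is only in how much copying Perl may do while the buffer grows, which matters when the buffer is a large Base64 blob.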

So all signs point to my character buffer causing a memory leak. Does anyone know how to fix or work around this? (You can also look right at the CVS tree of my project if it helps.)
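One possible workaround (a sketch under my own assumptions, not from the original code) is to avoid holding the whole encoded blob at all: decode the Base64 stream incrementally inside the Char handler. `decode_base64` from MIME::Base64 can decode any whitespace-free chunk whose length is a multiple of 4, so the handler only needs to carry the undecoded tail forward between events. The `feed_chars` helper below is hypothetical:

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

my $pending = '';   # undecoded tail carried between Char events
my $decoded = '';

# Hypothetical helper: what a Char handler could do with each chunk.
sub feed_chars {
    my ($chars) = @_;
    $chars =~ s/\s+//g;                  # Base64 decoders ignore whitespace
    $pending .= $chars;
    my $usable = length($pending) - length($pending) % 4;
    if ($usable) {
        # Decode the complete 4-char groups; keep the remainder pending.
        $decoded .= decode_base64(substr($pending, 0, $usable, ''));
    }
}

# Simulate Char events arriving in awkward pieces:
feed_chars($_) for ('aGVsb', 'G8gd2', '9ybGQ=');
print "$decoded\n";   # "hello world"
```

With this approach the memory high-water mark is roughly the size of the decoded object (or less, if the decoded bytes are also streamed to the database), rather than the decoded data plus the full encoded buffer.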

Update: I figured it out! After appending the characters from XML::Parser to my string, I now undef the character variable that came in from expat. Suddenly the whole script runs much faster and uses less memory. This is the new character-data handling routine:

Char => sub { my $expat = shift; my $chars = shift; $cbuffer = $cbuffer . $chars; undef $chars; }

In reply to Out of memory with XML::Parser by LukeyBoy
