It would probably make more sense to parse the file as a stream instead of trying to swallow the beast whole. I recommend setting up a SAX-style handler, feeding X::P::E (XML::Parser::Expat) a root tag, and then feeding it lines from the gzip'd file.
By "feeding it", I mean: use XML::Parser::ExpatNB and its parse_more() method. Then, in the handler, you can build up your own data structure based on "depth" within the document, i.e. the nesting level at which you actually want data (for example, if you only need to process particular child nodes inside particular "top level" nodes), instead of filling up your RAM with a giant DOM.
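Here is a minimal sketch of that approach, assuming the gzip'd file holds a run of top-level elements with no enclosing root element and no `<?xml ...?>` declaration of its own. The sub name `parse_records`, the wrapper `<root>` tag, and the `_name` key are hypothetical, as is the choice to keep each top-level node's attributes plus the text of its direct children. XML::Parser's `parse_start()` returns an XML::Parser::ExpatNB object; the root tag is fed first, then the decompressed file line by line through `parse_more()`, and the Start/End handlers track depth so only the nodes you care about are kept.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);
use XML::Parser;

# Stream-parse records from a filehandle (e.g. a gunzip handle) without
# building a DOM. Assumes the stream is a run of top-level elements with no
# enclosing root and no <?xml ...?> declaration of its own.
sub parse_records {
    my ($fh) = @_;
    my $depth = 0;     # nesting level; the wrapper <root> is depth 1
    my $current;       # the depth-2 node being built, if any
    my $text_key;      # the depth-3 child whose text is being collected
    my @records;

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($expat, $elem, %attrs) = @_;
                $depth++;
                if ($depth == 2) {          # a "top level" node under <root>
                    $current = { _name => $elem, %attrs };
                }
                elsif ($depth == 3) {       # a direct child of that node
                    $text_key = $elem;
                    $current->{$elem} = '';
                }
            },
            Char => sub {
                my ($expat, $text) = @_;
                # Text of deeper descendants is folded into the child's value.
                $current->{$text_key} .= $text if defined $text_key;
            },
            End => sub {
                my ($expat, $elem) = @_;
                if ($depth == 3) {
                    undef $text_key;
                }
                elsif ($depth == 2) {
                    # Kept in memory for simplicity; for a huge file, process
                    # $current here and discard it instead.
                    push @records, $current;
                    undef $current;
                }
                $depth--;
            },
        },
    );

    my $nb = $parser->parse_start;   # returns an XML::Parser::ExpatNB object
    $nb->parse_more('<root>');       # feed the root tag first...
    while (my $line = <$fh>) {
        $nb->parse_more($line);      # ...then the data, one line at a time
    }
    $nb->parse_more('</root>');
    $nb->parse_done;
    return \@records;
}

# Usage: perl script.pl file.xml.gz
if (@ARGV) {
    my $gz = IO::Uncompress::Gunzip->new($ARGV[0])
        or die "Cannot open $ARGV[0]: $GunzipError\n";
    my $records = parse_records($gz);
    printf "%d records\n", scalar @$records;
}
```

Expat copes with chunk boundaries anywhere, so feeding whole lines is just a convenience; fixed-size `read()` chunks would work too. Returning every record keeps the sketch short, but the memory savings only materialize if you do your processing inside the End handler and throw each record away.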
What you are essentially describing is a Jabber IM session (one immense XML document), and this is a solved problem. Not to pimp my own code, but you could take a look at POE::Filter::XML to see how this whole feed-the-parser thing can be implemented (with the usual caveats: YMMV, HTH, etc.).