I was just surprised that you could use XML::DOM at all on files of that size. And it looks like you actually can't: a 1 GB XML file would take at least 8 GB in memory with XML::DOM. So it would be interesting to know how you did it. What I meant was that if you had been able to do it by throwing large amounts of memory at the problem, then XML::LibXML would have been an option too.

With XML::Twig you can very easily extract the k/v pairs:

my $t= XML::Twig->new(
    twig_roots => {
        SigData => sub {
            push @keys,   $_->field( 'Key');
            push @values, $_->field( 'Value');
            $_->purge;
        },
    },
)->parsefile( "my_big_fat_xml_file.xml");

Of course the @keys and @values arrays are going to be huge too, so you might still want to add a few GB of RAM to your machine, but at least the XML structure itself will never hold more than one SigData element at a time.
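If even the arrays turn out to be too big, one way around it is to write each pair out as soon as it is parsed instead of collecting them. The following is only a sketch under that assumption: the write_pairs name and the tab-separated output are made up for illustration, and it assumes keys and values contain no tabs or newlines.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: stream Key/Value pairs from each SigData element
# to an already opened filehandle, one tab-separated pair per line.
sub write_pairs {
    my ($xml_file, $out) = @_;

    XML::Twig->new(
        twig_roots => {
            SigData => sub {
                my ($twig, $elt) = @_;
                # field() returns the text of the first child with that name
                print {$out} join("\t", $elt->field('Key'), $elt->field('Value')), "\n";
                # release everything parsed so far
                $twig->purge;
            },
        },
    )->parsefile($xml_file);

    return;
}

# Usage: perl write_pairs.pl my_big_fat_xml_file.xml > pairs.tsv
write_pairs($ARGV[0], \*STDOUT) if @ARGV;
```

Memory use then stays roughly flat regardless of file size, at the cost of having to read the pairs back from disk later.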

Other possible options are XML::Rules (I expect jenda to show up and give you an example as soon as he wakes up), and maybe the new XML::Reader, which seems quite appropriate. XML::LibXML's pull mode might also be appropriate, but I have never used it so I can't comment on it.
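For what it's worth, a pull-mode version with XML::LibXML::Reader might look roughly like the following. Treat it as an unbenchmarked sketch: the read_pairs name and the callback interface are invented for illustration, and it assumes Key and Value are direct children of SigData, as in the XML::Twig example.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical helper: call $callback->($key, $value) for every SigData element.
sub read_pairs {
    my ($xml_file, $callback) = @_;

    my $reader = XML::LibXML::Reader->new(location => $xml_file)
        or die "Cannot read $xml_file\n";

    # nextElement() skips ahead to the next element with the given name
    while ($reader->nextElement('SigData')) {
        # copyCurrentNode(1) builds a small DOM subtree for this element only
        my $node = $reader->copyCurrentNode(1);
        $callback->($node->findvalue('Key'), $node->findvalue('Value'));
    }

    return;
}

# Usage: print every pair, tab-separated
read_pairs($ARGV[0], sub { print join("\t", @_), "\n" }) if @ARGV;
```

Only one SigData subtree is held as a DOM at a time, so the memory profile should be similar to the twig_roots approach.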


In reply to Re^3: XML processing taking too much time by mirod
in thread XML processing taking too much time by koti688
