pinkesh_like has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have 2 large XML files (around 380MB). Can anyone suggest some options for comparing these 2 XML files with minimum runtime? I need to compare the files by comparing the values of some keys in both hashes, such as 'message', ID or variable_name. Only if these values are equal are the XML files considered the same. Thanks in advance.

Re: how to compare 2 large XML files with minimum runtime?
by Discipulus (Canon) on Oct 11, 2013 at 07:00 UTC
    Short question, shorter answer:
    use XML::Twig with handlers and flush, populate a %hash from the first file, do the same for the second file, then compare the two hashes.
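
    A minimal sketch of that idea, under some assumptions: the element names `record`, `ID`, `message` and `variable_name` are hypothetical (the original question does not show the XML layout), and `purge` is used in place of `flush` because nothing needs to be written back out. Treat it as a starting point, not a finished tool.

```perl
use strict;
use warnings;
use XML::Twig;

# Build { ID => "message<sep>variable_name" } for one file.
# 'record', 'ID', 'message' and 'variable_name' are assumed element names.
sub load_keys {
    my ($file) = @_;
    my %seen;
    my $twig = XML::Twig->new(
        twig_handlers => {
            record => sub {
                my ($t, $elt) = @_;
                my $id = $elt->first_child_text('ID');
                # \x1F is just a separator unlikely to occur in the data
                $seen{$id} = join "\x1F",
                    $elt->first_child_text('message'),
                    $elt->first_child_text('variable_name');
                $t->purge;    # drop what has been parsed so far to keep memory flat
            },
        },
    );
    $twig->parsefile($file);
    return \%seen;
}

# Return the sorted IDs that are missing on one side or whose values differ.
sub diff_keys {
    my ($left, $right) = @_;
    my %all = map { $_ => 1 } keys %$left, keys %$right;
    my @diff = grep {
        !exists $left->{$_} || !exists $right->{$_} || $left->{$_} ne $right->{$_}
    } sort keys %all;
    return @diff;
}

# Usage: perl compare_twig.pl first.xml second.xml
if (@ARGV == 2) {
    my @diff = diff_keys(map { load_keys($_) } @ARGV);
    print @diff ? "Differing IDs: @diff\n" : "Files match on the compared keys\n";
}
```

    Keying the hash on ID means the order of records in the two files does not matter; if IDs can repeat within one file, the hash values would have to become lists instead.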

    hth L*
    there are no rules, there are no thumbs..
Re: how to compare 2 large XML files with minimum runtime?
by Anonymous Monk on Oct 11, 2013 at 07:36 UTC
Re: how to compare 2 large XML files with minimum runtime?
by RMGir (Prior) on Oct 11, 2013 at 12:48 UTC
    There may be opportunities to "cheat", if you know something about the processes generating the XML. For instance, are the sections always in the same order? Are the message IDs on the same lines in each section? etc...
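
    One such cheat could look like the sketch below. It assumes both generators write the compared elements in the same order and at most one per line; the element names come from the question and are otherwise a guess. No XML parser is involved, so it is only as reliable as those assumptions.

```perl
use strict;
use warnings;

# Matches a line carrying one of the compared elements (assumed names).
my $KEY_RE = qr{<(ID|message|variable_name)>(.*?)</\1>};

# Read forward to the next line that carries a compared element.
sub next_key {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        return "$1=$2" if $line =~ $KEY_RE;
    }
    return undef;    # end of file
}

# Walk both handles in lockstep. Returns (position, left, right) for the
# first mismatch, or an empty list if every key agrees.
sub first_mismatch {
    my ($fh_a, $fh_b) = @_;
    my $n = 0;
    while (1) {
        my $ka = next_key($fh_a);
        my $kb = next_key($fh_b);
        return if !defined $ka && !defined $kb;    # both exhausted
        $n++;
        if (!defined $ka || !defined $kb || $ka ne $kb) {
            return ($n, $ka, $kb);
        }
    }
}

# Usage: perl compare_lines.pl first.xml second.xml
if (@ARGV == 2) {
    open my $fh_a, '<', $ARGV[0] or die "$ARGV[0]: $!";
    open my $fh_b, '<', $ARGV[1] or die "$ARGV[1]: $!";
    my @m = first_mismatch($fh_a, $fh_b);
    print @m ? "Key #$m[0] differs\n" : "Same keys in the same order\n";
}
```

    Because it stops at the first mismatch and never builds a data structure, this reads each file at most once and keeps memory use constant.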

    If you have to handle arbitrary XML, or you don't have control of the sources so you can't guarantee that any cheats will remain valid, I'd say find the fastest XML parser you can and go with that...


    Mike
Re: how to compare 2 large XML files with minimum runtime?
by Anonymous Monk on Oct 11, 2013 at 13:33 UTC
    380MB, really, is not "large" for most computers these days, which could easily handle both data structures side-by-side in memory without serious swapping. Therefore you could simply suck the two files into memory and use something like Data::Compare. If you are looking for specific comparisons, XSLT might be useful to "drill down" to exactly what you are looking for.
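
    A rough sketch of the in-memory approach, assuming XML::Simple is used to load the files (the reply does not name a loader, so that choice is mine). `XMLin` accepts either a file name or a string of XML; the `ForceArray` and `KeyAttr` settings are there so both files are folded into the same shape. Keep in mind that the parsed structure usually takes noticeably more memory than the file on disk.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);
use Data::Compare;

# True if the two documents parse to identical nested data structures.
# Each argument may be a file name or a string containing XML.
sub same_structure {
    my ($xml_a, $xml_b) = @_;
    my %opt = (ForceArray => 1, KeyAttr => []);    # same folding rules for both sides
    return Compare(XMLin($xml_a, %opt), XMLin($xml_b, %opt));
}

# Usage: perl compare_all.pl first.xml second.xml
if (@ARGV == 2) {
    print same_structure(@ARGV) ? "identical\n" : "different\n";
}
```

    This compares everything, not just 'message', ID and variable_name, and it only answers yes or no; it does not say where the files differ.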