pwnguin has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I'm looking for a way to merge two XML files, appending new nodes and replacing existing values. I have checked out XML::Merge, but unfortunately it doesn't seem to support merging files with conflicts. Since this is a bit of an unusual thing to do, I suspect I'll need to write some code myself, and I was hoping someone could advise on which libraries to use.

I have an original file and a "patch" file, and I want to merge them so that any nodes that are new in the patch file are added to the merged file, and any nodes that already exist are updated with the patch's values. Here is a (crude) example of what I mean:

Original File:

<NODEA>
  <NODEB UID="111">
    <NODEC name="1">
      <NODED> text1 </NODED>
      <NODED> text2 </NODED>
    </NODEC>
  </NODEB>
  <NODEB UID="222">
    <NODEC name="2">
      <NODED> text1 </NODED>
      <NODED> text2 </NODED>
    </NODEC>
  </NODEB>
</NODEA>

Patch File:

<NODEA>
  <NODEB UID="111">
    <NODEC name="1">
      <NODED> patched text 1 </NODED>
      <NODED> patched text 2 </NODED>
    </NODEC>
  </NODEB>
  <NODEB UID="333">
    <NODEC name="3">
      <NODED> text1 </NODED>
      <NODED> text2 </NODED>
    </NODEC>
  </NODEB>
</NODEA>

Merged File:

<NODEA>
  <NODEB UID="111">
    <NODEC name="1">
      <NODED> patched text 1 </NODED>
      <NODED> patched text 2 </NODED>
    </NODEC>
  </NODEB>
  <NODEB UID="222">
    <NODEC name="2">
      <NODED> text1 </NODED>
      <NODED> text2 </NODED>
    </NODEC>
  </NODEB>
  <NODEB UID="333">
    <NODEC name="3">
      <NODED> text1 </NODED>
      <NODED> text2 </NODED>
    </NODEC>
  </NODEB>
</NODEA>
I would really appreciate any pointers on which libraries I could use for this, or on anything else like XML::Merge that I may have missed. I am thinking of using XML::Twig to manually build up an array of nodes and compare them (although that sounds really messy), but wanted to hear any ideas before I got started.
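Building up an array of nodes and comparing them need not be messy if each node is indexed by its key. A minimal sketch, assuming the <NODEB> elements have already been read into plain Perl hashes (the `UID`/`text` layout and `merge_by_key()` are hypothetical, not part of XML::Twig or any other module): patch records whose UID already exists replace the original record in place, and new UIDs are appended, which reproduces the order of the merged file above. Note that this replaces whole records rather than merging their contents.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical layout: each record stands in for one <NODEB>,
# holding its UID and the texts of its <NODED> children.
my @original = (
    { UID => '111', text => [ 'text1', 'text2' ] },
    { UID => '222', text => [ 'text1', 'text2' ] },
);
my @patch = (
    { UID => '111', text => [ 'patched text 1', 'patched text 2' ] },
    { UID => '333', text => [ 'text1', 'text2' ] },
);

# Merge by key: existing keys are replaced in place (whole record),
# new keys are appended, and the original order is preserved.
sub merge_by_key {
    my ( $key, $orig, $patch ) = @_;
    my @merged = @$orig;                                   # shallow copy
    my %pos;
    $pos{ $merged[$_]{$key} } = $_ for 0 .. $#merged;      # key -> index
    for my $rec (@$patch) {
        if ( exists $pos{ $rec->{$key} } ) {
            $merged[ $pos{ $rec->{$key} } ] = $rec;        # update existing
        }
        else {
            push @merged, $rec;                            # append new
            $pos{ $rec->{$key} } = $#merged;
        }
    }
    return \@merged;
}

my $merged = merge_by_key( 'UID', \@original, \@patch );
for my $rec (@$merged) {
    print "$rec->{UID}: @{ $rec->{text} }\n";
}
```

The hash lookup replaces the pairwise comparison, so each patch record is matched in constant time no matter how large the original file is.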

Thanks!

Re: Merging two XML files (with conflicts)
by BrowserUk (Patriarch) on Feb 06, 2012 at 21:58 UTC

    I hate XML, but I like Simple :)

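    #! perl -slw
    use strict;
    use Inline::Files;   # makes the __XML1__ and __PATCH__ sections readable as XML1 and PATCH
    use XML::Simple;

    # Recursively copy $pat into $xml: structures present on both sides are
    # merged, anything else in the patch overwrites the original.
    sub traverse ($$);
    sub traverse ($$) {
        my( $xml, $pat ) = @_;
        if( ref $pat eq 'HASH' ) {
            for my $key ( keys %$pat ) {
                # Key exists on both sides and holds a structure: descend into it
                if( exists $xml->{ $key } and ref $pat->{ $key } ) {
                    traverse( $xml->{ $key }, $pat->{ $key } );
                    next;
                }
                # New key, or a plain value: take the patch's value
                $xml->{ $key } = $pat->{ $key };
            }
        }
        elsif( ref $pat eq 'ARRAY' ) {
            # Arrays are merged position by position
            for my $idx ( 0 .. $#$pat ) {
                if( ref $pat->[ $idx ] eq 'HASH' ) {
                    # //= {} stores a hash that exists only in the patch in the
                    # original array, instead of in a throwaway copy
                    traverse( $xml->[ $idx ] //= {}, $pat->[ $idx ] );
                    next;
                }
                $xml->[ $idx ] = $pat->[ $idx ];
            }
        }
    }

    # KeyAttr => [ 'UID' ] folds the NODEB elements into a hash keyed by UID,
    # so records are matched by UID rather than by position
    my $xml = XMLin( \*XML1,  KeyAttr => [ 'UID' ], KeepRoot => 1 );
    my $pat = XMLin( \*PATCH, KeyAttr => [ 'UID' ], KeepRoot => 1 );

    traverse( $xml, $pat );

    print XMLout( $xml, KeyAttr => [ 'UID' ], KeepRoot => 1 );

    __DATA__
    __XML1__
    <NODEA>
      <NODEB UID="111" >
        <NODEC name="1" >
          <NODED> text1 </NODED>
          <NODED> text2 </NODED>
        </NODEC>
      </NODEB>
      <NODEB UID="222" >
        <NODEC name="2" >
          <NODED> text1 </NODED>
          <NODED> text2 </NODED>
        </NODEC>
      </NODEB>
    </NODEA>
    __PATCH__
    <NODEA>
      <NODEB UID="111" >
        <NODEC name="1" >
          <NODED> patched text 1 </NODED>
          <NODED> patched text 2 </NODED>
        </NODEC>
      </NODEB>
      <NODEB UID="333" >
        <NODEC name="3" >
          <NODED> text1 </NODED>
          <NODED> text2 </NODED>
        </NODEC>
      </NODEB>
    </NODEA>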

    Outputs:

    c:\test>952158
    <NODEA>
      <NODEB UID="111">
        <NODEC name="1">
          <NODED> patched text 1 </NODED>
          <NODED> patched text 2 </NODED>
        </NODEC>
      </NODEB>
      <NODEB UID="222">
        <NODEC name="2">
          <NODED> text1 </NODED>
          <NODED> text2 </NODED>
        </NODEC>
      </NODEB>
      <NODEB UID="333">
        <NODEC name="3">
          <NODED> text1 </NODED>
          <NODED> text2 </NODED>
        </NODEC>
      </NODEB>
    </NODEA>
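Why KeyAttr => [ 'UID' ] matters can be shown without any XML modules. The sketch below hand-writes nested hashes that approximate what XMLin(..., KeepRoot => 1) returns for the example files (text whitespace trimmed; dump the real structure with Data::Dumper to confirm), then merges them with the same recursion as traverse(), restated as a hypothetical merge_into() without prototypes and with an extra check that both sides hold the same kind of reference. Keyed by UID, record 222 survives. Left as an array, the patch's second NODEB (UID 333) is merged into the original's second NODEB (UID 222) and overwrites it. One more caveat, worth checking against the XML::Simple documentation: folding only applies when XML::Simple sees a list, so a patch containing a single NODEB may need ForceArray => [ 'NODEB' ] to be keyed the same way.

```perl
use strict;
use warnings;

# Same recursion as traverse(), without prototypes. Added check: only
# descend when both sides hold the same kind of reference.
sub merge_into {
    my ( $dst, $src ) = @_;
    if ( ref $src eq 'HASH' ) {
        for my $key ( keys %$src ) {
            if ( ref $src->{$key} and ref $dst->{$key} eq ref $src->{$key} ) {
                merge_into( $dst->{$key}, $src->{$key} );
            }
            else {
                $dst->{$key} = $src->{$key};    # new key or plain value
            }
        }
    }
    elsif ( ref $src eq 'ARRAY' ) {
        for my $i ( 0 .. $#$src ) {             # arrays merge by position
            if ( ref $src->[$i] and ref $dst->[$i] eq ref $src->[$i] ) {
                merge_into( $dst->[$i], $src->[$i] );
            }
            else {
                $dst->[$i] = $src->[$i];
            }
        }
    }
}

# Approximation of XMLin output with KeyAttr => ['UID']: NODEB keyed by UID.
my $orig_keyed = { NODEA => { NODEB => {
    111 => { NODEC => { name => 1, NODED => [ 'text1', 'text2' ] } },
    222 => { NODEC => { name => 2, NODED => [ 'text1', 'text2' ] } },
} } };
my $patch_keyed = { NODEA => { NODEB => {
    111 => { NODEC => { name => 1, NODED => [ 'patched text 1', 'patched text 2' ] } },
    333 => { NODEC => { name => 3, NODED => [ 'text1', 'text2' ] } },
} } };

# The same data with NODEB left as an array (no KeyAttr naming UID).
my $orig_list = { NODEA => { NODEB => [
    { UID => 111, NODEC => { name => 1, NODED => [ 'text1', 'text2' ] } },
    { UID => 222, NODEC => { name => 2, NODED => [ 'text1', 'text2' ] } },
] } };
my $patch_list = { NODEA => { NODEB => [
    { UID => 111, NODEC => { name => 1, NODED => [ 'patched text 1', 'patched text 2' ] } },
    { UID => 333, NODEC => { name => 3, NODED => [ 'text1', 'text2' ] } },
] } };

merge_into( $orig_keyed, $patch_keyed );
merge_into( $orig_list,  $patch_list );

print "keyed:      ", join( ',', sort keys %{ $orig_keyed->{NODEA}{NODEB} } ), "\n";
print "positional: ", join( ',', map { $_->{UID} } @{ $orig_list->{NODEA}{NODEB} } ), "\n";
# keyed:      111,222,333
# positional: 111,333   (record 222 has been overwritten by 333)
```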


      Wow, that is much better than what I had imagined! Thank you kindly!
Re: Merging two XML files (with conflicts)
by choroba (Cardinal) on Feb 07, 2012 at 00:14 UTC
    How should I indicate in the patch that I want to change the UID of NODEB from 222 to 333?
      I think you have hit a major problem here.

      Actually, you can avoid changing/updating existing records if you have a well-established protocol for adding and deleting.

      Changing/updating then becomes a simple "delete this and add that" operation. But that is exactly what the "patch" file is ambiguous about: as it stands, it can only be reliably used to add data, not to delete or change ("patch") existing data.
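The delete-and-add protocol can be sketched on plain Perl hashes. Everything here is hypothetical: the record layout, the `op`/`UID`/`record` fields (in a patch file these might become an attribute such as `action="delete"`), and `apply_ops()` itself. The point is that each operation states its intent, so changing UID 222 to 333 (choroba's question) no longer has to be guessed.

```perl
use strict;
use warnings;

# Hypothetical patch protocol: each entry is an explicit operation on a
# record identified by UID. There is no "update": a change is a delete
# followed by an add.
sub apply_ops {
    my ( $records, $ops ) = @_;    # $records: hashref of UID => record
    for my $op (@$ops) {
        my $uid = $op->{UID};
        if ( $op->{op} eq 'delete' ) {
            die "delete: no record with UID $uid\n" unless exists $records->{$uid};
            delete $records->{$uid};
        }
        elsif ( $op->{op} eq 'add' ) {
            die "add: UID $uid already exists\n" if exists $records->{$uid};
            $records->{$uid} = $op->{record};
        }
        else {
            die "unknown op '$op->{op}'\n";
        }
    }
    return $records;
}

my %records = (
    111 => { name => 1, NODED => [ 'text1', 'text2' ] },
    222 => { name => 2, NODED => [ 'text1', 'text2' ] },
);

my @ops = (
    # choroba's case: move the record under UID 222 to UID 333
    { op => 'delete', UID => 222 },
    { op => 'add',    UID => 333, record => { name => 2, NODED => [ 'text1', 'text2' ] } },
    # the original example's update of UID 111, spelled out as delete + add
    { op => 'delete', UID => 111 },
    { op => 'add',    UID => 111,
      record => { name => 1, NODED => [ 'patched text 1', 'patched text 2' ] } },
);

apply_ops( \%records, \@ops );
print join( ',', sort keys %records ), "\n";    # prints "111,333"
```

Making `add` refuse an existing UID and `delete` refuse a missing one is a deliberate choice: a patch that does not match the data fails loudly instead of silently overwriting records.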

      What tells me that "patched text 1" should replace "text1", and not simply be an additional <NODED> within <NODEC name="1">?
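The ambiguity can be made concrete. The sketch below (the `merge_children()` helper and the policy names are hypothetical) applies a one-element patch to the two <NODED> texts of <NODEC name="1"> under three readings: replace the whole list, overwrite by position (what the traverse() code in BrowserUk's reply does for arrays of plain values), or append. Each result is defensible, so either the patch format or the merge code has to state which one applies.

```perl
use strict;
use warnings;

# Three possible readings of "patch these child elements".
# $policy is hypothetical: it could come from a config table or from an
# attribute in the patch file; nothing in the example XML carries it.
sub merge_children {
    my ( $orig, $patch, $policy ) = @_;
    return [@$patch] if $policy eq 'replace';            # patch list wins outright
    if ( $policy eq 'by_index' ) {                       # overwrite position by position
        my @r = @$orig;
        $r[$_] = $patch->[$_] for 0 .. $#$patch;
        return \@r;
    }
    return [ @$orig, @$patch ] if $policy eq 'append';   # keep both
    die "unknown policy '$policy'\n";
}

my @orig  = ( 'text1', 'text2' );
my @patch = ('patched text 1');

for my $policy (qw(replace by_index append)) {
    my $result = merge_children( \@orig, \@patch, $policy );
    printf "%-8s => %s\n", $policy, join( ' | ', @$result );
}
```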

      CountZero
