Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all

I have the two XML files below and would like to merge/update them.

<termEntry id="1">
  <descrip type="entryID">1</descrip>
  <langSet xml:lang="EN">
    <tig>
      <term>EnglishTerm1</term>
      <termNote type="Note1">Note1</termNote>
      <termNote type="Note2">Note1</termNote>
    </tig>
  </langSet>
  <langSet xml:lang="FR">
    <tig>
      <term>FrenchTerm1</term>
    </tig>
  </langSet>
</termEntry>
<termEntry id="2">
  <descrip type="entryID">2</descrip>
  <langSet xml:lang="EN">
    <tig>
      <term>EnglishTerm2</term>
      <termNote type="Note1">Note1</termNote>
      <termNote type="Note2">Note1</termNote>
    </tig>
  </langSet>
  <langSet xml:lang="FR">
    <tig>
      <term>FrenchTerm2</term>
    </tig>
  </langSet>
  <langSet xml:lang="ES">
    <tig>
      <term>SpanishTerm2</term>
    </tig>
  </langSet>
</termEntry>

<termEntry id="25">
  <descrip type="entryID">1</descrip>
  <langSet xml:lang="EN">
    <tig>
      <term>EnglishTerm1</term>
    </tig>
  </langSet>
  <langSet xml:lang="IT">
    <tig>
      <term>ItalianTerm</term>
    </tig>
  </langSet>
</termEntry>
<termEntry id="26">
  <descrip type="entryID">1</descrip>
  <langSet xml:lang="EN">
    <tig>
      <term>EnglishTerm15</term>
    </tig>
  </langSet>
  <langSet xml:lang="IT">
    <tig>
      <term>ItalianTerm15</term>
    </tig>
  </langSet>
</termEntry>

The expected output is a single file that contains the existing ids, updated with any new terms, plus new ids for terms that do not exist yet.

In other words, if an EN term from the second file already exists in the first file, the terms of that entry should be added under the existing entry. Otherwise, the entry should be added as a new term.

<termEntry id="1">
  <descrip type="entryID">1</descrip>
  <langSet xml:lang="EN">
    <tig>
      <term>EnglishTerm1</term>
      <termNote type="Note1">Note1</termNote>
      <termNote type="Note2">Note1</termNote>
    </tig>
  </langSet>
  <langSet xml:lang="IT">
    <tig>
      <term>ItalianTerm</term>
    </tig>
  </langSet>
</termEntry>
  <langSet xml:lang="FR">
    <tig>
      <term>FrenchTerm1</term>
    </tig>
  </langSet>
</termEntry>
<termEntry id="2">
  <descrip type="entryID">2</descrip>
  <langSet xml:lang="EN">
    <tig>
      <term>EnglishTerm2</term>
      <termNote type="Note1">Note1</termNote>
      <termNote type="Note2">Note1</termNote>
    </tig>
  </langSet>
  <langSet xml:lang="FR">
    <tig>
      <term>FrenchTerm2</term>
    </tig>
  </langSet>
  <langSet xml:lang="ES">
    <tig>
      <term>SpanishTerm2</term>
    </tig>
  </langSet>
</termEntry>
<termEntry id="26">
  <descrip type="entryID">1</descrip>
  <langSet xml:lang="EN">
    <tig>
      <term>EnglishTerm15</term>
    </tig>
  </langSet>
  <langSet xml:lang="IT">
    <tig>
      <term>ItalianTerm15</term>
    </tig>
  </langSet>
</termEntry>

Do you have any idea how I can start?

Thanks in advance for your time and consideration.

Re: Updating XML files
by GotToBTru (Prior) on May 24, 2016 at 12:08 UTC

    Starting Place. There are numerous XML modules on CPAN. Avoid XML::Simple. XML::Twig and XML::Rules both come recommended here. These will allow you to manipulate the documents within Perl. Using Super Search, with some digging, you will find specific examples of how these are used.

    It's easier to offer suggestions about code you have already written, even if incomplete, than it is to invent it out of thin air. Help us help you!

    But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)

Re: Updating XML files
by choroba (Cardinal) on May 24, 2016 at 12:30 UTC
    Hash the entries in file1 by their English term. Then iterate over the entries in file2, insert or update each entry based on its existence in the hash.

    You haven't provided the full XML, so I had to wrap each file in <root>...</root> in order to be able to play with them. I named the first file 1.xml and the second one 2.xml. I then implemented the logic in XML::XSH2:

    my $file2 := open 2.xml ;
    open 1.xml ;
    my $t := hash langSet[@xml:lang='EN']/tig/term /root/termEntry ;
    for my $entry in $file2/root/termEntry {
        my $entry1 = xsh:lookup('t', $entry/langSet[@xml:lang='EN']/tig/term) ;
        if ($entry1) {
            for my $lang in $entry/langSet/@xml:lang {
                if (0 = count($entry1/langSet[@xml:lang=$lang]))
                    cp $lang/.. into $entry1 ;
            }
        } else {
            cp $entry into /root ;
        }
    }
    save :b ;

    BTW, I had to remove the </termEntry> before <langSet xml:lang="FR"> in the expected output, I hope that's what you meant.
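
    For readers without XSH at hand, the same hash-then-merge idea can be sketched in Python using only the standard library. This is an illustration under assumptions, not a checked translation of the script above: the names english_term and merge_entries are made up here, both inputs are assumed to be wrapped in a single <root> element as described, and a language that already exists in the matching entry is skipped rather than merged.

```python
import xml.etree.ElementTree as ET

# xml:lang belongs to the reserved XML namespace; ElementTree stores the
# attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def english_term(entry):
    """Return the text of the entry's EN term, or None if there is none."""
    for lang_set in entry.findall("langSet"):
        if lang_set.get(XML_LANG) == "EN":
            return lang_set.findtext("tig/term")
    return None


def merge_entries(root1, root2):
    """Merge the termEntry elements of root2 into root1, in place."""
    # Step 1: hash the entries of the first file by their English term.
    by_term = {}
    for entry in root1.findall("termEntry"):
        term = english_term(entry)
        if term is not None:
            by_term.setdefault(term, entry)

    # Step 2: walk the second file; update a matching entry or add a new one.
    for entry in root2.findall("termEntry"):
        target = by_term.get(english_term(entry))
        if target is None:
            # English term not seen in file 1: the whole entry is new.
            root1.append(entry)
            continue
        # English term already known: copy only the languages still missing.
        present = {ls.get(XML_LANG) for ls in target.findall("langSet")}
        for lang_set in entry.findall("langSet"):
            if lang_set.get(XML_LANG) not in present:
                target.append(lang_set)
    return root1


if __name__ == "__main__":
    # Cut-down versions of the two sample files, wrapped in <root>.
    first = ET.fromstring(
        '<root><termEntry id="1">'
        '<langSet xml:lang="EN"><tig><term>EnglishTerm1</term></tig></langSet>'
        '<langSet xml:lang="FR"><tig><term>FrenchTerm1</term></tig></langSet>'
        '</termEntry></root>'
    )
    second = ET.fromstring(
        '<root><termEntry id="25">'
        '<langSet xml:lang="EN"><tig><term>EnglishTerm1</term></tig></langSet>'
        '<langSet xml:lang="IT"><tig><term>ItalianTerm</term></tig></langSet>'
        '</termEntry><termEntry id="26">'
        '<langSet xml:lang="EN"><tig><term>EnglishTerm15</term></tig></langSet>'
        '<langSet xml:lang="IT"><tig><term>ItalianTerm15</term></tig></langSet>'
        '</termEntry></root>'
    )
    merged = merge_entries(first, second)
    for entry in merged.findall("termEntry"):
        langs = [ls.get(XML_LANG) for ls in entry.findall("langSet")]
        print(entry.get("id"), langs)
```

    On the cut-down sample this prints 1 ['EN', 'FR', 'IT'] and 26 ['EN', 'IT']. Added langSet elements go to the end of the matching entry, so the element order can differ from the expected output in the question.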

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

      Hi

      Thank you very much for your help. However, I am not able to run your code because of an error I get.

      Use of := for an empty attribute list is not allowed

      Can you please advise?

      Thanks

        You can't run it as Perl code. Install XML::XSH2, save the code as script.xsh, and run it with xsh script.xsh. Or pass the whole code to the xsh function inside a Perl script.