bumpkin has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl Monks, my problem is that I have a number of XML files that I want to join together into one combined XML file. The structure of each XML file is shown below. I believe I need to join the files with the closing 'Documents' tag removed from each one, keeping it only at the end of the last joined file.

<Document StartPage="1" EndPage="1" PageCount="1" Id="c9a99999-99d9-40f3-9ec0-d99b7be99999" Account="AZRPOS">
  <Recipient FirstName="test" Surname="test" Postcode="AA6 1AA" Id="c9a99999-99d9-40f3-9ec0-d99b7be99999">
    <AddressLines>
      <AddressLine>Miss test test</AddressLine>
      <AddressLine>1 aaaaaa bbbbbbbb</AddressLine>
      <AddressLine>cccccccccc</AddressLine>
      <AddressLine>Fife</AddressLine>
      <AddressLine>AA6 1AA</AddressLine>
    </AddressLines>
  </Recipient>
  <Options>
    <Option Type="Colour Mode" Value="Black" />
    <Option Type="Delivery Method" Value="1st Class" />
    <Option Type="Envelope Type" Value="VW" />
    <Option Type="Paper Type" Value="VW - ACCPERTIG140/1" />
    <Option Type="Print Both Sides" Value="No" />
    <Option Type="Recipients" Value="Multiple Letters" />
  </Options>
</Document>
</Documents>

Replies are listed 'Best First'.
Re: xml join
by jellisii2 (Hermit) on Oct 02, 2013 at 17:30 UTC
    The XML posted isn't valid. It should have an opening <Documents> tag.

    Assuming all of your documents have this, you can parse each one with the parser of your choice, trim the tree to the <Document> level, and insert each branch into a single <Documents> root.

Re: xml join
by hippo (Archbishop) on Oct 02, 2013 at 15:50 UTC

    Hello and welcome to the Monastery.

    Do tell us what you have tried. Did you use the -p flag, for example? Do you have a script which doesn't compile or doesn't run? What were the errors?

Re: xml join
by graff (Chancellor) on Oct 02, 2013 at 22:35 UTC
    This seems like it's not a perl question (esp. since you didn't post any code).

    But, by way of a non-perl answer, I have found that it can be very useful to concatenate a bunch of xml files that use a common schema/dtd, e.g. to get a global summary of contents.

    All I need is some arbitrary tag to serve as the outer-most container for the concatenated set. Sometimes the files have this at the beginning:

    <?xml version="1.0"?>
    (and sometimes that includes an 'encoding' attribute as well); those need to be filtered out. So the process boils down to a simple, 3-step sequence of shell commands (assuming there's one directory containing all the xml files of interest, and a separate path to use for output):
    echo '<arbitrary_tag>' > outpath/combined.xml
    cat inpath/*.xml | fgrep -v '<?xml' >> outpath/combined.xml
    echo '</arbitrary_tag>' >> outpath/combined.xml
    That also assumes that the order of file names you get from a default sort will put the files in the desired sequence (if that matters at all). If you want them in a sequence that differs from a default sort on the file names, you'll need to create a separate text file that lists the xml file names in the desired order, then pipe that list to "xargs cat" (instead of doing "cat inpath/*.xml").

    That also assumes you're on a system where the unix/linux/osx/cygwin "cat", "echo", "fgrep" and "xargs" commands are available.
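
The explicit-ordering variant graff describes might look roughly like the sketch below. The directory layout, the sample files and the list file name order.txt are all made up for the demonstration, and grep -F is used as the equivalent of fgrep.

```shell
#!/bin/sh
# Sketch only: concatenate XML files in an explicit order under one wrapper tag.
# Every path and file name below is a hypothetical stand-in.
set -eu

work=$(mktemp -d)
mkdir "$work/inpath" "$work/outpath"

# Two tiny sample inputs, each starting with its own XML declaration.
printf '%s\n' '<?xml version="1.0"?>' '<Document Id="b"/>' > "$work/inpath/b.xml"
printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>' '<Document Id="a"/>' > "$work/inpath/a.xml"

# order.txt lists the files in the desired sequence, one name per line
# (b before a, which is not what a default sort would give).
printf '%s\n' "$work/inpath/b.xml" "$work/inpath/a.xml" > "$work/order.txt"

out="$work/outpath/combined.xml"
echo '<arbitrary_tag>' > "$out"
# xargs passes the listed names to cat; grep -F -v drops the declaration lines.
xargs cat < "$work/order.txt" | grep -F -v '<?xml' >> "$out"
echo '</arbitrary_tag>' >> "$out"

cat "$out"
```

Note that xargs splits its input on whitespace, so file names containing spaces would need different handling. As in the original commands, the filter removes any whole line containing '<?xml', so it relies on the declaration sitting on a line of its own.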

Re: xml join
by hdb (Monsignor) on Oct 02, 2013 at 20:14 UTC

    Can you rely on the final </Documents> being on a line of its own? If so, read all files line by line and omit the last line (apart from the last file).
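
A rough shell sketch of this line-based idea follows. It assumes every file has <Documents> alone on its first line and </Documents> alone on its last line; dropping the opening line from all but the first file goes beyond what hdb suggested and is added here only so that the result has a single root. The sample files and paths are hypothetical.

```shell
#!/bin/sh
# Sketch only: join files line by line, keeping one opening and one closing tag.
# Assumes <Documents> is alone on line 1 and </Documents> alone on the last line.
set -eu

work=$(mktemp -d)
mkdir "$work/in"

# Three made-up sample files with the assumed layout.
for n in 1 2 3; do
    printf '%s\n' '<Documents>' "<Document Id=\"$n\"/>" '</Documents>' > "$work/in/$n.xml"
done

out="$work/combined.xml"
: > "$out"

set -- "$work"/in/*.xml     # the glob expands in default sort order
total=$#
i=0
for f in "$@"; do
    i=$((i + 1))
    if [ "$total" -eq 1 ]; then
        cat "$f"                     # a single file is copied unchanged
    elif [ "$i" -eq 1 ]; then
        sed -e '$d' "$f"             # first file: drop only the closing line
    elif [ "$i" -eq "$total" ]; then
        sed -e '1d' "$f"             # last file: drop only the opening line
    else
        sed -e '1d' -e '$d' "$f"     # middle files: drop both
    fi >> "$out"
done

cat "$out"
```

If the files really look like the posted fragment, with no opening <Documents> line, the '1d' steps would have to go and the opening tag would need to be written once at the top instead.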