in reply to Concatenating XML files

The method you are looking for is xml_string, which is also aliased as innerXML.

#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

print "<foo>";
foreach my $file ( 'to_concat_1.xml', 'to_concat_2.xml') {
    print XML::Twig->new( keep_spaces => 1, comments => 'process')
                   ->parsefile( $file)->root->xml_string;
}
print "</foo>\n";

Note that comments => 'process' is used only because, in your example, there are comments just before the end of the foo element; it is probably not needed in your real code.

Also a more generic way would be to keep the first document, and then to add the other ones at the end, removing their root:

#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

my $result_twig;
foreach my $file ( 'to_concat_1.xml', 'to_concat_2.xml') {
    my $current_twig = XML::Twig->new( comments => 'process')->parsefile( $file);
    if( !$result_twig) {
        $result_twig = $current_twig;
    }
    else {
        $current_twig->root->move( last_child => $result_twig->root)
                           ->erase;
    }
}
$result_twig->print;

Re^2: Concatenating XML files
by graff (Chancellor) on Jul 30, 2007 at 17:39 UTC
    Also a more generic way would be to keep the first document, and then to add the other ones at the end, removing their root: ...

    I'm not sure how much more "generic" your second solution would be... It might not apply very well, I think, in cases where the quantity of XML data to be concatenated, multiplied by the memory overhead of Perl's DOM-style data structures, could exceed available RAM.

      It's more generic in the sense that it doesn't assume that the root tag of the first document is foo. As you did not mention any constraint on the size of the documents, I did not assume any.

      If there are constraints, for each individual file or for the resulting file, then you should mention them, and the solution would be different. Depending on the constraints in terms of speed and potential size of the documents, the best solution could be regexp-based (which could be made quite robust, provided your XML files do not include DTDs), XML::LibXML-based (if individual files are not too big to be loaded in memory), XML::Parser-based (rather easy if no DTD is used, a bit more complicated otherwise) or XML::Twig-based (slower, but it can deal with arbitrarily sized documents, and it would be quite easy to code, although a bit more complex than the examples I gave previously).
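      For completeness, a sketch of the XML::Twig streaming approach for arbitrarily large inputs. This is only an illustration: it assumes the elements directly under each root are <record> elements and that the common root tag is foo (both names are placeholders, adjust them to your data):

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

# Print each <record> as soon as it is fully parsed, then purge it,
# so memory use stays bounded no matter how big the input files are.
print "<foo>";
foreach my $file ( 'to_concat_1.xml', 'to_concat_2.xml') {
    XML::Twig->new(
        twig_handlers => {
            record => sub {
                my( $twig, $record) = @_;
                $record->print;   # output the record
                $twig->purge;     # free the memory it used
            },
        },
    )->parsefile( $file);
}
print "</foo>\n";
```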

      But all those potential constraints would be part of the requirements for your code, so you would have to express them if you want a response that really fits your problem.

      And no, I am not trying to confuse you to cover up the fact that my answer wasn't that smart ;--)

        To clear things up a bit: all xml files to be concatenated have to have the same, known root tag (and the script will probably warn if it's wrong, and skip that file).

        Additionally the result file will surely fit into memory (I'd be very surprised if a result file even approaches 50MB), so no need to worry.

        And yes, there is a DTD available, but sadly it doesn't match the required structure of the XML files. The developers "solved" this problem by not referencing the DTD.
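        Given those requirements (known root tag, everything fits in memory, no DTD reference), the second solution above only needs a root-tag check before merging. A sketch, assuming the known root tag is foo (a placeholder, use your real tag) and the file names come from the command line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

my $expected_root = 'foo';   # the known, required root tag (assumption)

my $result_twig;
foreach my $file (@ARGV) {
    my $twig = XML::Twig->new( comments => 'process')->parsefile( $file);
    if( $twig->root->tag ne $expected_root) {
        # wrong root tag: warn and skip this file, as required
        warn "skipping $file: root is '" . $twig->root->tag
           . "', expected '$expected_root'\n";
        next;
    }
    if( !$result_twig) {
        $result_twig = $twig;
    }
    else {
        # move the whole document under the result, then erase its root tag
        $twig->root->move( last_child => $result_twig->root)->erase;
    }
}
$result_twig->print if $result_twig;
```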