Herkum has asked for the wisdom of the Perl Monks concerning the following question:

I have large XML documents that I want to break up into multiple smaller documents.

# Start
<root>
  <a> <b>Test</b> </a>
  <aa> <b>Test</b> </aa>
</root>

# New Document
<root>
  <a> <b>Test</b> </a>
</root>
<root>
  <aa> <b>Test</b> </aa>
</root>

I figured I could use twig_handlers, except for one problem. I don't know the exact name of the elements that will be below the root of the document. If I had the names it would be easy, but I am not sure exactly how to approach this.

I could just get all the children under the root, but I want to avoid having to process the whole document for performance reasons. Does anyone have any suggestions or am I missing something obvious?

Re: XML::Twig Stream root children
by toolic (Bishop) on Oct 02, 2009 at 20:24 UTC

      xml_split does not create valid xml documents. It appears to be meant mainly to let you break a document apart and then put it back together with xml_merge.

      I guess I could modify the resulting documents after running it, if I don't find any other solution.

        xml_split does not create valid xml documents.

        Sure it does. Observe

        $$ echo THIS IS THE SAME AS xml_split -v Herkum.xml
        $$ xml_split -v -c "level(1)" Herkum.xml
        generating main file Herkum-00.xml
        generating Herkum-01.xml
        generating Herkum-02.xml
        $$ cat Herkum-00.xml && echo
        <root>
        <?merge subdocs = 0 :Herkum-01.xml?>
        <?merge subdocs = 0 :Herkum-02.xml?>
        </root>
        $$ cat Herkum-01.xml && echo
        <a> <b>Test</b> </a>
        $$ cat Herkum-02.xml && echo
        <aa> <b>Test</b> </aa>
        You can use "level(1)" with twig_handlers
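
A minimal sketch of that suggestion, in case it helps. It assumes the new documents only need the root's tag name (root attributes are not copied), and `split_root_children` is a hypothetical helper name, not part of XML::Twig. The `level(1)` trigger fires for every element directly below the root regardless of its name, and `purge` drops what has already been handled.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: split an XML string into one document per
# child of the root and return the new documents as strings.
sub split_root_children {
    my ($xml) = @_;
    my @docs;
    my $twig = XML::Twig->new(
        twig_handlers => {
            # The root is level 0, so level(1) matches each of its
            # children, whatever they are called.
            'level(1)' => sub {
                my ($t, $elt) = @_;
                my $root_tag = $t->root->tag;
                # Wrap the child in a copy of the root tag (assumption:
                # attributes on the root are not needed).
                push @docs, "<$root_tag>" . $elt->sprint . "</$root_tag>";
                $t->purge;    # free the part of the tree parsed so far
            },
        },
    );
    $twig->parse($xml);
    return @docs;
}

my $xml = '<root><a><b>Test</b></a><aa><b>Test</b></aa></root>';
print "$_\n" for split_root_children($xml);
```

For a genuinely large input you would probably call `parsefile` instead of `parse` and write each document to its own file inside the handler, since collecting everything in `@docs` keeps it all in memory again.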

        Would you care to explain? I tried to make it output well-formed fragments. About the only thing I can think of that would trip xml_split would be DTDs with entities, and if you have an example, I would be glad to make it work for you.

        Thanks

Re: XML::Twig Stream root children
by Skeeve (Parson) on Oct 03, 2009 at 15:46 UTC

    I currently don't have XML::Twig installed, but shouldn't a generic handler like

    '/root/*' => \&process_child,

    do? Simply create in "process_child" a new root element, cut the current element and insert it into your new root. Save the new root and that should be it.
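
The cut-and-paste steps described above could look roughly like the sketch below. It is untested against the `'/root/*'` trigger, so it uses the `level(1)` trigger mentioned elsewhere in this thread. "Saving" is reduced to collecting the text in a hypothetical `@docs` array, and only the root's tag name is carried over to the new root.

```perl
use strict;
use warnings;
use XML::Twig;

my @docs;    # hypothetical stand-in for "save the new root"

sub process_child {
    my ($twig, $elt) = @_;
    # Create a fresh, empty element with the same tag as the original root.
    my $new_root = XML::Twig::Elt->new($twig->root->tag);
    $elt->cut;                               # detach the child from the original tree
    $elt->paste(last_child => $new_root);    # attach it under the new root
    push @docs, $new_root->sprint;           # serialize the new document
    return 1;
}

XML::Twig->new(twig_handlers => { 'level(1)' => \&process_child })
         ->parse('<root><a><b>Test</b></a><aa><b>Test</b></aa></root>');

print "$_\n" for @docs;
```

Because each child is cut out of the original tree as soon as it is complete, the original document never holds more than one child at a time.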


    s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
    +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e