xmlsql has asked for the wisdom of the Perl Monks concerning the following question:

I've read the Twig documentation and Jenda's article, but as a newbie I'm still lost. Can anyone suggest code for splitting one XML file into multiple XML files based on a particular element, with each new file named after the datetime at which it was created? I do know how to generate the datetime stamp.

One new file needs to be created for each <Message> element in the original XML file. Each message section can have different child elements, so I need to grab everything between <Message> ... </Message> and write it to a new file. The original XML file has this format:

<?xml version="1.0" encoding="utf-8"?>
<TheRoot>
 <Message>
  <MyNumber>001</MyNumber>
  <Registration>222</Registration>
  <GPS>
   <Year>2009</Year>
  </GPS>
  <LbtSession>
   <Year>2009</Year>
  </LbtSession>
 </Message>
 <Message>
  <MyNumber>887</MyNumber>
  <Registration>333</Registration>
  <Client>None</Client>
  <Type>Position</Type>
 </Message>
 <Message>
 etc...
 </Message>
 <Message>
 etc ...
 </Message>
</TheRoot>

Thank you in advance.

Replies are listed 'Best First'.
Re: Splitting xml file into multiple files
by derby (Abbot) on Sep 14, 2009 at 13:01 UTC

    xml_split has already been given and is probably the best choice for simple splitting; however, if you're looking to upgrade your XML toolkit, then XML::LibXML is a good choice, and something along the lines of the code below should get you started.

    #!/usr/bin/perl
    use XML::LibXML;

    my $file = shift || die "usage $0 <xmlfile>";
    my $parser = XML::LibXML->new();
    my $doc = $parser->parse_file( $file );
    my $nodes = $doc->findnodes( '/TheRoot/Message' );
    foreach my $node ( @$nodes ) {
        print $node->toString();
    }
    -derby
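The loop above prints each <Message> to standard output. A minimal sketch of writing each one to its own file instead, assuming XML::LibXML is installed, might look like the following. The sub name split_messages, the timestamp format, and the three-digit counter suffix are all assumptions made for illustration; the counter is there because several files created within the same second would otherwise get the same name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);
use XML::LibXML;

# Write each <Message> element of $file to its own XML file in $outdir.
# Returns the list of file names that were created.
# (split_messages is a hypothetical helper name, not part of XML::LibXML.)
sub split_messages {
    my ( $file, $outdir ) = @_;

    my $doc   = XML::LibXML->load_xml( location => $file );
    my $stamp = strftime( '%Y%m%d-%H%M%S', localtime );
    my $count = 0;
    my @written;

    foreach my $node ( $doc->findnodes('/TheRoot/Message') ) {
        # A fresh document gives each output file its own XML declaration.
        my $out = XML::LibXML::Document->new( '1.0', 'utf-8' );

        # importNode makes a deep copy of the element for the new document.
        $out->setDocumentElement( $out->importNode($node) );

        # Assumed naming scheme: <datetime>-<counter>.xml
        my $name = sprintf '%s/%s-%03d.xml', $outdir, $stamp, ++$count;
        $out->toFile($name) or die "cannot write $name";
        push @written, $name;
    }
    return @written;
}

# When a file name is given on the command line, split it into the
# current directory and list the files that were written.
if ( my $file = shift @ARGV ) {
    print "$_\n" for split_messages( $file, '.' );
}
```

Because everything under each <Message> is copied as a subtree, it does not matter that different messages have different child elements.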
Re: Splitting xml file into multiple files
by Anonymous Monk on Sep 14, 2009 at 12:48 UTC

      The reason this was no good, and why the documentation was lost on me, is that it split the large file into individual files, but the values of each child element were blank even though they were populated in the original large file.

        Are you seeing an error message when calling xml_split? The &nbspetc actually makes the example XML invalid. You may have non-well-formed XML that is causing the tools to barf.

        -derby
        That means you only tried the first or second example; it's the third one you want, as I showed you.

      I did read the documentation you pointed to in the comment on xml_split before posting my question. The documentation was of no help at that point, which was the reason for posting. Can anyone suggest what the code should be?

      Thank you in advance.

        I can understand you being unable to follow the documentation; it is cryptic. However, you should at least try to use it even if you don't understand it. Then you can see what you can learn from it.

        Programming is not all about technical skill and almost nothing runs perfectly on the first go. Sometimes you have to try something just to see what will happen.

        The code should be xml_split.
        $ cat f.xml
        <?xml version="1.0" encoding="utf-8"?>
        <TheRoot>
         <Message>
          <MyNumber>001</MyNumber>
          <Registration>222</Registration>
          <GPS>
           <Year>2009</Year>
          </GPS>
          <LbtSession>
           <Year>2009</Year>
          </LbtSession>
         </Message>
         <Message>
          <MyNumber>887</MyNumber>
          <Registration>333</Registration>
          <Client>None</Client>
          <Type>Position</Type>
         </Message>
        </TheRoot>
        $
        $ xml_split -vc Message f.xml
        generating main file f-00.xml
        generating f-01.xml
        generating f-02.xml
        $ cat f-00.xml
        <?xml version="1.0" encoding="utf-8"?>
        <TheRoot>
         <?merge subdocs = 0 :f-01.xml?>
         <?merge subdocs = 0 :f-02.xml?>
        </TheRoot>
        $
        $ cat f-01.xml
        <Message>
         <MyNumber>001</MyNumber>
         <Registration>222</Registration>
         <GPS>
          <Year>2009</Year>
         </GPS>
         <LbtSession>
          <Year>2009</Year>
         </LbtSession>
        </Message>
        $
        $
        $ cat f-01.xml
        <Message>
         <MyNumber>001</MyNumber>
         <Registration>222</Registration>
         <GPS>
          <Year>2009</Year>
         </GPS>
         <LbtSession>
          <Year>2009</Year>
         </LbtSession>
        </Message>
        $
        This is my first time using xml_split.
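In the session above, xml_split names its output f-01.xml, f-02.xml and so on. If the files need datetime-based names as the original question asks, one option is to use XML::Twig directly, which the question already mentions. The following is only a sketch under assumptions: the sub name split_with_twig, the timestamp format, and the counter suffix are invented for illustration, and the counter is added because files written within the same second would otherwise collide.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);
use XML::Twig;

# Write each <Message> element of $file to its own file in $outdir.
# Returns the list of file names that were created.
# (split_with_twig is a hypothetical helper name, not part of XML::Twig.)
sub split_with_twig {
    my ( $file, $outdir ) = @_;

    my $stamp = strftime( '%Y%m%d-%H%M%S', localtime );
    my $count = 0;
    my @written;

    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called once for every completely parsed <Message> element.
            'TheRoot/Message' => sub {
                my ( $t, $message ) = @_;

                # Assumed naming scheme: <datetime>-<counter>.xml
                my $name = sprintf '%s/%s-%03d.xml', $outdir, $stamp, ++$count;
                open my $fh, '>:encoding(UTF-8)', $name
                    or die "cannot write $name: $!";
                print {$fh} qq{<?xml version="1.0" encoding="utf-8"?>\n};
                $message->print($fh);    # the element and all of its children
                close $fh or die "cannot close $name: $!";
                push @written, $name;

                # Release what has been parsed so far to keep memory use low.
                $t->purge;
            },
        },
    );
    $twig->parsefile($file);
    return @written;
}

# When a file name is given on the command line, split it into the
# current directory and list the files that were written.
if ( my $file = shift @ARGV ) {
    print "$_\n" for split_with_twig( $file, '.' );
}
```

The handler fires as soon as each <Message> has been read, so the whole input file never has to be held in memory at once.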