in reply to Re: Another problem with XML parser
in thread Another problem with XML parser

Here is an example of my XML file (there are over 29000 like this):

<Header>
  <IpNumber>AC_1234</IpNumber>
</Header>
<ContentElement>
  <IdNumber>yyyyyyyy-yy</IdNumber>
  <InstanceNumber>001463010000016</InstanceNumber>
</ContentElement>
<ContentElement>
  <IdNumber>zzzzzzzz-zz</IdNumber>
  <InstanceNumber>0000000000000000</InstanceNumber>
</ContentElement>
<ContentElement>
  <IdNumber>xxxxxxxx-xx</IdNumber>
  <InstanceNumber>111111111111111</InstanceNumber>
</ContentElement>
<ContentElement>
  <IdNumber>aaaaaaaaa-aa</IdNumber>
  <InstanceNumber>222222222222222</InstanceNumber>
</ContentElement>

The code I posted was only a small part. I have multiple instances of ContentElement, and I have to solve the problem for all the tags in the XML. But I'll try to modify my code along the lines of yours. I split up the open/close for writing because when I started to implement the code I was a newbie (maybe I still am).

Re^3: Another problem with XML parser
by Your Mother (Archbishop) on Nov 12, 2009 at 03:42 UTC

    toolic had a really good piece of advice that might have been glossed over. XML::Parser is not newbie friendly. XML::Twig or XML::LibXML are likely what you want to work with.

    I'm not sure I followed the example code in your question. Now that you've given some sample data, could you describe what the desired output/outcome is? You might well get an example solution in Twig and libxml.
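For a feel of the XML::Twig route, here is a minimal sketch rather than a finished solution. It assumes the fragments are wrapped in a single root element (the `someRoot` name is made up for illustration) and that collecting IdNumber/InstanceNumber pairs is roughly what is wanted; the real desired output may differ.

```perl
use strict;
use warnings;
use XML::Twig;

# Sample data from the question, wrapped in an assumed <someRoot> element.
my $xml = <<'XML';
<someRoot>
  <Header> <IpNumber>AC_1234</IpNumber> </Header>
  <ContentElement> <IdNumber>yyyyyyyy-yy</IdNumber> <InstanceNumber>001463010000016</InstanceNumber> </ContentElement>
  <ContentElement> <IdNumber>zzzzzzzz-zz</IdNumber> <InstanceNumber>0000000000000000</InstanceNumber> </ContentElement>
</someRoot>
XML

my @rows;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once for each completely parsed <ContentElement>.
        ContentElement => sub {
            my ( $t, $elt ) = @_;
            push @rows, [
                $elt->first_child_text('IdNumber'),
                $elt->first_child_text('InstanceNumber'),
            ];
            $t->purge;    # release the part of the tree already handled
        },
    },
);
$twig->parse($xml);

print join( ",", @$_ ), "\n" for @rows;
```

The handler sees one whole ContentElement at a time, so there is no need to track open and close tags by hand as with XML::Parser.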

      Thanks for the example you provided, but at the moment I'm trying to follow gmargo's, simply because there is other complex logic tied to the parser I'm using. If I run into another problem with truncated data, I'll try yours. B/R
      Here is the other part of the code that follows the xml file example.

        This is mildly idiomatic (the grep/map, for example, and there is probably an equally terse but less idiomatic version). I hope it's otherwise serviceable and interesting. See XML::LibXML and Text::CSV_XS for more fun and deeper options.

        Aside: nodeName ne '#text' is more readable but nodeType != 3 is a little more portable (older versions call text nodes "text").
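A small sketch of the nodeType variant from the aside, assuming a reasonably recent XML::LibXML, which exports the node-type constants by default (XML_TEXT_NODE is 3). The helper name `element_texts` is made up for this example.

```perl
use strict;
use warnings;
use XML::LibXML;    # exports node-type constants such as XML_TEXT_NODE

# Return the text of every non-text child, which skips the whitespace-only
# text nodes that sit between the elements.
sub element_texts {
    my ($node) = @_;
    return map  { $_->textContent }
           grep { $_->nodeType != XML_TEXT_NODE }
           $node->childNodes;
}

my $doc = XML::LibXML->new->parse_string(
    '<ContentElement> <IdNumber>ceiling-cat</IdNumber>'
  . ' <InstanceNumber>77777777777</InstanceNumber> </ContentElement>'
);

my @values = element_texts( $doc->documentElement );
print join( ",", @values ), "\n";    # prints "ceiling-cat,77777777777"
```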

        use strict;
        use warnings;
        use XML::LibXML;
        use Text::CSV_XS;

        my $doc  = XML::LibXML->new->parse_fh(\*DATA);
        my $root = $doc->getDocumentElement;
        my $csv  = Text::CSV_XS->new({ eol => "\n" });

        # The CSV file is named after the IpNumber in the header.
        my ( $ip_node ) = $root->findnodes("Header/IpNumber");
        my $ip = $ip_node->textContent;

        open my $out, ">", "$ip.csv"
            or die "Couldn't open $ip.csv for writing: $!";

        $csv->print( $out, [ $ip, undef ] );

        # One CSV row per ContentElement, one column per child element.
        for my $content_element ( $root->findnodes("ContentElement") ) {
            my @elements = map  { $_->textContent }
                           grep { $_->nodeName ne "#text" }
                           $content_element->childNodes;
            $csv->print( $out, \@elements );
        }

        __DATA__
        <someRoot>
          <Header>
            <IpNumber>AC_123</IpNumber>
          </Header>
          <ContentElement>
            <IdNumber>xyxyxyxy-yy</IdNumber>
            <InstanceNumber>001463010000016</InstanceNumber>
          </ContentElement>
          <ContentElement>
            <IdNumber>ceiling-cat</IdNumber>
            <InstanceNumber>77777777777</InstanceNumber>
          </ContentElement>
          <ContentElement>
            <IdNumber>basement-cat</IdNumber>
            <InstanceNumber>666666666666666666</InstanceNumber>
          </ContentElement>
        </someRoot>

        If you were using strict (or my code :) you'd see the difference between $InstanceNumber and $IstanceNumber.
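To illustrate the point about strict, here is a hedged reduction of the typo; it is not the original code, and the helper name `strict_error` and the two snippets are invented for the demonstration. The snippets are compiled with a string eval so the compile-time error can be captured and printed instead of killing the script.

```perl
use strict;
use warnings;

# Compile a snippet under strict and return the error text ('' if it compiles).
sub strict_error {
    my ($code) = @_;
    my $ok = eval "use strict; $code; 1";    # compile errors end up in $@
    return $ok ? '' : $@;
}

# Declared as $InstanceNumber, then used without the first "n".
my $typo = 'my $InstanceNumber = "001463010000016"; my $copy = $IstanceNumber';
my $fine = 'my $InstanceNumber = "001463010000016"; my $copy = $InstanceNumber';

print "typo: ", strict_error($typo) || "compiled\n";
print "fine: ", strict_error($fine) || "compiled\n";
```

The first snippet fails with a 'Global symbol "$IstanceNumber" requires explicit package name' error; without strict the misspelled variable would silently be empty.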

        I'm sorry, but the Anonymous Monk was me. I hadn't logged in.