in reply to Re^3: Another problem with XML parser
in thread Another problem with XML parser

Here is the other part of the code that follows the xml file example.

#!/usr/bin/perl
use XML::Parser;

@files = <$plrepository/*.xml>;
foreach $xmlfile (@files) {
    #something is omitted
    $p2 = new XML::Parser(Handlers => {Start => \&handle_start,
                                       End   => \&handle_end,
                                       Char  => \&handle_char});
    $p2->parsefile($xmlfile);
}

sub handle_start {
    my ($pkg,$element,%attr) = @_;
    $current_element = $element;
    if ( $element =~ /Header/i ) {
        $Number=$attr{Number};
        open (OUT, ">$outputfile") or die "No file";
    }
    elsif ( $element =~ /ContentElement/i ) {
        $IdNumber=$attr{IdNumber};
        $InstanceNumber=$attr{InstanceNumber};
    }
    $Number='';    # I empty the variable for the next Number in the file
    $IdNumber='';
    $InstanceNumber='';
    #something is omitted
}

sub handle_end {
    my ($pkg,$element,%attr) = @_;
    if ( $element =~ /Header/i ) {
        print OUT $Number,"$separator\n";
        print "\tNumber ". $Number . "\n";
        close (OUT);
    }
    elsif ( $element =~ /ContentElement/i ) {
        print OUT $IdNumber,"$separator\n";
        print "\tIDNumber ". $IdNumber . "\n";
        print OUT $InstanceNumber,"$separator\n";
        print "\tIstanceNumber ". $InstanceNumber . "\n";
        close (OUT);
    }
    $Number='';    # I empty the variable for the next Number in the file
    $IdNumber='';
    $InstanceNumber='';
    #something is omitted
}

sub handle_char {
    my $text = $_[1];
    if ( $current_element =~ /^Number$/i ) {
        ($text !~ /^\s*$/) && ($Number .= $text);
        #|-> buffer text
    }
    if ( $current_element =~ /^IdNumber$/i ) {
        ($text !~ /^\s*$/) && ($IdNumber .= $text);
        #|-> buffer text
    }
    if ( $current_element =~ /IstanceNumber/i ) {
        ($text !~ /^\s*$/) && ($IstanceNumber .= $text);
        #|-> buffer text
    }
    #something is omitted
}

I have to print out different CSV files like this:

AC_1234,
yyyyyyyy-yy,01463010000016
zzzzzzzz-zz,0000000000000000
xxxxxxxx-xx,111111111111111
aaaaaaaaa-aa,222222222222222

but without buffering I get something like this:

AC_1234,
yyyyyyyy-yy,01463010000016
zzzzzzzz-zz,0000000000000000
xxx-xx,111111111111111
aaaaaaaaa-aa,22222

and when I buffer the text, this sometimes happens:

AC_1234,
yyyyyyyy-yy,01463010000016
zzzzzzzz-zz,0000000000000000
xxxxxxxx-xx,111111111111111
xxxxxxxx-xxaaaaaaaaa-aa,222222222222222111111111111111

I hope this makes things clearer. B/R

Re^5: Another problem with XML parser
by Your Mother (Archbishop) on Nov 17, 2009 at 05:42 UTC

    This is mildly idiomatic (the grep/map, for example; there is probably an equally terse but less idiomatic version). I hope it's otherwise serviceable and interesting. See XML::LibXML and Text::CSV_XS for more fun and deeper options.

    Aside: nodeName ne '#text' is more readable but nodeType != 3 is a little more portable (older versions call text nodes "text").

    use strict;
    use warnings;
    use XML::LibXML;
    use Text::CSV_XS;

    my $doc  = XML::LibXML->new->parse_fh(\*DATA);
    my $root = $doc->getDocumentElement;
    my $csv  = Text::CSV_XS->new({ eol => "\n" });

    my ( $ip_node ) = $root->findnodes("Header/IpNumber");
    my $ip = $ip_node->textContent;

    open my $out, ">", "$ip.csv"
        or die "Couldn't open $ip.csv for writing: $!";

    $csv->print( $out, [ $ip, undef ] );

    for my $content_element ( $root->findnodes("ContentElement") ) {
        my @elements = map { $_->textContent }
            grep { $_->nodeName ne "#text" }
            $content_element->childNodes;
        $csv->print( $out, \@elements );
    }

    __DATA__
    <someRoot>
      <Header>
        <IpNumber>AC_123</IpNumber>
      </Header>
      <ContentElement>
        <IdNumber>xyxyxyxy-yy</IdNumber>
        <InstanceNumber>001463010000016</InstanceNumber>
      </ContentElement>
      <ContentElement>
        <IdNumber>ceiling-cat</IdNumber>
        <InstanceNumber>77777777777</InstanceNumber>
      </ContentElement>
      <ContentElement>
        <IdNumber>basement-cat</IdNumber>
        <InstanceNumber>666666666666666666</InstanceNumber>
      </ContentElement>
    </someRoot>

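To make the aside about nodeName versus nodeType concrete, here is a minimal sketch that applies both filters to the same children. The inline XML string is a made-up fragment for illustration only, and the literal 3 is the DOM node type for text nodes, as mentioned in the aside.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up fragment; the whitespace between tags becomes text nodes.
my $doc = XML::LibXML->new->parse_string(
    "<ContentElement>\n"
  . "  <IdNumber>a-b</IdNumber>\n"
  . "  <InstanceNumber>42</InstanceNumber>\n"
  . "</ContentElement>"
);
my $element = $doc->documentElement;

# Filter by node type: 3 is the DOM constant for a text node.
my @by_type = map  { $_->textContent }
              grep { $_->nodeType != 3 }
              $element->childNodes;

# Filter by node name, as in the reply above.
my @by_name = map  { $_->textContent }
              grep { $_->nodeName ne '#text' }
              $element->childNodes;

print join(',', @by_type), "\n";
print join(',', @by_name), "\n";
```

Both filters should drop the whitespace-only text nodes and keep the two element children, so the two lists come out the same here.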
Re^5: Another problem with XML parser
by gmargo (Hermit) on Nov 16, 2009 at 13:44 UTC

    If you were using strict (or my code :) you'd see the difference between $InstanceNumber and $IstanceNumber.
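A small sketch of what strict would report for that kind of typo. The snippet below is hypothetical (it is not the original script); it is compiled through a string eval only so that the compile error can be captured and printed instead of killing the program.

```perl
use strict;
use warnings;

# Hypothetical two-line reduction of the problem: the variable is
# declared as $InstanceNumber but appended to as $IstanceNumber.
my $snippet = <<'PERL';
use strict;
my $InstanceNumber = '';
$IstanceNumber .= 'text';   # typo: the first "n" is missing
1;
PERL

my $ok    = eval $snippet;   # compile and run the snippet
my $error = $@;              # holds the compile error, if any

print "strict refused to compile it:\n$error" unless $ok;
```

Without strict the misspelled name silently becomes a second, separate global variable, so the text is buffered into one variable and printed from another.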

      But in your code the start and end handlers don't empty the variable $Number. Is that just because you track whether you are inside the node with the variable $inHeader = 1? B/R
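For what it's worth, here is a minimal sketch of the track-where-you-are approach, under some assumptions: the element names IdNumber and InstanceNumber are taken from the sample XML in the reply above, the one-line input document is made up, and the variable names are illustrative rather than taken from anyone's posted code. XML::Parser may call the Char handler more than once for a single run of text, which is why the text is appended to a buffer and only used in the End handler.

```perl
use strict;
use warnings;
use XML::Parser;

my %wanted  = map { $_ => 1 } qw(IdNumber InstanceNumber);
my %buffer;          # text collected so far, keyed by element name
my $current = '';    # wanted element we are inside, '' when outside
my @rows;            # finished CSV lines

my $parser = XML::Parser->new(Handlers => {
    Start => sub {
        my ($expat, $element) = @_;
        if ($wanted{$element}) {
            $current = $element;
            $buffer{$element} = '';   # reset only when this element opens
        }
    },
    Char => sub {
        my ($expat, $text) = @_;
        # May fire several times per text run, so append.
        $buffer{$current} .= $text if $current ne '';
    },
    End => sub {
        my ($expat, $element) = @_;
        $current = '' if $element eq $current;   # stop buffering
        if ($element eq 'ContentElement') {
            push @rows, join ',', map { $buffer{$_} // '' }
                                  qw(IdNumber InstanceNumber);
            delete @buffer{qw(IdNumber InstanceNumber)};
        }
    },
});

# Made-up input, for illustration only.
$parser->parse('<r><ContentElement><IdNumber>a-b</IdNumber>'
             . '<InstanceNumber>42</InstanceNumber></ContentElement></r>');

print "$_\n" for @rows;
```

The point of the design is that each buffer is reset when its own element opens and read when the enclosing ContentElement closes, so nothing is wiped between those two events and nothing leaks into the next row.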
Re^5: Another problem with XML parser
by Paulux (Acolyte) on Nov 16, 2009 at 11:54 UTC
    I'm sorry, the anonymous monk was me. I didn't log in.