Paulux has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone, I used this code to parse XML:
#!/usr/bin/perl
use XML::Parser;

@files = <$plrepository/*.xml>;
foreach $xmlfile (@files) {
    #something is omitted
    $p2 = new XML::Parser(Handlers => {Start => \&handle_start,
                                       End   => \&handle_end,
                                       Char  => \&handle_char});
    $p2->parsefile($xmlfile);
}

sub handle_start {
    my ($pkg,$element,%attr) = @_;
    $current_element = $element;
    if ( $element =~ /Header/i ) {
        $Number=$attr{Number};
        open (OUT, ">$outputfile") or die "No file";
    }
    #something is omitted
}

sub handle_end {
    my ($pkg,$element,%attr) = @_;
    if ( $element =~ /Header/i ) {
        print OUT $Number,"$separator\n";
        print "\tNumber ". $Number . "\n";
        close (OUT);
    }
    #something is omitted
}

sub handle_char {
    my $text = $_[1];
    if ( $current_element =~ /^Number$/i ) {
        ($text !~ /^\s*$/) && ($Number = $text);
    }
    #something is omitted
}
on roughly 29,000 XML files. Sometimes the text between the tags is truncated (e.g. -912 instead of 120A33-912). How is this possible? Is there a limitation in XML::Parser? Has anyone had the same problem? B/R

Re: problem with XML::Parser
by mirod (Canon) on Jun 18, 2009 at 13:22 UTC

    Yes, everyone has the same problem. It's documented: "A single non-markup sequence of characters may generate multiple calls to this handler" (from the doc of the Char handler).

    You need to buffer the text in the Char handler, and use it when you get to the next tag. See XML::Parser for more info.

      I'm a Perl newbie; how can I buffer it? Any suggestions? B/R

        Err... did you follow the link I gave? The part entitled "Getting all the character data" describes buffering. Note that if you don't have to deal with mixed content, then you don't need to process characters within the Start handler.
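For readers who want to see the idea in code, here is a minimal sketch of the buffering approach, assuming no mixed content. The sample document, the `$buffer` variable and the `%seen` hash are made up for illustration; only the `Start`, `End` and `Char` handlers come from the original post. The point is that `Char` only appends, and the text is used in `End`, once it is complete.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

my $buffer = '';   # accumulates character data for the current element
my %seen;          # hypothetical store: element name => complete text

my $parser = XML::Parser->new(
    Handlers => {
        Start => \&handle_start,
        End   => \&handle_end,
        Char  => \&handle_char,
    },
);

# Hypothetical sample document, not taken from the original post
my $xml = '<Header><Number>120A33-912</Number></Header>';
$parser->parse($xml);

print "Number: $seen{Number}\n";

sub handle_start {
    my ($expat, $element, %attr) = @_;
    $buffer = '';            # new element: start with an empty buffer
}

sub handle_char {
    my ($expat, $text) = @_;
    $buffer .= $text;        # may be called several times per text run
}

sub handle_end {
    my ($expat, $element) = @_;
    # The text is complete only here; skip whitespace-only content
    $seen{$element} = $buffer if $buffer =~ /\S/;
    $buffer = '';
}
```

Resetting the buffer in every `Start` handler is what limits this sketch to documents without mixed content; with mixed content the text seen so far would have to be processed there before it is cleared.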

        BTW, if you have just started development, I would second Jenda's advice and use (his) XML::Rules or (my!) XML::Twig, or if libxml2 is available, XML::LibXML, even though neither of us wrote it ;--)

Re: problem with XML::Parser
by Jenda (Abbot) on Jun 19, 2009 at 01:12 UTC

    As mirod said, you have to buffer the tag content if you use XML::Parser. It's very low-level. You'd better use something a bit higher level. XML::Rules, XML::Twig, ...
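As a rough illustration of the higher-level route, here is a sketch using XML::Twig. It assumes the value lives in a `<Number>` element as in a made-up sample document; the `@numbers` array is hypothetical too. The handler is called only after the whole element has been parsed, so the text arrives in one piece.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my @numbers;   # hypothetical store for the extracted values

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once the whole <Number> element has been parsed
        Number => sub {
            my ($t, $elt) = @_;
            push @numbers, $elt->text;   # complete text of the element
            $t->purge;                   # free what has been parsed so far
        },
    },
);

# Hypothetical sample document, not taken from the original post
$twig->parse('<Header><Number>120A33-912</Number></Header>');

print "Number: $_\n" for @numbers;
```

Note that the handler key matches the element name case-sensitively, unlike the `/i` regexes in the original code, so real files with varying case would need extra handling.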

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.