Do you have multiple elements whose names contain the substring "Header"? An unanchored match would fire on all of them. You could add anchors to the element match: $element =~ /^Header$/i.
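To see what the anchors change, here is a small sketch. The element names in the list are made up for illustration; they are not taken from your XML.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical element names, for illustration only.
my @elements = qw(Header SubHeader HeaderInfo header);

# Unanchored: matches any name that merely contains "Header".
my @unanchored = grep { /Header/i } @elements;

# Anchored: matches only names that are exactly "Header" (any case).
my @anchored = grep { /^Header$/i } @elements;

print "unanchored: @unanchored\n";
print "anchored:   @anchored\n";
```

The unanchored pattern picks up all four names, while the anchored one keeps only "Header" and "header".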
Can you have a "Number" element that resides outside of a "Header" element? That could be why you see double numbers. Try adding a flag so that the "Char" routine only checks for "Number" while inside a "Header".
Can you have nested "Header" elements?
You are opening your output file in one subroutine, and then writing to it and closing it in another. What is the purpose of splitting this up? I would keep the open/write/close together.
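Keeping open/write/close together could look something like the sketch below. The helper name write_number, the demo file name, and the choice of append mode (">>") are my assumptions, not something from your code; use ">" instead if you really do want to overwrite the file each time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the file is opened, written and closed in one place.
# Append mode is an assumption so that earlier records are not lost.
sub write_number {
    my ($file, $number, $separator) = @_;
    open(my $out, ">>", $file) or die "Cannot open $file: $!";
    print {$out} $number, $separator, "\n";
    close($out) or die "Cannot close $file: $!";
}

my $file = "numbers_demo.txt";    # assumed file name, for the demo only
unlink $file;                     # start from a clean file
write_number($file, $_, ",") for (101, 102);
```

With a lexical filehandle that lives only inside the subroutine, there is no global handle left open between handler calls.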
And, purely for entertainment purposes, here is my version of your code, with most of the above ideas, reformatted a bit while I was trying to understand it. It compiles but is untested.
#!/usr/bin/perl -w
use strict;
use warnings;
use diagnostics;
use XML::Parser;

my $plrepository = ".";
my @files = <$plrepository/*.xml>;

# Declared and initialized before the loop, so the values are already
# set when the parser calls the handlers.
my $current_element;    # global, shared with start,char
my $Number;             # global, shared with start,end,char
my $inHeader = 0;       # global, shared with start,end,char

my $separator  = ",";
my $outputfile = "numbers.txt";

foreach my $xmlfile (@files) {
    #something is omitted
    my $p2 = XML::Parser->new(Handlers => {
        Start => \&handle_start,
        End   => \&handle_end,
        Char  => \&handle_char,
    });
    $p2->parsefile($xmlfile);
}

sub handle_start {
    my ($pkg,$element,%attr) = @_;
    $current_element = $element;
    if ( $element =~ /^Header$/i ) {
        $Number = $attr{Number};
        $inHeader = 1;
    }
}

sub handle_end {
    my ($pkg,$element,%attr) = @_;
    if ( $element =~ /^Header$/i ) {
        # Are we overwriting the same file for every Header?
        open (OUT, ">", $outputfile) or die "No file";
        print OUT $Number,"$separator\n";
        print "\tNumber ". $Number . "\n";
        close (OUT);
        $inHeader = 0;
    }
}

sub handle_char {
    my ($pkg,$text) = @_;
    # Only collect text for "Number" while inside a "Header"
    if ( $inHeader && $current_element =~ /^Number$/i && $text !~ /^\s*$/ ) {
        $Number .= $text;    #|-> buffer text
    }
}
In reply to Re: Another problem with XML parser
by gmargo
in thread Another problem with XML parser
by Paulux