Do you have multiple elements with a substring of "Header"? You could add anchors to the element match: $element =~ /^Header$/i.
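To see what the anchors buy you, here is a small sketch. The element names in @names are made up for illustration and are not taken from your XML.

```perl
use strict;
use warnings;

# Hypothetical element names, for illustration only.
my @names = qw(Header SubHeader HeaderInfo header);

# Unanchored: matches any name that merely contains "Header".
my @loose = grep { /Header/i } @names;

# Anchored: matches only names that are exactly "Header" (any case).
my @strict = grep { /^Header$/i } @names;

print "loose:  @loose\n";
print "strict: @strict\n";
```

If your handlers fire for elements such as "SubHeader", the unanchored match would be one possible cause.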

Can you have a "Number" element that resides outside of a "Header" element? That could be why you see double numbers. Try adding a flag so that the "Char" routine only checks for "Number" while inside a "Header".

Can you have nested "Header" elements?

You are opening your output file in one subroutine, and then writing to it and closing it in another. What is the purpose of splitting this up? I would keep the open/write/close together.
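One way to keep them together is a small helper. This is only a sketch: append_line is a name I made up, and opening in append mode (">>") is my assumption about what you want, so that each record is added rather than overwriting the file. The temp file is just there so the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: open, write, and close in one place.
# Append mode is an assumption; use '>' if you really want to overwrite.
sub append_line {
    my ($file, $line) = @_;
    open(my $out, '>>', $file) or die "Cannot open $file: $!";
    print {$out} "$line\n";
    close($out) or die "Cannot close $file: $!";
    return;
}

# Demonstration against a temporary file instead of your real output file.
my (undef, $file) = tempfile();
append_line($file, "101,");
append_line($file, "102,");
```

A lexical filehandle ($out) also avoids sharing a global OUT handle between subroutines.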

And, purely for entertainment purposes, here is my version of your code, with most of the above ideas, reformatted a bit while I was trying to understand it. It compiles but is untested.

#!/usr/bin/perl -w
use strict;
use warnings;
use diagnostics;
use XML::Parser;

my $plrepository = ".";
my @files = <$plrepository/*.xml>;

# These are declared before the parse loop so that they are
# already initialized when the handlers run.
my $separator  = ",";
my $outputfile = "numbers.txt";

my $current_element;    # global, shared with start,char
my $Number;             # global, shared with start,end,char
my $inHeader = 0;       # global, shared with start,end,char

foreach my $xmlfile (@files) {
    #something is omitted
    my $p2 = XML::Parser->new(Handlers => {
        Start => \&handle_start,
        End   => \&handle_end,
        Char  => \&handle_char,
    });
    $p2->parsefile($xmlfile);
}

sub handle_start {
    my ($pkg,$element,%attr) = @_;
    $current_element = $element;
    if ( $element =~ /^Header$/i ) {
        $Number   = $attr{Number};
        $inHeader = 1;
    }
}

sub handle_end {
    my ($pkg,$element) = @_;
    if ( $element =~ /^Header$/i ) {
        # Are we overwriting the same file for every Header?
        open (OUT, ">", $outputfile) or die "No file";
        print OUT $Number,"$separator\n";
        print "\tNumber ". $Number . "\n";
        close (OUT);
        $inHeader = 0;
    }
}

sub handle_char {
    my ($pkg,$text) = @_;
    if ( $inHeader && $current_element =~ /^Number$/i && $text !~ /^\s*$/ ) {
        $Number .= $text;    #|-> buffer text
    }
}

In reply to Re: Another problem with XML parser by gmargo
in thread Another problem with XML parser by Paulux
