in reply to Another problem with XML parser

Do you have multiple elements with a substring of "Header"? You could add anchors to the element match: $element =~ /^Header$/i.
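To illustrate the difference, here is a small sketch. The element names in the list are made up for the example and are not taken from your file. An unanchored match accepts any name that contains "Header", while the anchored match accepts only the exact name (in any case).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical element names, only for illustration.
my @names = qw(Header header SubHeader HeaderInfo Number);

# Unanchored: matches any name that contains "Header".
my @loose = grep { /Header/i } @names;

# Anchored: matches only the exact name, ignoring case.
my @strict = grep { /^Header$/i } @names;

print "loose:  @loose\n";
print "strict: @strict\n";
```

If your data has elements such as "SubHeader", the unanchored version would fire the "Header" logic for them as well.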

Can you have a "Number" element that resides outside of a "Header" element? That could be why you see double numbers. Try adding a flag so that the "Char" routine only checks for "Number" while inside a "Header".

Can you have nested "Header" elements?

You are opening your output file in one subroutine, and then writing to it and closing it in another. What is the purpose of splitting this up? I would keep the open/write/close together.
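One possible way to keep the open/write/close together is to let the handlers only collect values and do all the file work in a single routine afterwards. This is a rough sketch, not a drop-in replacement: the helper name write_numbers and the values in @numbers are hypothetical placeholders for whatever your handlers collect.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: opens, writes, and closes in one place.
sub write_numbers {
    my ($outputfile, $separator, @numbers) = @_;

    open(my $out, '>', $outputfile)
        or die "Cannot open $outputfile: $!";
    print {$out} $_, $separator, "\n" for @numbers;
    close($out)
        or die "Cannot close $outputfile: $!";

    return scalar @numbers;
}

# Placeholder values standing in for what the handlers would collect.
my @numbers = ('AC_1234', 'AC_5678');
my $written = write_numbers('numbers.txt', ',', @numbers);
print "wrote $written line(s)\n";
```

Because the file is opened only once, there is no risk of each "Header" overwriting what the previous one wrote.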

And, purely for entertainment purposes, here is my version of your code, with most of the above ideas, reformatted a bit while I was trying to understand it. It compiles but is untested.

#!/usr/bin/perl -w
use strict;
use warnings;
use diagnostics;
use XML::Parser;

# These are declared before the parsing loop so that they are
# initialized by the time the handlers run.
my $current_element;            # global, shared with start,char
my $Number;                     # global, shared with start,end,char
my $inHeader = 0;               # global, shared with start,end,char
my $separator = ",";
my $outputfile = "numbers.txt";

my $plrepository = ".";
my @files = <$plrepository/*.xml>;
foreach my $xmlfile (@files) {
    #something is omitted
    my $p2 = XML::Parser->new(Handlers => {
        Start => \&handle_start,
        End   => \&handle_end,
        Char  => \&handle_char
    });
    $p2->parsefile($xmlfile);
}

sub handle_start {
    my ($pkg,$element,%attr) = @_;
    $current_element = $element;
    if ( $element =~ /^Header$/i ) {
        $Number=$attr{Number};
        $inHeader = 1;
    }
}

sub handle_end {
    my ($pkg,$element) = @_;
    if ( $element =~ /^Header$/i ) {
        # Are we overwriting the same file for every Header?
        open (OUT, ">", $outputfile) or die "No file";
        print OUT $Number,"$separator\n";
        print "\tNumber ". $Number . "\n";
        close (OUT);
        $inHeader = 0;
    }
}

sub handle_char {
    my ($pkg,$text) = @_;
    if ( $inHeader && $current_element =~ /^Number$/i && $text !~ /^\s*$/ ) {
        $Number .= $text;       #|-> buffer text
    }
}

Re^2: Another problem with XML parser
by Paulux (Acolyte) on Nov 11, 2009 at 09:07 UTC
    Here is an example of my XML file (there are over 29000 like this):
    <Header>
        <IpNumber>AC_1234</IpNumber>
    </Header>
    <ContentElement>
        <IdNumber>yyyyyyyy-yy</IdNumber>
        <InstanceNumber>001463010000016</InstanceNumber>
    </ContentElement>
    <ContentElement>
        <IdNumber>zzzzzzzz-zz</IdNumber>
        <InstanceNumber>0000000000000000</InstanceNumber>
    </ContentElement>
    <ContentElement>
        <IdNumber>xxxxxxxx-xx</IdNumber>
        <InstanceNumber>111111111111111</InstanceNumber>
    </ContentElement>
    <ContentElement>
        <IdNumber>aaaaaaaaa-aa</IdNumber>
        <InstanceNumber>222222222222222</InstanceNumber>
    </ContentElement>
    The code I posted was just a small part. I have multiple instances of ContentElement, and I have to solve the problem for all the tags in the XML. But I'll try to modify my code to follow yours. I split the open from the write/close because when I started writing the code I was a newbie (maybe I still am).

      toolic had a really good piece of advice that might have been glossed over. XML::Parser is not newbie-friendly. XML::Twig or XML::LibXML is likely what you want to work with.

      I'm not sure I followed the example code in your question. Now that you've given some sample data, could you describe what the desired output/outcome is? You might well get an example solution in XML::Twig and XML::LibXML.
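      For illustration only, here is a rough XML::Twig sketch built on guesses, since the desired output has not been described. It assumes the posted fragment sits inside a single root element (the name Document is made up), and that you want the IpNumber plus one "IdNumber,InstanceNumber" line per ContentElement. Adjust it to the real requirements.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Sample data from this thread, wrapped in a hypothetical <Document> root
# (an assumption: the posted fragment shows no root element).
my $xml = join '',
    '<Document>',
    '<Header><IpNumber>AC_1234</IpNumber></Header>',
    '<ContentElement><IdNumber>yyyyyyyy-yy</IdNumber>',
    '<InstanceNumber>001463010000016</InstanceNumber></ContentElement>',
    '<ContentElement><IdNumber>zzzzzzzz-zz</IdNumber>',
    '<InstanceNumber>0000000000000000</InstanceNumber></ContentElement>',
    '</Document>';

my $ip_number;
my @rows;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Handlers run once the whole element has been parsed,
        # so the text is complete when we read it.
        'Header/IpNumber' => sub {
            my ($t, $elt) = @_;
            $ip_number = $elt->text;
        },
        'ContentElement' => sub {
            my ($t, $elt) = @_;
            push @rows, join ',',
                $elt->first_child_text('IdNumber'),
                $elt->first_child_text('InstanceNumber');
            $t->purge;    # free memory used by elements already handled
        },
    },
);
$twig->parse($xml);

print "IpNumber: $ip_number\n";
print "$_\n" for @rows;
```

      For real files you would call parsefile on each file name instead of parse on a string. Because each handler sees a complete element, there is no need to buffer text by hand or track a current-element flag.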

        Thanks for the provided example, but at the moment I'm trying to follow gmargo's, because there is other complex logic built around the parser I'm using. But if I run into another problem with truncated data, I'll try yours. B/R
        Here is the other part of the code that follows the xml file example.