zacc has asked for the wisdom of the Perl Monks concerning the following question:

Once again, I'm getting some grief with XML::Twig, probably due to complete ineptness on my part...
my $infile = XML::Twig->new(
    twig_roots    => { 'MEMBER' => 1, 'ACCOUNT' => 1 },
    twig_handlers => { 'MEMBER' => \&process_member, 'ACCOUNT' => \&process_account },
    keep_encoding => 1,
);
$infile->safe_parsefile( $incomingFilename );
The first handler fires, but not the second. Doesn't matter which way round the handlers are listed - if MEMBER is first, then that fires - but ACCOUNT doesn't; if ACCOUNT is first, then that fires - but MEMBER doesn't.

What am I doing wrong? (And the answer isn't leaving Perl to someone who knows what they are doing :-)

OK - it's not me .... phew ... it's the data (I think)
The first ACCOUNT element has a child element containing the character with Unicode code point 216, "Ø" (an "O" with a "/" through it; sorry, I don't know the correct letter name). If I remove that line, all is well; with it, the routine barfs. Replacing that character with an "O" means that the line gets processed fine.

Adding <?xml version="1.0" encoding="ISO-8859-1"?> to the head of the XML file solves the problem, but calling $infile->set_encoding('ISO-8859-1') doesn't.
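A minimal sketch, using only the core Encode module, of why the declaration matters: without an encoding declaration an XML parser has to assume UTF-8 (or UTF-16), and the single byte 0xD8 that ISO-8859-1 uses for "Ø" is not a valid UTF-8 sequence on its own. The sample string and variable names are illustrative only. As far as I can tell from the XML::Twig documentation, set_encoding changes the encoding written in the XML declaration on output, which would explain why it has no effect on how the input is parsed.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "Ø" stored as the single ISO-8859-1 byte 0xD8, surrounded by plain ASCII.
my $bytes = "S\xD8REN";

# Strict UTF-8 decoding dies on the lone 0xD8 byte, so eval returns undef.
# LEAVE_SRC keeps $bytes intact for the second decode below.
my $as_utf8 = eval { decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC) };
my $valid_utf8 = defined $as_utf8 ? 1 : 0;

# Decoding the same bytes as ISO-8859-1 succeeds and yields U+00D8 (decimal 216).
my $as_latin1 = decode('ISO-8859-1', $bytes);
my $codepoint = ord(substr($as_latin1, 1, 1));

print "valid UTF-8: $valid_utf8, code point as Latin-1: $codepoint\n";
```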

HTH

Re: More XML-Twig
by mirod (Canon) on Jan 08, 2008 at 15:38 UTC

    It is hard to tell what's going on without looking at the XML. Is ACCOUNT within MEMBER, or vice-versa? twig_roots would not like that.

    BTW, do you really need to use twig_roots? It seems that, generally, users tend to use it in cases where it is not needed. In your case, you should probably only use it if ACCOUNT and MEMBER make up just a small part of the entire XML document, and you don't want to load the rest of it. If your XML is mostly a list of ACCOUNT and MEMBER, then there is no need to use twig_roots. Even if it is appropriate in your case, I hope that others will read this warning and think twice before using twig_roots ;--)
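A rough sketch of the twig_handlers-only approach suggested above, assuming XML::Twig is installed. The sample document is hypothetical (sibling MEMBER and ACCOUNT elements under one root), and the handlers simply collect each element's text. Calling purge in each handler releases what has already been parsed, which keeps memory use low without twig_roots.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data: sibling MEMBER and ACCOUNT elements under one root.
my $xml = <<'XML';
<ROOT>
  <MEMBER>m1</MEMBER>
  <MEMBER>m2</MEMBER>
  <ACCOUNT>a1</ACCOUNT>
</ROOT>
XML

my (@members, @accounts);

# No twig_roots: the whole document is parsed, and each handler fires
# as soon as its element has been completely read.
my $twig = XML::Twig->new(
    twig_handlers => {
        MEMBER  => sub { my ($t, $elt) = @_; push @members,  $elt->text; $t->purge; },
        ACCOUNT => sub { my ($t, $elt) = @_; push @accounts, $elt->text; $t->purge; },
    },
);
$twig->parse($xml);

print scalar(@members), " members, ", scalar(@accounts), " accounts\n";
```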

      Sorry, should have been clearer, the XML is in the form...
      <ROOT>
        <MEMBER>....</MEMBER>
        <MEMBER>....</MEMBER>
        <ACCOUNT>...</ACCOUNT>
        <ACCOUNT>...</ACCOUNT>
        <ACCOUNT>...</ACCOUNT>
      </ROOT>
      i.e. the whole file is (at this time) simply made up of MEMBER and ACCOUNT elements. MEMBER records are processed by one routine, and ACCOUNT records are processed by a separate routine.
      (I probably don't need twig_roots at this stage, but I've set it up for future use when the file will contain loads of other elements and then I can use it to reduce the overall load on the system)
      Thanks
Re: More XML-Twig
by Jenda (Abbot) on Jan 09, 2008 at 12:26 UTC

    I tend to have this problem as well. Some clients I need to integrate with keep on sending invalid XML ... the data are in ISO-8859-1, but there's no encoding attribute. This is what I use to "fix" the issue. It's not clean, but what can I do :-(

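    use Encode qw(decode_utf8);

    unless ($strXML =~ m{^\s*<\?xml [^\?]+?encoding\s*=\s*['"]([^'"]*)['"][^\?]*\?>} # starts with a <?xml ...?> declaration
            and uc($1) ne 'UTF-8'                                                     # that names something other than UTF-8
    ) {
        print LOGFILE "It claims to be UTF-8.\n\n"; # so it should be UTF-8, yeah? Let's see ...
        # Strict check: under FB_CROAK decode_utf8 dies on malformed input;
        # LEAVE_SRC keeps $strXML itself unmodified.
        if (!eval { decode_utf8($strXML, Encode::FB_CROAK | Encode::LEAVE_SRC); 1 }) {
            print LOGFILE "Hey fix the encoding!!! This aint UTF-8!!!\n\n";
            # Rewrite the declared encoding, or prepend a declaration if none matched.
            $strXML =~ s{^\s*(<\?xml [^\?]+?encoding\s*=\s*['"])[^'"]*(['"][^\?]*\?>)}{${1}ISO-8859-1$2}
                or $strXML = qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n} . $strXML;
        }
    }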