Mr. Muskrat has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to process the stats for my Seti@home group using XML::Twig.
However, I am getting this error: 'not well-formed at line 99, column 22, byte 3613 at C:/Perl/site/lib/XML/Parser.pm line 168'.

Here is the groupstats DTD and the XML I am attempting to parse is here.

I am just getting started learning about XML so my code is pretty basic (and based on an example I saw at mirod's XMLTwig site):

    #!/usr/bin/perl -w
    use strict;
    use XML::Twig;

    my $count = 0; # total count of members to compare against

    my $t = XML::Twig->new( twig_handlers => { member => \&member } );
    $t->parsefile('kwsn.xml');
    $t->flush; # don't forget to flush one last time in the end or anything
               # after the last </section> tag will not be output
    print "Total members: $count\n";

    sub member {
        my ($t, $member) = @_;
        my $name    = field($member, 'name');       # get member's name
        my $results = field($member, 'numresults'); # get member's results
        print "$name $results\n";
        $t->purge();
        $count++;
    }

    sub field {
        my ($member, $field) = @_;
        return $member->first_child($field)->text;
    }

What can I do to keep this from happening?

Replies are listed 'Best First'.
Re: 'not well-formed' using XML::Twig
by traveler (Parson) on Dec 24, 2002 at 19:10 UTC
    There is an & on line 99 at char 22. That appears to be a problem as & begins an XML "entity". IIRC, you can edit the DTD to make the & legal (sorry, I forget what you have to do -- my mind is more on Christmas than XML...), or you can make it &amp;, or you can change it to "and".

    Update: I guess I should add that the error messages refer to the data and that the parser quits after the first error.

    HTH, --traveler
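
The `&amp;` route suggested above can be sketched as a small pre-filter that runs over the raw text before it reaches the parser. This is only an illustration under stated assumptions: `escape_bare_ampersands` is a hypothetical helper (not part of XML::Twig), the sample line is made up, and the regex only protects things that already look like `&name;`, `&#123;` or `&#x1F;`. It does nothing about stray `<` characters or about entity names the parser does not know.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: escape every '&' that does not already start
# something shaped like an entity or character reference.
sub escape_bare_ampersands {
    my ($text) = @_;
    $text =~ s/&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;
    return $text;
}

# Made-up sample line, not taken from the real stats file.
my $line = '<name>Tom & Jerry &amp; Co &#169;</name>';
print escape_bare_ampersands($line), "\n";
```

Because already-escaped references are left alone, running the filter twice over the same text should not double-escape anything.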

Re: 'not well-formed' using XML::Twig
by mirod (Canon) on Dec 25, 2002 at 01:28 UTC

    The useful option here is error_context => 1 which will display the line where the error occurs. This indeed shows that there is a & in the data. This is not allowed in XML, you have to replace it by &amp;. There are a certain number of those in the file, along with a < on line 2614 that needs to be replaced by &lt; and a <<SW>> that needs to become &lt;&lt;SW>> on line 11037.

    Note that you don't need a field sub either, this is already a method, $member->field( 'name') works just fine.
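
A minimal sketch of both suggestions together, assuming XML::Twig is installed: `error_context => 1` is passed to `new()`, and the hand-written `field` sub is replaced by the `field` method. The two-member document is made up for illustration and is parsed from a string rather than from kwsn.xml. `safe_parse` is used so that the failure ends up in `$@` instead of killing the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $count = 0;

my $twig = XML::Twig->new(
    error_context => 1,    # include the offending line in the error message
    twig_handlers => { member => \&member },
);

# Made-up sample data; the second name contains a raw '&'.
my $xml = <<'XML';
<group>
  <member><name>Alice</name><numresults>10</numresults></member>
  <member><name>Bob & Co</name><numresults>7</numresults></member>
</group>
XML

# safe_parse returns false on failure and leaves the message in $@.
if ( $twig->safe_parse($xml) ) {
    print "Total members: $count\n";
}
else {
    print "Parse failed: $@";
}

sub member {
    my ($t, $member) = @_;
    # field() returns the text of the first child with the given name
    printf "%s %s\n", $member->field('name'), $member->field('numresults');
    $t->purge;
    $count++;
}
```

With the raw `&` left in place the parse should fail on the second member; changing it to `&amp;` should let the whole document through.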

Re: 'not well-formed' using XML::Twig
by Aristotle (Chancellor) on Dec 24, 2002 at 19:04 UTC

    I just downloaded the data and script and got not well-formed (invalid token) at line 1163, column 17, byte 41545. What I find there is a ³ - so it looks as though the problem is insufficient encoding.

    I'm looking at the XML::Twig POD right now to see if I can dredge up anything fitting. Passing keep_encoding => 1 and/or input_filter => 'safe' to new() didn't help.

    Makeshifts last the longest.
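
One possible workaround for the ³, sketched under a loud assumption: that the file is really ISO-8859-1 but carries no encoding declaration, so the parser treats the byte as broken UTF-8. If that is the case, either an `encoding="ISO-8859-1"` attribute in the XML declaration or a re-encoding pass before parsing should help. The helper name `latin1_to_utf8` and the sample string are made up; only the core Encode module is used.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: the input bytes are ISO-8859-1. Decode them to characters,
# then encode those characters as UTF-8 for the XML parser.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    return encode('UTF-8', decode('ISO-8859-1', $bytes));
}

# Made-up sample; byte 0xB3 is SUPERSCRIPT THREE in ISO-8859-1.
my $raw  = "<name>Team\xB3</name>";
my $utf8 = latin1_to_utf8($raw);
printf "%d bytes in, %d bytes out\n", length $raw, length $utf8;
```

If the data turns out to be in some other encoding, the name given to `decode` would have to change accordingly.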

Re: 'not well-formed' using XML::Twig
by Mr. Muskrat (Canon) on Dec 27, 2002 at 03:03 UTC

    So much for my idea to use the parseurl method. Oh, well... There's always LWP to download it.

    I don't suppose that any Monks around here have friends at Berkeley who are responsible for the ill-formed XML...

Re: 'not well-formed' using XML::Twig
by Mr. Muskrat (Canon) on Dec 27, 2002 at 17:19 UTC

      Berkeley says that they will fix it. :)

      See my scratchpad for the code I am using right now. /msg Mr. Muskrat if you have questions, comments or suggestions. Thanks!