uzzikie has asked for the wisdom of the Perl Monks concerning the following question:

hi fellow monks,

i have another XML related question....
what would you use to check for XML well-formedness?
well, a search in CPAN returned a few modules like XML::Checker which does a fine job of checking validity. But i don't need to check for validity because i do not have a DTD. i just need to make sure it's a proper XML document, i.e. well-formed.

currently, i'm using XML::Simple and it dies when XML documents that are not well-formed are fed into XMLin(), resulting in an ugly message like
no element found at line 1, column 219, byte 219 at /usr/lib/perl5/site_perl/5.6.0/i386-linux/XML/Parser.pm line 185

any more elegant solutions anyone?

Re: Checking XML well-formedness
by Cabrion (Friar) on Feb 26, 2003 at 03:45 UTC
    You could wrap an eval around that XMLin() function. Looks like the error message gives explicit instructions on where to find the problem.
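A minimal sketch of that suggestion, assuming XML::Simple is installed. The helper name parse_xml_safely is made up for illustration and is not part of XML::Simple; the exact wording of the error depends on which parser XML::Simple is using underneath.

```perl
use strict;
use warnings;
use XML::Simple;    # exports XMLin() by default

# Hypothetical helper (not part of XML::Simple): returns ($data, undef) on
# success, or (undef, $error) when XMLin() dies on a bad document.
sub parse_xml_safely {
    my ($xml) = @_;
    my $data;
    # Block eval: traps the die; the trailing 1 marks a clean run.
    my $ok  = eval { $data = XMLin($xml); 1 };
    my $err = $@;
    return $ok ? ( $data, undef ) : ( undef, $err || 'unknown parse error' );
}

my ( $data, $err ) = parse_xml_safely('<doc><item>1</item></doc>');
print defined $err ? "Not well-formed: $err" : "Parsed OK\n";
```

The error string still carries the line, column and byte position, so the caller can decide how much of it to show.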
Re: Checking XML well-formedness
by Jaap (Curate) on Feb 26, 2003 at 07:21 UTC
    It is indeed horrible that XML::Simple dies on you on bad XML (a module should never die).
    Wrapping it in an eval() works, but is ugly and takes quite some execution time.
    A quick search on Google gave me this: perl XML well-formedness checker
      "It is indeed horrible that XML::Simple dies on you on bad XML (a module should never die)."

      Without wanting to sound defensive :-) ... Calling 'die' is Perl's mechanism for throwing an exception. Using 'eval' to catch exceptions should be part of your everyday programming style. If that's not how you're handling errors in, say, DBI, then you need to read up on what you're missing.

      Please don't perpetuate the myth that 'eval' is costly. If you eval a string, then it is true that Perl's compiler must be reinvoked to parse the string, and that cost may be significant in some circumstances. However, if you use eval to wrap a code block, then the code in that block is compiled during the initial parse, and the overhead imposed by eval is comparable to the overhead of calling a subroutine.
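If you would rather measure this than take it on faith, here is a rough sketch using the core Benchmark module. The sub names are made up for the comparison, and the numbers will differ from machine to machine, so none are quoted here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical workload: deliberately tiny, so the wrapper cost dominates.
sub double_it { return $_[0] * 2 }

# Three ways of making the same call.
sub plain_call  { return double_it(21) }              # no eval at all
sub block_eval  { return eval { double_it(21) } }     # compiled once, with the file
sub string_eval { return eval 'double_it(21)' }       # recompiled on every call

# A negative count means "run each case for at least that many CPU seconds".
cmpthese( -1, {
    plain_call  => \&plain_call,
    block_eval  => \&block_eval,
    string_eval => \&string_eval,
} );
```

cmpthese prints a table of rates, which makes it easy to see how the block form and the string form each compare with the plain call.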

      In general, though, a module should not call 'die' directly. It should 'use Carp' and 'croak' on errors, which gives the user of the module a better indication of where in their code the problem is being triggered.
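A small sketch of the difference, using only the core Carp module. The package My::XMLCheck and its check_string function are invented for illustration.

```perl
use strict;
use warnings;

package My::XMLCheck;    # hypothetical module, for illustration only
use Carp qw(croak);

sub check_string {
    my ($xml) = @_;
    # croak reports the error at the caller's line, not at this line.
    croak "check_string() needs a non-empty string"
        unless defined $xml && length $xml;
    return 1;
}

package main;

# The caller traps the exception with a block eval, as with any die.
my $ok  = eval { My::XMLCheck::check_string(''); 1 };
my $err = $@;
print "Caught: $err" unless $ok;
```

The message ends with the file and line of the call in package main, which is usually the place the module's user needs to look.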

      Here's a Perl one-liner I use for checking well-formedness from the command line:

      perl -MXML::Parser -e "XML::Parser->new( ErrorContext => 3 )->parsefile(shift)" filename
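The same check can be done inside a script by wrapping the XML::Parser call in a block eval. This is only a sketch, assuming XML::Parser is installed; the function name xml_error is made up, and it parses a string with parse() rather than a file with parsefile().

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: returns undef if $xml is well-formed, otherwise the
# parser's error message (with 3 lines of context, as in the one-liner).
sub xml_error {
    my ($xml) = @_;
    my $ok = eval { XML::Parser->new( ErrorContext => 3 )->parse($xml); 1 };
    return $ok ? undef : ( $@ || 'unknown parse error' );
}

for my $doc ( '<a><b/></a>', '<a><b></a>' ) {
    my $err = xml_error($doc);
    print defined $err ? "NOT well-formed: $err" : "well-formed\n";
}
```

No DTD is needed for this, since XML::Parser only checks well-formedness and does not validate.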