in reply to well formed xml

A fast Perl way to test whether XML is well-formed (as opposed to valid) is to use Cooper and Wall's venerable XML::Parser::Expat.

Its documentation states:

setHandlers(TYPE, HANDLER [, TYPE, HANDLER ...])

This method registers handlers for the various events. If no handlers are registered, then a call to parsestring or parsefile will only determine if the corresponding XML document is well formed (by returning without error.) This may be called from within a handler, after the parse has started.

Reviews of expat often cite speed as one of its advantages. (For the record, expat is written in C, not Perl.)

It is hard to imagine Larry Wall, who wrote version 1.0 of XML::Parser::Expat to provide "lowlevel access to James Clark's expat XML parser," bloating his code. Borrowing davorg's suggested solution, the test would then be:

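    use XML::Parser::Expat;

    my $file   = 'info.xml';
    my $parser = XML::Parser::Expat->new;
    $parser->setHandlers();                  # no handlers: only check well-formedness
    open(my $fh, '<', $file) or die "Couldn't open $file: $!";
    eval { $parser->parse($fh) };            # dies on the first well-formedness error
    print $@ ? "$file is bad\n" : "$file is good\n";
    close($fh);
    $parser->release;                        # break circular references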
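The test above checks a single file. The XML::Parser::Expat documentation notes that only one document can be parsed per instance, so checking many documents directly with Expat means creating, and releasing, one object per document. A minimal sketch, assuming the XML is already held in a string; `is_well_formed` is a hypothetical helper, not part of the module:

```perl
use strict;
use warnings;
use XML::Parser::Expat;

# Hypothetical helper (not part of XML::Parser::Expat): returns 1 if the
# string $xml is well-formed, 0 otherwise.
sub is_well_formed {
    my ($xml) = @_;
    # One Expat object per document: an instance accepts only one parse.
    my $parser = XML::Parser::Expat->new;    # no handlers registered
    my $ok = eval { $parser->parsestring($xml); 1 } ? 1 : 0;
    $parser->release;                        # break circular references
    return $ok;
}

for my $xml ('<a><b/></a>', '<a><b></a>') {
    print is_well_formed($xml) ? "good: $xml\n" : "bad:  $xml\n";
}
```

With these two inputs it reports the first string as good and the second, whose `<b>` element is never closed, as bad.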

Re: Re: well formed xml
by mirod (Canon) on Feb 28, 2001 at 22:14 UTC

    XML::Parser::Expat is just the lower-level interface to expat that XML::Parser uses internally. It is no more venerable than XML::Parser.

    XML::Parser is an object factory. Every time you call parse or parsefile on an XML::Parser object, it calls XML::Parser::Expat to create a new parser object. So it does exactly what your code does. And if you don't need any handlers, there is no need to call setHandlers: none are set by default.
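    To see the factory behaviour in isolation, here is a small sketch that checks a few made-up strings with a single XML::Parser object and no handlers. The sample strings are assumptions for illustration; parse accepts either a string or an open filehandle.

```perl
use strict;
use warnings;
use XML::Parser;

# One XML::Parser object with no Style and no Handlers. Each parse() call
# builds a fresh XML::Parser::Expat object internally, so the same $p can
# be reused, even after a document has failed to parse.
my $p = XML::Parser->new;

my @docs = ('<a/>', '<a><b></a>', '<root>text</root>');   # made-up samples
for my $xml (@docs) {
    my $ok = eval { $p->parse($xml); 1 };
    print $ok ? "good: $xml\n" : "bad:  $xml\n";
}
```

    Only the second sample is rejected; the parser object itself never needs to be recreated.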

    Since XML::Parser is effectively the official interface to XML::Parser::Expat, and even though your code might be marginally faster, I would prefer a slightly improved version of davorg's code, where the creation of the XML::Parser object is pulled out of the loop:

    use XML::Parser;

    my $p = XML::Parser->new;              # needs to be done only once
    foreach (@list_of_20_000_files) {
        eval { $p->parsefile($_) };        # creates a new XML::Parser::Expat object
        if ($@) { print "$_ is bad\n"; }
        else    { print "$_ is good\n"; }
    }