in reply to Forcing XML to validate

I recently ran into this same problem, though my case is a little more esoteric. I agree that the XML parser should stop when it hits malformed XML, but my problem is that the whole program stops along with it.

Suppose I am running through a list of URLs that point to XML data, and one of the XML files is outdated and has been replaced by an HTML page announcing the change. The program is going to die, of course. What happens next? You run the program again and it dies at the same place... so it must be able to skip any non-XML or malformed XML files.

For non-XML files the test is easy. You can do something like

if ($xml_data =~ m/<\?xml version/) { ... }

For the integrity of the XML itself there needs to be some check, so that you can skip a file whose parse would fail. It would be nice to modify expat itself so that instead of dying it returns a failure status. This would give the programmer more control and is stylistically better.

Re^2: Forcing XML to validate
by grantm (Parson) on Feb 04, 2005 at 00:25 UTC

    There's no need to modify expat to stop your program dying on the first error it encounters - that's what eval is for. This is covered in the Perl-XML FAQ.
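
A minimal sketch of the eval approach, assuming XML::Parser (the expat wrapper) is the parser in use. The helper name parse_or_skip and the two inline sample documents are made up for illustration; in the original scenario the strings would come from the fetched URLs.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: returns the parsed tree on success,
# or nothing (undef in scalar context) if the parser died.
sub parse_or_skip {
    my ($xml_data) = @_;
    my $parser = XML::Parser->new(Style => 'Tree');

    # parse() dies on the first well-formedness error; eval traps
    # that so the caller can carry on with the next document.
    my $tree = eval { $parser->parse($xml_data) };
    if ($@) {
        warn "Skipping document: $@";
        return;
    }
    return $tree;
}

# Illustrative input: one well-formed document and one HTML page
# with an unclosed <p>, standing in for a feed that has moved.
my @documents = (
    '<feed><item>ok</item></feed>',
    '<html><body><p>This feed has moved</body></html>',
);

for my $doc (@documents) {
    my $tree = parse_or_skip($doc) or next;
    # With Style => 'Tree', element 0 is the root tag name.
    print "Parsed root element: $tree->[0]\n";
}
```

Because the failure is trapped per document, one bad file only produces a warning and the loop moves on to the next one.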

    Also, the '<?xml ...' declaration at the top of an XML file is optional and frequently omitted. Don't rely on it being there.