in reply to XML tags
XML::Parser and all of its applications will rightfully barf on such a file. However, you may use HTML::Parser in "xml mode" to assist you with the rewrite.
Set up a default handler that just prints the text. Override the start-tag handler to print the text, but also push the tag name onto a stack. Override the end-tag handler to compare the end tag with the top of the stack. If they don't match, print an end tag for whatever is on top of the stack, pop it, and repeat until they match; then pop the matching entry and print the end tag itself. At eof, pop whatever is left on the stack, printing an end tag for each entry.
That way, the output will be guaranteed to be properly stacked. It can't handle nested tags of the same name (it has no way to tell which of them is missing its close tag), but in the absence of a DTD, that's probably the best you can do.
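Here is a rough sketch of the handler setup described above, using the version 3 HTML::Parser API. The `restack` name and the choice to collect output in a string rather than print it are mine, purely for illustration. So is the decision to silently drop a close tag that matches nothing on the stack, a case the description above doesn't cover. Treat it as a starting point, not a finished tool.

```perl
use strict;
use warnings;
use HTML::Parser ();

# restack() is a hypothetical helper name: it returns $input with
# close tags added wherever needed to make the tags nest properly.
sub restack {
    my ($input) = @_;
    my $out = '';
    my @stack;    # names of currently open tags, innermost last

    my $p = HTML::Parser->new(
        api_version => 3,

        # Text, comments, declarations, etc. pass through untouched.
        default_h => [ sub { $out .= $_[0] }, 'text' ],

        # Start tag: pass it through and remember that it is open.
        start_h => [ sub {
            my ($tag, $text) = @_;
            $out .= $text;
            push @stack, $tag;
        }, 'tagname, text' ],

        # End tag: close everything opened since the matching start tag.
        end_h => [ sub {
            my ($tag, $text) = @_;

            # Assumption (not in the description above): ignore a
            # close tag that matches nothing currently open.
            return unless grep { $_ eq $tag } @stack;

            while ($stack[-1] ne $tag) {
                $out .= '</' . pop(@stack) . '>';
            }
            pop @stack;

            # For an empty element such as <br/>, current HTML::Parser
            # releases in xml mode send an artificial end event whose
            # text is empty, so appending $text adds nothing extra.
            $out .= $text;
        }, 'tagname, text' ],
    );

    $p->xml_mode(1);    # case-sensitive names, XML-style empty tags
    $p->parse($input);
    $p->eof;

    # At eof, close whatever is still open, innermost first.
    $out .= "</$_>" for reverse @stack;
    return $out;
}
```

Feeding it `<a><b>text</a>` should give back `<a><b>text</b></a>`. For a whole file, the same handlers would work with `parse_file` in place of `parse` and `eof`.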
I wrote a Parse::RecDescent tool into which you could feed an SGML-like DTD (with tag minimization), and it would automatically generate the right number of close tags at the right place by brute force. But it was far too slow for any serious work.
-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.