matth has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I have installed nearly all of the XML modules and have generated an XML document with no closing tags. Is there a bit of software or code that will take a very large XML document and automatically insert the closing tags for me?

•Re: XML tags
by merlyn (Sage) on Dec 11, 2002 at 15:28 UTC
    While I agree with the other messages already in this thread that an "XML" document is not really XML if it doesn't have the close tags, I can offer one interesting solution beyond "roll everything yourself".

    XML::Parser and all of its applications will rightfully barf on such a file. However, you may use HTML::Parser in "xml mode" (its xml_mode option) to assist you with the rewrite.

    Set up a default handler that just prints the text. Override the start-tag handler to print the text, but also push the tag name onto a stack. Override the end-tag handler to compare the end tag with the top of the stack. If they don't match, print an end tag for the element on top of the stack, pop it, and repeat until they match. At EOF, pop the remaining entries off the stack, printing an end tag for each.

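    That way, the output is guaranteed to be properly nested. It can't handle similar tags that may or may not nest (it cannot tell whether a second <item> belongs inside the first or is a sibling that should have closed it), but in the absence of a DTD, that's probably the best you can do.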
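    A rough sketch of the stack approach above, in plain Perl. Assumption: to keep it runnable with core Perl only, a simple regex tokenizer stands in for HTML::Parser's event handlers, so it does not understand comments or CDATA sections containing '<' or '>', nor '>' inside attribute values. Dropping end tags that match nothing on the stack is this sketch's own choice; the description above doesn't say what to do with them.

```perl
use strict;
use warnings;

# Close unclosed tags with a stack of open element names.
# ASSUMPTION: a crude regex tokenizer replaces HTML::Parser (xml_mode);
# it splits the input into tags (<...>) and runs of text only.
sub close_tags {
    my ($xml) = @_;
    my @open;        # stack of currently open element names
    my $out = '';

    for my $tok ($xml =~ /(<[^>]*>|[^<]+)/g) {
        if ($tok =~ m{^</\s*([^\s>]+)}) {
            # End tag: close open elements until we reach the matching one.
            my $name = $1;
            next unless grep { $_ eq $name } @open;   # ASSUMPTION: drop stray end tags
            while ((my $top = pop @open) ne $name) {
                $out .= "</$top>";
            }
            $out .= $tok;
        }
        elsif ($tok =~ m{^<([A-Za-z_:][-\w.:]*)} && $tok !~ m{/>$}) {
            # Start tag (not self-closing): emit it and remember it.
            push @open, $1;
            $out .= $tok;
        }
        else {
            # Text, <?...?>, <!...> and empty-element tags pass through unchanged.
            $out .= $tok;
        }
    }

    # At EOF, close whatever is still open, innermost first.
    $out .= "</$_>" for reverse @open;
    return $out;
}

print close_tags('<doc><item>one<item>two</doc>'), "\n";
```

    The sample call prints <doc><item>one<item>two</item></item></doc>: the second item ends up nested inside the first, which is exactly the same-named-tag limitation described above.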

    I wrote a Parse::RecDescent tool into which you could feed an SGML-like DTD (with tag minimization), and it would automatically generate the right number of close tags at the right place by brute force. But it was far too slow for any serious work.
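    The idea behind the DTD-driven approach can be sketched without a full DTD parser. This is not the Parse::RecDescent tool mentioned above: the %may_contain table is a hypothetical, hand-written stand-in for a DTD's content models. When a start tag arrives that the current open element may not contain, that element is closed first (roughly how SGML infers omitted end tags). The same regex-tokenizer caveats apply: no comments, CDATA, or '>' inside attribute values.

```perl
use strict;
use warnings;

# HYPOTHETICAL content model standing in for a DTD:
# which child elements each element may contain.
my %may_contain = (
    doc  => { list => 1 },
    list => { item => 1 },
    item => {},              # text only
);

sub close_tags_with_model {
    my ($xml) = @_;
    my @open;                # stack of currently open element names
    my $out = '';

    for my $tok ($xml =~ /(<[^>]*>|[^<]+)/g) {
        if ($tok =~ m{^</\s*([^\s>]+)}) {
            # End tag: close open elements until we reach the matching one.
            my $name = $1;
            next unless grep { $_ eq $name } @open;   # drop stray end tags
            while ((my $top = pop @open) ne $name) {
                $out .= "</$top>";
            }
            $out .= $tok;
        }
        elsif ($tok =~ m{^<([A-Za-z_:][-\w.:]*)} && $tok !~ m{/>$}) {
            my $name = $1;
            # Close open elements the model says cannot contain this one.
            # Elements missing from %may_contain are treated as containing nothing.
            while (@open && !$may_contain{ $open[-1] }{$name}) {
                $out .= '</' . pop(@open) . '>';
            }
            push @open, $name;
            $out .= $tok;
        }
        else {
            $out .= $tok;    # text and other markup pass through
        }
    }

    $out .= "</$_>" for reverse @open;   # close anything left at EOF
    return $out;
}

print close_tags_with_model('<doc><list><item>one<item>two</doc>'), "\n";
```

    With this model, <doc><list><item>one<item>two</doc> becomes <doc><list><item>one</item><item>two</item></list></doc>, so the two items come out as siblings rather than nested.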

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

Re: XML tags
by vek (Prior) on Dec 11, 2002 at 13:46 UTC
    You are probably better off going back to the code that generated the XML in the first place. Fix that code instead of writing a new program to repair the XML. Once the original code is fixed, you can generate the XML again, and this time it will be what you need.
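    If the generator is a series of hand-written print statements, one way to make missing end tags impossible is to build the document as a tree and write each element's end tag in the same call that writes its start tag. The node layout below ([name, {attributes}, children...]) is invented for this sketch; a CPAN module such as XML::Writer can do similar bookkeeping for you.

```perl
use strict;
use warnings;

# Minimal escaping for text content and attribute values.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Node layout (ASSUMPTION, invented for this sketch):
#   plain string                  -> text
#   [ name, {attrs}, @children ]  -> element
sub to_xml {
    my ($node) = @_;
    return xml_escape($node) unless ref $node;

    my ($name, $attrs, @kids) = @$node;
    my $attr_str = join '',
        map { sprintf ' %s="%s"', $_, xml_escape($attrs->{$_}) } sort keys %$attrs;

    # The end tag is written in the same expression as the start tag,
    # so every element is closed by construction.
    return "<$name$attr_str>" . join('', map { to_xml($_) } @kids) . "</$name>";
}

# Prints <doc><item id="1">one</item><item id="2">two &amp; three</item></doc>
print to_xml([ 'doc', {},
               [ 'item', { id => 1 }, 'one' ],
               [ 'item', { id => 2 }, 'two & three' ] ]), "\n";
```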

    -- vek --
Re: XML tags
by fruiture (Curate) on Dec 11, 2002 at 14:41 UTC

    Simple: if it doesn't have closing tags, it's not an XML document. Why should any XML tool recognize such a proprietary non-XML format? SCNR.

    How would such a tool know where to put the end tags? It would need to recognize a DTD, and even that can't guarantee the structure is interpreted as intended.

    So if you want to use XML, use XML.

    --
    http://fruiture.de
Re: XML tags
by bronto (Priest) on Dec 11, 2002 at 13:04 UTC

    It depends on many, many things...

    First of all, it would be much better to generate valid XML documents in the first place.

    Second: how are the tags structured? How would the program know when it can close a tag?

    Please post some examples.

    Ciao!
    --bronto

    # Another Perl edition of a song:
    # The End, by The Beatles
    END {
      $you->take($love) eq $you->make($love) ;
    }

Re: Closing XML Tags
by cjf-II (Monk) on Dec 11, 2002 at 13:26 UTC

    I haven't tried this before, but HTML Tidy says it has limited support for XML (although it doesn't recognize XML features such as CDATA sections or DTD subsets). Might be worth a shot; just create a backup first :).