in reply to X(ht)ML Source Formatting

I once wrote a batch XML (and XHTML is XML) indenter and got some nice replies pointing to even better tools. See Light batch XML indenter for the scoop.

Re: Re: X(ht)ML Source Formatting
by ViceRaid (Chaplain) on Aug 13, 2003 at 20:06 UTC

    ++ thank you. That seems like a nippy way of doing it (compared to creating an HTML parse tree, anyway). It might also help me get over my phobia of the /x regex modifier.

    However, I'm going to be an XHTML pedant and point out that there are a few things it doesn't handle correctly. By correctly, I mean that the end result isn't equivalent to the input from an XML parser's point of view.

    • It needs to leave CDATA sections alone. In XHTML Strict, SCRIPT and STYLE sections may declare their content to be unparsed character data. This is useful because it allows you to have '<' and '>' in your scripts (e.g. the Javascript comparison operators) and styles (e.g. the CSS child selector) without having to escape them.
    • It shouldn't touch the whitespace within PRE elements at all; inside these, whitespace should be taken literally, exactly as it appears in the file, and not closed up. For example, this will come out wrong:
    <pre> <span id="foo">foo</span> </pre>
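
    One way to get both points without a full parse is to stash the protected regions, run the formatter over the rest, and put them back afterwards. What follows is only a rough sketch under stated assumptions: format_preserving and the stand-in formatter are hypothetical names, not part of the indenter under discussion, and since this is still pattern matching it won't cope with nested PRE elements or a literal "</pre>" inside a CDATA section.

```perl
use strict;
use warnings;

# Hypothetical helper: swap out regions that must not be reformatted,
# run any formatter over what is left, then restore the originals.
sub format_preserving {
    my ($xml, $formatter) = @_;
    my @saved;

    # Stash CDATA sections and pre elements verbatim behind placeholders.
    $xml =~ s{
        (
            <!\[CDATA\[ .*? \]\]>        # a CDATA section
          |
            <pre\b [^>]* > .*? </pre>    # a pre element, whitespace and all
        )
    }{
        push @saved, $1;
        "\x00" . $#saved . "\x00";       # NUL cannot appear in well-formed XML
    }gsex;

    $xml = $formatter->($xml);

    # Put the stashed regions back, untouched.
    $xml =~ s/\x00(\d+)\x00/$saved[$1]/g;
    return $xml;
}

# Usage with a deliberately crude stand-in formatter (an assumption,
# standing in for whatever indenter you actually use):
my $squeeze = sub { my $s = shift; $s =~ s/\s+/ /g; return $s };
print format_preserving(
    qq{<div>\n  <pre> <span id="foo">foo</span> </pre>\n</div>\n}, $squeeze
), "\n";
```

    The formatter only ever sees the placeholders, so whatever it does to whitespace and angle brackets cannot reach the protected content.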

    Sorry, I don't want to detract from a really nice piece of work; I can see that it would definitely be useful in more data-oriented XML settings. However, it's not really accurate enough for me to use in a production setting.

    cheers
    ViceRaid

      Oh for sure. Definitely. I didn't even attempt to "parse" the XML or do anything besides handle the most generic of tasks. I don't think it's even valid to talk about CDATA or PRE elements or anything that requires knowledge of actual XML or XHTML. In this case the *only* things it respects are the '<', '/' and '>' characters. It was one of those things a person writes when it's 9pm, you're still at work and you've still got to pack for a trip tomorrow at 7am. An ugly scene all around.
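
      To make the limitation concrete, here is a rough reconstruction of that kind of indenter. It is not the original script; naive_indent is a hypothetical name, and the two-space indent is an assumption. It looks at nothing but '<', '/' and '>', which is exactly why PRE whitespace gets rewritten.

```perl
use strict;
use warnings;

# Hypothetical sketch of a purely character-driven indenter.
sub naive_indent {
    my ($xml) = @_;
    my ($depth, $out) = (0, '');

    # Split into tags and the text between them; the capture keeps the tags.
    for my $tok (split /(<[^>]*>)/, $xml) {
        $tok =~ s/^\s+|\s+$//g;     # trimming here is what damages <pre>
        next if $tok eq '';

        my $closing = $tok =~ m{^</};                            # </tag>
        my $opening = $tok =~ m{^<[^/!?]} && $tok !~ m{/>\z};    # <tag>, but not <tag/>, <!...> or <?...?>

        $depth-- if $closing && $depth > 0;
        $out .= ('  ' x $depth) . $tok . "\n";
        $depth++ if $opening;
    }
    return $out;
}

print naive_indent('<a><b>x</b><c/></a>');
```

      Every token ends up trimmed and on its own line, so any element where whitespace is significant comes out different from how it went in.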

      Please forgive me if I don't understand the problem domain too well, but XML::Twig may suit(?)

        Mmmm, I do like the pretty-printing options in XML::Twig, and lots of other things about the package. I'll take a look at how I can coax the HTML tree into an XML::Twig object without outputting and reparsing.

        Thanks
        ViceRaid