in reply to X(ht)ML Source Formatting

I'm not sure if you want to home-brew something simple (which would give you exactly the results you wanted, rather than having to endlessly tweak someone else's output), but I'll describe how I would approach the problem.

I have, in the past, written a script or two to get some XML/XHTML/HTML formatted more acceptably. The basic algorithm I use is pretty simple. First, just tokenize the text stream based on tags, and keep a simple counter of what level we're at. An opening tag is printed at the current level and then increments the counter; a closing tag decrements the counter first and is then printed, so it lines up with its opening tag. Each token is prefixed by the appropriate amount of indentation. Of course, with plain HTML you must be careful of tags that commonly have no closing tag (such as p, li, img), but with well-formed XML/XHTML you do not have to worry as much (other than to watch out for single tags that open and close themselves, like <br/>).
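
For anyone who would rather see that as code, here is a minimal sketch of the counter approach in Python. It is not the script I actually used: the names indent_markup, is_opening and void_tags are made up for illustration, and it assumes no ">" ever appears inside a comment, CDATA section or attribute value. Tags with an optional closing tag (like p in plain HTML) are not handled; only tags you explicitly list as having no closing tag are.

```python
import re

# One tag is anything between "<" and ">". The capture group makes
# re.split() return the tags themselves along with the text between them.
TAG = re.compile(r"(<[^>]*>)")
# Pulls the element name out of an opening tag, e.g. "li" from '<li class="x">'.
NAME = re.compile(r"<\s*([A-Za-z][\w:.-]*)")


def is_opening(token, void_tags):
    """True if the token is a tag that should deepen the indentation."""
    if not token.startswith("<") or token.endswith("/>"):
        return False  # plain text, or a self-closing tag like <br/>
    if token[1:2] in ("!", "?"):
        return False  # comment, doctype or processing instruction
    match = NAME.match(token)
    # Tags listed in void_tags (hypothetical parameter) never get a closer.
    return match is not None and match.group(1).lower() not in void_tags


def indent_markup(source, indent="  ", void_tags=frozenset()):
    """Put each tag or text run on its own line, indented by nesting level."""
    level = 0
    lines = []
    for token in TAG.split(source):
        token = token.strip()
        if not token:
            continue  # skip whitespace-only runs between tags
        if token.startswith("</"):
            # Closing tag: step back out first so it lines up with its opener.
            level = max(level - 1, 0)
            lines.append(indent * level + token)
            continue
        lines.append(indent * level + token)
        if is_opening(token, void_tags):
            level += 1
    return "\n".join(lines)


if __name__ == "__main__":
    print(indent_markup("<ul><li>one</li><li>two<br/></li></ul>"))
```

For plain HTML you would pass something like void_tags={"br", "img"} so those tags do not push the level up. Note that every text run gets its own line, which is exactly the sort of thing you would end up tweaking for inline elements.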

Update: scooped by diotalevi. I guess that's what happens when you have to walk away for 5 minutes to talk to your boss. :) I should note that diotalevi's code pretty much does exactly what I describe here, so maybe my post will still be a useful plain-English explanation? (Grasping at straws here.)

Re: Re: X(ht)ML Source Formatting
by ViceRaid (Chaplain) on Aug 13, 2003 at 20:19 UTC

    Thanks - I wouldn't mind home-brewing a tweak to HTML::Element, and I've already got all the bits nicely tokenised for me so I don't have to worry about that. As I mentioned above in reply to diotalevi, it's not quite as simple as we'd wish; on the other hand, you've made me think it's not quite as hard as I'd feared. I guess I might have some trouble justifying spending my work hours scratching my source-code formatting neuroses, but this problem's bitten me now...

    cheers
    ViceRaid