in reply to Re: Publish or Polish
in thread Publish or Polish

I've looked at both demoronizer and tidy. Tidy strips out stuff that is useful (like <span> tags). I glanced at demoronizer, but decided I wouldn't gain much by using it as a pre-pass over the HTML.

It's easier to use HTML::TreeBuilder to suck in the whole document, then pull out the elements that I'm interested in. It mostly works pretty well: I get headings, tables, some character styles (like <code>) and anchors.
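
For anyone who wants to try the same approach, here is a rough sketch of what I mean, not my actual script. The sample HTML and the variable names are made up for illustration, and the tag list is just the one mentioned above; it assumes HTML::TreeBuilder is installed from CPAN.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample input, standing in for the real documents.
my $html = <<'HTML';
<html><body>
<h1>Title</h1>
<p>Some <span class="x">styled</span> text with <code>print</code> and
<a href="http://example.com/">a link</a>.</p>
<table><tr><td>cell</td></tr></table>
</body></html>
HTML

# Parse the whole document into a tree in one go.
my $tree = HTML::TreeBuilder->new_from_content($html);

# Pull out only the elements of interest: headings, tables, <code>, anchors.
# look_down accepts a regex for the internal _tag attribute.
my @wanted = $tree->look_down(_tag => qr/^(?:h[1-6]|table|code|a)$/);

# Remember the tag names as plain strings before the tree is freed.
my @tags = map { $_->tag } @wanted;

for my $el (@wanted) {
    printf "%-6s %s\n", $el->tag, $el->as_trimmed_text;
}

# Free the tree explicitly once we're done with it.
$tree->delete;
```

Everything not matched by look_down (the <span> here, for instance) stays in the tree untouched, so nothing is thrown away unless you choose to ignore it.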


Perl is Huffman encoded by design.