in reply to Re: Publish or Polish
in thread Publish or Polish
I've looked at both demoronizer and tidy. Tidy strips out stuff that is useful (like <span> tags). I glanced at demoronizer, but decided I wouldn't gain much by using it as a pre-pass over the HTML.
It's easier to use HTML::TreeBuilder to suck in the lot, then pull out the elements that I'm interested in. Mostly works pretty well. I get headings, tables, some character styles (like <code>) and anchors.
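For anyone who wants to try the same approach, here is a rough sketch of what it can look like. This is not my actual script: the sub name extract_elements, the hash keys and the sample HTML are made up for illustration. Only HTML::TreeBuilder and the HTML::Element methods it calls (new_from_content, look_down, as_trimmed_text, attr, delete) are real.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: parse an HTML string and return the elements of
# interest (headings, <code> runs, anchors, tables) as plain Perl data.
sub extract_elements {
    my ($html) = @_;

    # Suck in the whole document at once.
    my $tree = HTML::TreeBuilder->new_from_content($html);

    my %found = (
        # look_down accepts a regex as the value to match against _tag.
        headings => [ map { $_->as_trimmed_text }
                      $tree->look_down( _tag => qr/^h[1-6]$/ ) ],

        # Character styles such as <code>.
        code     => [ map { $_->as_trimmed_text }
                      $tree->look_down( _tag => 'code' ) ],

        # Only anchors that actually carry an href.
        anchors  => [ map { $_->attr('href') }
                      $tree->look_down( _tag => 'a',
                          sub { defined $_[0]->attr('href') } ) ],
    );

    # Each table becomes a list of rows, each row a list of cell texts.
    # Nested tables are not treated specially in this sketch.
    for my $table ( $tree->look_down( _tag => 'table' ) ) {
        my @rows;
        for my $tr ( $table->look_down( _tag => 'tr' ) ) {
            push @rows, [ map { $_->as_trimmed_text }
                          $tr->look_down( _tag => qr/^t[dh]$/ ) ];
        }
        push @{ $found{tables} }, \@rows;
    }

    $tree->delete;    # free the parse tree explicitly
    return \%found;
}

# Made-up sample input, just to exercise the helper.
my $sample = <<'HTML';
<html><body>
<h2>Title</h2>
<p>Run <code>tidy</code> or see <a href="http://example.com/">this</a>.</p>
<table><tr><th>Name</th><th>Kept</th></tr><tr><td>span</td><td>yes</td></tr></table>
</body></html>
HTML

my $result = extract_elements($sample);
print "heading: $_\n" for @{ $result->{headings} };
print "code:    $_\n" for @{ $result->{code} };
print "anchor:  $_\n" for @{ $result->{anchors} };
print "row:     ", join( ' | ', @$_ ), "\n" for map { @$_ } @{ $result->{tables} };
```

Anything that look_down is not asked for (including <span>) simply stays in the tree untouched, so nothing gets thrown away before you decide what to keep.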