in reply to HTML Tag Remover
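If you don't want the formatting bits, you might try using HTML::Parser. For example, HTML::TreeBuilder (which is built on HTML::Parser) can be combined with HTML::FormatText:

    use strict;
    use warnings;
    use HTML::TreeBuilder;
    use HTML::FormatText;

    # Parse the file into an element tree.
    my $tree = HTML::TreeBuilder->new->parse_file("test.html");

    # Render the tree as plain text, wrapped at column 50.
    my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 50);
    print $formatter->format($tree);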
In short, parsing HTML is a tricky thing, and it's best to make use of the already-existing code that was written for this purpose.
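To make the "tricky" part concrete, here is a minimal sketch that uses HTML::Parser directly to keep only the text. The sample string and the variable names are made up for illustration. A naive regex is included for contrast, since it trips over a ">" inside an attribute value.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample input; note the ">" inside the quoted attribute value.
my $html = '<p title="a > b">Fish &amp; <b>chips</b></p>';

# Naive approach: delete anything that looks like <...>.
# It stops at the first ">" and leaves part of the attribute behind.
(my $naive = $html) =~ s/<[^>]*>//g;

# Parser approach: collect only the text segments.
my $text = '';
my $parser = HTML::Parser->new(
    api_version => 3,
    # "dtext" hands each text segment to the callback with entities decoded.
    text_h => [ sub { $text .= shift }, 'dtext' ],
);
$parser->parse($html);
$parser->eof;    # signal end of input so any buffered text is flushed

print "regex:  $naive\n";
print "parser: $text\n";
```

With this input the regex version should leave stray attribute text and an undecoded entity behind, while the parser version yields just "Fish & chips". Text inside script or style elements would still be reported as text by this handler, so real pages may need extra filtering.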