in reply to HTML Tag Remover

Why don't you take a look at HTML::FormatText? It strips the HTML markup and lays out the remaining text roughly the way it would be rendered as HTML, with margins and line wrapping. Pretty nice. From the docs:
    require HTML::TreeBuilder;
    $tree = HTML::TreeBuilder->new->parse_file("test.html");

    require HTML::FormatText;
    $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 50);
    print $formatter->format($tree);
If you don't want the formatting bits, you might try using HTML::Parser.
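
For the HTML::Parser route, something like the following might be a starting point. It is only a rough sketch, assuming the version 3 API of HTML::Parser; the helper name strip_tags is made up for illustration and is not part of the module. It collects the decoded text of every text event and skips the contents of script and style elements.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper (not part of HTML::Parser): returns the text
# content of an HTML string with all tags removed.
sub strip_tags {
    my ($html) = @_;
    my $text = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # 'dtext' passes the text with entities such as &amp; decoded.
        text_h => [ sub { $text .= shift }, 'dtext' ],
    );

    # Drop the contents of these elements entirely, not just their tags.
    $parser->ignore_elements(qw(script style));

    $parser->parse($html);
    $parser->eof;    # flush any text still buffered by the parser

    return $text;
}

print strip_tags('<p>Hello <b>world</b> &amp; all</p>'), "\n";
```

Whitespace comes through exactly as it appears in the source, so you may still want to collapse runs of spaces and newlines afterwards.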

In short, parsing HTML is a tricky thing, and it's best to make use of the already-existing code that was written for this purpose.