Re^2: Stripping HTML tags efficiently

Both approaches are flawed. Reading the text in fixed-size chunks will often split a tag in half, e.g. when the 1024-byte boundary (shown here as 1024_mark) falls inside an attribute:

<p style="bor1024_markder:1px solid black">

and reading line by line will split any tag that spans several lines:

<img src="/some/path/somewhere.png"
 alt="A long title"
 style="display:block"
 class="article" />

Parsing HTML correctly is non-trivial. If you use one of the HTML parser modules, such as HTML::TokeParser, you can be confident that tags are handled correctly wherever they fall in the input.
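As a rough illustration, here is a minimal sketch of tag stripping with HTML::TokeParser. The helper name strip_tags and the sample markup are made up for this example, and it assumes the whole document fits in memory as one string. It keeps only the text tokens and discards everything else, so a tag that spans several lines is handled like any other:

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (not from the original post): returns the text
# content of an HTML string with all tags, comments and declarations removed.
sub strip_tags {
    my ($html) = @_;

    # Passing a scalar reference makes the parser read from the string.
    my $parser = HTML::TokeParser->new(\$html)
        or die "Could not create parser";

    my $text = '';
    while (my $token = $parser->get_token) {
        # Text tokens have the form ["T", $text, $is_data];
        # start tags, end tags, comments etc. are skipped.
        next unless $token->[0] eq 'T';
        $text .= $token->[1];
    }
    return $text;
}

# Sample input with a tag that crosses line boundaries.
my $html = <<'HTML';
<p>Before <img src="/some/path/somewhere.png"
 alt="A long title"
 style="display:block"
 class="article" /> after</p>
HTML

print strip_tags($html);
```

Note that the text comes back with entities such as &amp;amp; still encoded, and the contents of script and style elements are returned as text too. Decoding entities (HTML::Entities) or skipping those elements is left out to keep the sketch short.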
