in reply to Re: Stripping HTML tags efficiently
in thread Stripping HTML tags efficiently
Both approaches are pretty flawed. Breaking the text into chunks is often going to break tags in half, e.g. (with 1024_mark showing where the chunk boundary falls):
<p style="bor1024_markder:1px solid black">
and reading line by line is going to split tags that cross lines:
<img src="/some/path/somewhere.png"
     alt="A long title"
     style="display:block" class="article" />
Parsing HTML correctly is non-trivial. With one of the HTML parser modules, such as HTML::TokeParser, you can be far more confident the result is right.
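A rough sketch of what that could look like with HTML::TokeParser, not a drop-in solution. The helper name strip_tags and the sample markup are made up for illustration. It keeps only the text tokens, so tags that span lines are handled by the parser rather than by your own chunking. It does not decode entities and it keeps the contents of script and style elements, so you would probably want to deal with those separately.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# strip_tags is a hypothetical helper name, not part of HTML::TokeParser.
sub strip_tags {
    my ($html) = @_;

    # A reference to a scalar tells the parser to treat it as the document text.
    my $parser = HTML::TokeParser->new(\$html)
        or die "Could not create parser";

    my $text = '';
    while (my $token = $parser->get_token) {
        # Text tokens have the form ["T", $text, $is_data]; skip tags,
        # comments, declarations and processing instructions.
        next unless $token->[0] eq 'T';
        $text .= $token->[1];
    }
    return $text;
}

# Sample input with a tag that crosses a line break (illustrative only).
my $html = qq{<p style="border:1px solid black">Hello <img src="/some/path/somewhere.png"\n alt="A long title" />world</p>};
print strip_tags($html), "\n";
```

If the input is a file rather than a string, HTML::TokeParser->new also accepts a file name or an open filehandle, so the parser does the reading itself and you avoid splitting the input by hand.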