in reply to Re^3: Stripping HTML tags
in thread Stripping HTML tags
Your point is taken, though: I didn't say why I still don't think that the amended solution is robust. Parsing HTML with a series of regexes is slow and difficult. Script tags don't necessarily have end tags, for example: they could simply link to a .js file. Then, much later in the HTML document, if there were a closing script tag for another block, the regex would match from the first tag all the way through that closing tag, swallowing and deleting the valid content in between.
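To make that failure concrete, here is a minimal sketch. The input string, the file name `menu.js`, and the substitution itself are my own assumptions for illustration, not the code posted earlier in this thread; the regex is just a typical "strip script blocks" pass.

```perl
use strict;
use warnings;

# Hypothetical input: a script tag that only links to a .js file and has
# no separate end tag, then real content, then an unrelated inline script.
my $html = join '',
    '<script src="menu.js" />',
    '<p>Valid content</p>',
    '<script>var x = 1;</script>',
    '<p>More content</p>';

# Assumed stripping pass: delete everything from an opening script tag
# up to the nearest closing script tag (non-greedy, case-insensitive,
# with . allowed to match newlines).
(my $stripped = $html) =~ s{<script\b.*?</script>}{}gis;

# The first opening tag pairs up with the end tag of the second block,
# so the paragraph between them is deleted along with the scripts.
print "$stripped\n";    # prints: <p>More content</p>
```

Making the match non-greedy does not help here, since the nearest closing tag is already the wrong one.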
As for performance, HTML::Stripper is an XS module, so it should be much, much faster than the multi-pass regex approach.
Re^5: Stripping HTML tags
by tilly (Archbishop) on May 25, 2005 at 01:01 UTC