This is a nice simple solution, but it really depends on how robust the OP needs the solution to be. If they simply have a bunch of files they want to strip and then hand edit, this is excellent. If it needs to work unsupervised, then HTML::Strip or HTML::Parser might be a bit better.
Two issues that immediately come to mind: tags nested inside comments wouldn't strip correctly, and things like script or style tags would be poorly handled. Also, DocType declarations are missed. Suggest:
s/<!-- .*? -->//xsg;
s/<(script|style)[^>]*> .*? <\/\1[^>]*>//xsg;
s/(?: <[^<>]*> )+/ /xsg;
# ...
Still not terribly robust, but possibly sufficient.
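For anyone who wants to try this out, here is a minimal sketch that wraps the three substitutions in a sub. The name `strip_tags` and the sample input are made up for illustration; the regexes are the same ones suggested above, applied in the same order (comments first, then script/style blocks, then whatever tags are left).

```perl
use strict;
use warnings;

# Hypothetical helper: the name strip_tags is an assumption for this sketch.
# It only applies the three substitutions from the post, in order.
sub strip_tags {
    my ($html) = @_;

    # 1. Drop comments whole, so tags nested inside them go too.
    $html =~ s/<!-- .*? -->//xsg;

    # 2. Drop script and style elements together with their contents.
    $html =~ s/<(script|style)[^>]*> .*? <\/\1[^>]*>//xsg;

    # 3. Collapse each run of remaining tags (DocType included) to one space.
    $html =~ s/(?: <[^<>]*> )+/ /xsg;

    return $html;
}

# Made-up sample input covering the cases mentioned above.
my $sample = '<!DOCTYPE html><p>Hello <!-- <b>hidden</b> --><b>world</b></p>'
           . '<style>p { color: red }</style>';

print strip_tags($sample), "\n";
```

Note that the script/style match is case-sensitive as written, so `<SCRIPT>` blocks would only lose their tags in step 3, not their contents. Adding `/i` would be one way around that, if the input calls for it.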