# With $_ holding the HTML text...

# Pull comments. Note that
#   `<!-- foo="--> bar <--" -->' will NOT strip ` bar '.
# I claim this to be a feature.
s/<!--.*?-->//g;

# For tags like <blah blah="blah" blah='blah' ... >,
# strip from after the start of the tag up to the end
# of the first quoted string, repeatedly, ending in either
# `<>' or `<no quotes here>'.
# Update: Now handles either quote char, with the other
# possibly within the quoted string.
while ( s/<(?!--)[^'">]*"[^"]*"/</g or
        s/<(?!--)[^'">]*'[^']*'/</g ) {};

# Strip HTML tags without quotes in them... which should be
# the only kind that we have left.
s/<(?!--)[^">]*>//g;

print $_;
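To see how the three passes interact, here is a minimal sketch that wraps the substitutions from the post in a helper and runs them on two small inputs. The name `strip_html` and the sample strings are assumptions made for illustration; they are not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the post's substitutions to a copy of the input.
sub strip_html {
    my ($html) = @_;
    for ($html) {    # alias $_ to our copy so the bare s/// forms work
        # Pass 1: remove comments (shortest match between <!-- and -->).
        s/<!--.*?-->//g;

        # Pass 2: inside each tag, delete everything from just after `<'
        # through the first quoted string, until no quoted strings remain.
        while ( s/<(?!--)[^'">]*"[^"]*"/</g or
                s/<(?!--)[^'">]*'[^']*'/</g ) {}

        # Pass 3: remove the now quote-free tags.
        s/<(?!--)[^">]*>//g;
    }
    return $html;
}

# A `>' inside a quoted attribute value does not end the tag early.
print strip_html(q{<a href="x.html" title='a > b'>link</a> text}), "\n";

# The comment case from the post: ` bar ' is left in place.
print strip_html(q{<!-- foo="--> bar <--" -->keep}), "\n";
```

The first call should print `link text`. The second leaves ` bar <--" -->keep` behind, which matches the behavior the post describes as a feature. Note that `.` does not match a newline without the `/s` modifier, so as written the first pass only removes comments that sit on a single line.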
Re: Strip HTML tags
by davorg (Chancellor) on Dec 15, 2000 at 13:49 UTC
by rlk (Pilgrim) on Dec 15, 2000 at 23:03 UTC

Re: Strip HTML tags
by swiftone (Curate) on Dec 16, 2000 at 04:20 UTC
by dvergin (Monsignor) on Feb 19, 2004 at 02:01 UTC

Re: Strip HTML tags
by chipmunk (Parson) on Dec 16, 2000 at 00:13 UTC
by rlk (Pilgrim) on Dec 16, 2000 at 03:25 UTC

Re: Strip HTML tags
by epoptai (Curate) on Dec 16, 2000 at 06:41 UTC