in reply to Stripping the contents of Javascript tags

And what about a similar construct for stripping comments? I looked at HTML::TagFilter, but it wasn't very good at stripping Javascript tags, whereas the regexes sauoq and Cody Pendant came up with both worked. Another weird thing is this Microsoft garbage:
<!--[if IE]><script language=javascript>ie5=1;</script><![endif]-->

Using the regexes suggested, the <script>..</script> tags are removed, but the Microsoft garbage remains. Is there a way to roll two more regexes that can strip "normal" comments, and additionally strip this Microsoft comment garbage as well? (prominently found in Yahoo's main page)

Thanks for the help.

Re: Re: Stripping the contents of Javascript tags
by Lachesis (Friar) on May 28, 2003 at 08:28 UTC
    A comment stripper should pick up on the MS garbage as well as normal comments, since it is wrapped in the same <!-- ... --> delimiters.
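For the pure-regex route asked about in the question, a minimal sketch might look like the one below. The helper name strip_comments and the sample page are made up for illustration. It assumes the whole page is held in one string, and it cannot tell a real comment from a <!-- that happens to sit inside script text or an attribute value, so treat it as a rough cut rather than a robust solution.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every <!-- ... --> span from an HTML string.
sub strip_comments {
    my ($html) = @_;
    # /s lets "." match newlines, so multi-line comments are covered;
    # the non-greedy .*? stops at the first closing "-->".
    $html =~ s/<!--.*?-->//gs;
    return $html;
}

# Sample input (made up): one ordinary comment plus the IE conditional comment.
my $page = <<'HTML';
<p>before</p>
<!-- an ordinary
     multi-line comment -->
<!--[if IE]><script language=javascript>ie5=1;</script><![endif]-->
<p>after</p>
HTML

print strip_comments($page);
```

Because the conditional block opens with <!-- and closes with -->, the same substitution removes it together with the script tag it contains.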
    Personally I'd go for using HTML::Parser to reconstruct the file - a quick and dirty version would be something like:
    use HTML::Parser;

    my $parser = HTML::Parser->new(
        api_version => 3,
        default_h   => [ sub { print $_[0] unless lc $_[1] eq 'script' }, 'text,tagname' ],
        comment_h   => [""],
    );
    $parser->parse_file($file);
    This won't handle any javascript event handlers; for that you would need to go through the attributes of each tag and remove the event handlers.
    You would also need to check all your hrefs for javascript as well.
    Update: The comment stripping will work but the default handler I wrote won't remove the script properly.
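
One way to get the script removal working is to track whether the parser is currently inside a script element, presumably needed because the text between the tags arrives as a separate text event with no tag name attached. The following is only a sketch under that assumption: the helper name strip_scripts_and_comments and the $in_script flag are made up for illustration, and it collects output in a string instead of printing.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: return $html without comments and <script> elements.
sub strip_scripts_and_comments {
    my ($html) = @_;
    my $out       = '';
    my $in_script = 0;    # true between <script ...> and </script>

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Start tags: swallow <script>, copy everything else verbatim.
        start_h => [ sub {
            my ($tagname, $text) = @_;
            if ($tagname eq 'script') { $in_script = 1; return }
            $out .= $text;
        }, 'tagname,text' ],
        # End tags: swallow </script>, copy everything else verbatim.
        end_h => [ sub {
            my ($tagname, $text) = @_;
            if ($tagname eq 'script') { $in_script = 0; return }
            $out .= $text;
        }, 'tagname,text' ],
        # Text, declarations and the like: copy unless inside a script.
        default_h => [ sub { $out .= $_[0] unless $in_script }, 'text' ],
        # An empty handler makes the parser ignore comments entirely.
        comment_h => [''],
    );

    $parser->parse($html);
    $parser->eof;    # flush any text the parser is still holding
    return $out;
}
```

Calling print strip_scripts_and_comments($html) on a page held in $html would then emit the filtered markup. Event handler attributes and javascript: hrefs still pass through untouched, as noted above.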