HTML::TokeParser::Simple to the rescue. You said you wanted to remove everything "between" the tags, so I'm leaving the tags themselves in. It should be easy to change if you also want to strip the script tags.
#!/usr/bin/perl -w
use strict;
use HTML::TokeParser::Simple 1.4;

my $parser    = HTML::TokeParser::Simple->new( *DATA );
my $html      = '';
my $is_script = 0;

while ( my $token = $parser->get_token ) {
    $html .= $token->as_is unless $is_script;
    if ( $token->is_start_tag('script') ) {
        $is_script = 1;
    }
    elsif ( $token->is_end_tag('script') ) {
        $is_script = 0;
        $html .= $token->as_is;
    }
}
print $html;

__DATA__
<title>foobar</title>
<script language="Javascript">
foo bar foo bar
</script>
You fail if you remove this line!
<script language="Javascript">
bar foo bar foo
</script>
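If you do want the script tags gone as well, something along these lines might work. This is only a sketch: `strip_scripts` is a helper name I made up for illustration, and it assumes the HTML is already in a string and that every `<script>` has a matching `</script>`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper (not part of the module): returns a copy of $html
# with each <script> element removed, tags and contents alike.
sub strip_scripts {
    my ($html) = @_;

    # A reference to a scalar tells the parser to read from the string.
    my $parser    = HTML::TokeParser::Simple->new( \$html );
    my $out       = '';
    my $in_script = 0;

    while ( my $token = $parser->get_token ) {
        if ( $token->is_start_tag('script') ) {
            $in_script = 1;    # drop the opening tag and start skipping
        }
        elsif ( $token->is_end_tag('script') ) {
            $in_script = 0;    # drop the closing tag and stop skipping
        }
        elsif ( !$in_script ) {
            $out .= $token->as_is;    # everything outside a script is kept verbatim
        }
    }
    return $out;
}

print strip_scripts(
    qq{<title>foobar</title>\n<script language="Javascript">\nfoo bar\n</script>\nYou fail if you remove this line!\n}
);
```

The only real change from the version above is that the start and end tags are never appended. Note that an unclosed `<script>` would swallow the rest of the document, so you may want to guard against that if your input is untrusted.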
Cheers,
Ovid
New address of my CGI Course.
Silence is Evil (feel free to copy and distribute widely - note copyright text)
In reply to Re: Removing Javascript by Ovid, in thread Removing Javascript by Mur