Update: Doh! I misconstrued the question to mean "remove JavaScript". Leaving the code anyway; it might be useful to someone.
Aristotle helped me with a similar project a while back in another forum. Here's the result. It uses HTML::TokeParser::Simple to do the stripping: every token is copied to a new file in stripped/, except for everything from an opening script tag through its closing tag.
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

use constant SKIP => 0;
use constant COPY => 1;

my $new_folder = 'stripped/';
mkdir $new_folder unless -d $new_folder;
die "Could not create dir $new_folder: $!" unless -d $new_folder;

foreach my $doc ( glob("*.html") ) {
    print "Converting $doc\n";
    my $new_file = "$new_folder$doc";
    my $p = HTML::TokeParser::Simple->new( $doc )
        or die "Cannot open $doc for reading: $!";
    open my $out, '>', $new_file
        or die "Cannot open $new_file for writing: $!";

    my $state = COPY;    # reset for each file, in case the last one ended mid-script
    while ( my $token = $p->get_token ) {
        if ( $token->is_start_tag('script') ) {
            $state = SKIP;    # stop copying at the opening script tag
        }
        elsif ( $token->is_end_tag('script') ) {
            $state = COPY;    # start copying again after the script
            next;             # but drop the closing tag itself
        }
        print {$out} $token->as_is if $state == COPY;
    }
    close $out or die "Cannot close $new_file: $!";
}
--
Allolex
In reply to Re: strip JS-comments by allolex, in thread strip JS-comments by Skeeve