Here's an HTML::TokeParser solution. The output is kind of messy, but it works.
#!/usr/bin/perl -w
use strict;
use HTML::TokeParser;

my $filename = $ARGV[0] or die "not enough arguments\n";
my $parser = HTML::TokeParser->new($filename)
    or die "Can't open $filename: $!\n";

while (my $token = $parser->get_token()) {
    my ($type, $tag) = ($token->[0], $token->[1]);

    # We don't want <layer> or <iframe> tags. Only start (S) and end (E)
    # tokens have a tag name in element 1; for T, C and D it's text.
    next if ($type eq "S" || $type eq "E")
         && ($tag eq "layer" || $tag eq "iframe");

    # We can stop reading when we hit the nodelets section
    last if $type eq "C" && $tag eq "<!-- nodelets start here -->";

    # Print the token's text. All the token types except T
    # have their text as their last element. How annoying.
    if ($type eq "T") {
        print $tag;
    } else {
        print $token->[-1];
    }
}

# Add a closing </table>. Netscape won't display a table if the tags aren't
# balanced.
print "</table>\n";
# EOF
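The "How annoying" comment is about the token layouts that get_token() returns. Per the HTML::TokeParser docs, a text token is ["T", $text, $is_data], while S, E, C, D and PI tokens all keep their raw source text in the last slot. Here's a minimal sketch of an accessor that hides that special case. The name token_text is hypothetical, and the sample tokens are hand-built to match the documented shapes, so it runs without the module installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the raw source text of a token laid out the way
# HTML::TokeParser's get_token() documents it. Text tokens are
# ["T", $text, $is_data], so their text is element 1; every other
# type (S, E, C, D, PI) keeps its source text in the last element.
sub token_text {
    my ($token) = @_;
    return $token->[0] eq 'T' ? $token->[1] : $token->[-1];
}

# Hand-built sample tokens (hypothetical data shaped like get_token() output)
my @tokens = (
    [ 'S', 'table', { border => '1' }, ['border'], '<table border="1">' ],
    [ 'T', 'Hello, monks', 0 ],
    [ 'C', '<!-- a comment -->' ],
    [ 'E', 'table', '</table>' ],
);

# Reassemble the original markup from the tokens
print token_text($_) for @tokens;
print "\n";
```

With a helper like this, the if/else at the bottom of the loop collapses to `print token_text($token);`.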
-Matt
In reply to "RE: Stripping tags from a PerlMonks page" by DrManhattan, in the thread "Stripping tags from a PerlMonks page" by dmtelf.