in reply to Removing HTML Tags from a file
This would do:

    #!/usr/bin/perl
    use strict;
    use HTML::TokeParser::Simple;

    my $htm = HTML::TokeParser::Simple->new( $ARGV[0] ) or die "oops: $!";

    while ( my $token = $htm->get_token ) {
        if ( $token->is_tag() ) {
            print " " x length( $token->as_is );
        }
        else {
            print $token->as_is;
        }
    }

If you look at the perldoc man page for HTML::TokeParser::Simple (and the "less simple" classes it is derived from), you might find it easy to come up with other more useful variants, and/or figure out handy ways to deal with things like scripting and comments that are often included in HTML files.
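As one possible variant along those lines, here is a minimal sketch that drops comments and the contents of script and style elements instead of printing them. The helper name `strip_html` is made up for illustration, and it parses a string rather than a file so it is easy to try out. Unlike the script above, it does not pad removed tags with spaces, so character offsets are not preserved.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper (not part of the module): return only the text
# tokens of $html, skipping anything inside <script> or <style>.
sub strip_html {
    my ($html) = @_;
    my $parser = HTML::TokeParser::Simple->new( string => $html )
        or die "could not create parser";

    my $skip = 0;     # true while inside a <script> or <style> element
    my $text = '';

    while ( my $token = $parser->get_token ) {
        if ( $token->is_start_tag('script') || $token->is_start_tag('style') ) {
            $skip = 1;
        }
        elsif ( $token->is_end_tag('script') || $token->is_end_tag('style') ) {
            $skip = 0;
        }
        elsif ( !$skip && $token->is_text ) {
            # Tags, comments, and declarations are not text tokens,
            # so they never reach this branch.
            $text .= $token->as_is;
        }
    }
    return $text;
}

print strip_html('<p>Hi<!-- note --> <script>var x = 1;</script>there</p>'), "\n";
```

Keeping only text tokens is the simplest filter; if you would rather keep some markup, the `is_comment` and `is_tag` tests on each token let you pick what to drop.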
Re^2: Removing HTML Tags from a file
by agynr (Acolyte) on Dec 14, 2004 at 06:50 UTC
by DaWolf (Curate) on Dec 14, 2004 at 06:56 UTC
by agynr (Acolyte) on Dec 14, 2004 at 07:08 UTC
by prasadbabu (Prior) on Dec 14, 2004 at 07:06 UTC
by Anonymous Monk on Dec 15, 2004 at 02:25 UTC