in reply to Remove HTML tags from document

You could use HTML::TokeParser::Simple and print only the text tokens.

# almost straight from the HTML::TokeParser::Simple POD
use HTML::TokeParser::Simple;
my $p = HTML::TokeParser::Simple->new( $somefile );
while ( my $token = $p->get_token ) {
    print $token->as_is if $token->is_text;
}

HTH

Re: Re: Remove HTML tags from document
by matth (Monk) on Aug 04, 2003 at 09:18 UTC
    This works nicely. Is there an easy adaptation that would let me preserve the spacing that is in the HTML document?

      I'm not sure I understand. As far as I recall, HTML::TokeParser::Simple does preserve newlines in the text. I ran the code quickly to check, and the newlines in the HTML came through intact. Do you have tags that span multiple lines? What exactly is happening?

        I have tables where I would like to maintain the tabs.
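
One possible adaptation for the table case, offered as a rough sketch rather than a tested solution: emit a tab between cells and a newline at the end of each row, and pass text tokens through as before. The helper name table_text is hypothetical (it is not part of the module), and the sketch assumes simple, non-nested tables.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper (not part of the module): returns the text of $html
# with a tab between table cells and a newline after each table row.
sub table_text {
    my ($html) = @_;
    my $p     = HTML::TokeParser::Simple->new( \$html );  # parse from a string ref
    my $out   = '';
    my $cells = 0;    # number of cells seen so far in the current row

    while ( my $token = $p->get_token ) {
        if ( $token->is_start_tag('tr') ) {
            $cells = 0;                       # new row: reset the cell counter
        }
        elsif ( $token->is_start_tag('td') || $token->is_start_tag('th') ) {
            $out .= "\t" if $cells++;         # tab before every cell but the first
        }
        elsif ( $token->is_end_tag('tr') ) {
            $out .= "\n";                     # end of row
        }
        elsif ( $token->is_text ) {
            $out .= $token->as_is;            # text is passed through unchanged
        }
    }
    return $out;
}
```

Calling print table_text($html) would then print one line per row. Any whitespace that sits between the tags in the source is itself a text token and is passed through too, so real-world markup may need some trimming.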