how to remove html tags

cs202083 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, Monks:
I am trying to use a Perl script to simulate a browser.
I am using LWP with HTTP::Request and HTTP::Response,
and now I need to remove the HTML tags from the response's content.
One way I have tried is lynx's -dump switch,
like this:

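...
open(my $fh, '>', 'tmp.html') or die "Cannot open tmp.html: $!";
print $fh $response->content;
close($fh) or die "Cannot close tmp.html: $!";   # flush the file before lynx reads it
my $text = `lynx -dump tmp.html`;
print $text;
...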

But this method needs a temp file.
My question is:
Is there any other way (there certainly is)
to get rid of the tags?
I mean a way that doesn't need a temp file, or
doesn't need lynx at all.


Replies are listed 'Best First'.
Re: how to remove html tags by kutsu (Priest) on Mar 16, 2004 at 18:47 UTC
The first result from Super Search for "remove html tags" is Remove HTML tags from document. Many others follow if you want to Super Search yourself. "Cogito cogito ergo cogito sum - I think that I think, therefore I think that I am." Ambrose Bierce
Re: how to remove html tags by saintmike (Vicar) on Mar 16, 2004 at 18:57 UTC
HTML::FormatText does a nice job if the HTML is not too complicated and you'd like some plain text formatting:

use strict;
use HTML::FormatText;
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new();
$tree->parse("<H1>hello</H1>");
$tree->eof;   # tell the tree builder the document is complete
my $formatter = HTML::FormatText->new();
print $formatter->format($tree);
Re: how to remove html tags by amw1 (Friar) on Mar 16, 2004 at 18:43 UTC
I haven't used it at all, but you may want to look at HTML-Strip on CPAN. It looks like it will do what you want.
Re: how to remove html tags by davido (Cardinal) on Mar 17, 2004 at 07:15 UTC
I haven't seen anyone suggest this one yet, but it seems the logical solution: HTML::Strip. From the POD for that module, you'll see this simple example:

use HTML::Strip;
my $hs = HTML::Strip->new();
my $clean_text = $hs->parse( $raw_html );
$hs->eof;

It couldn't be easier when you use the right tool for the job.

Dave
Re: how to remove html tags by cormac (Acolyte) on Mar 16, 2004 at 20:21 UTC
HTML::Parser and its cronies HTML::TokeParser or HTML::PullParser might be what you're looking for. Specifically, HTML::Parser's ignore_elements() method comes to mind.
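
For anyone wanting to see what the HTML::Parser route might look like, here is a minimal sketch. It assumes the version 3 API of HTML::Parser; the helper name strip_tags and the sample HTML string are made up for illustration and do not come from the thread. It collects the decoded text the parser reports and uses ignore_elements() to skip script and style blocks entirely.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper (not from the thread): return the text content of an HTML string.
sub strip_tags {
    my ($html) = @_;
    my $text = '';
    my $parser = HTML::Parser->new(
        api_version => 3,
        # Append each chunk of entity-decoded text as the parser reports it.
        text_h => [ sub { $text .= shift }, 'dtext' ],
    );
    # Skip these elements together with everything inside them.
    $parser->ignore_elements(qw(script style));
    $parser->parse($html);
    $parser->eof;    # signal end of document so buffered text is flushed
    return $text;
}

# Sample input is an assumption; in the original question it would be $response->content.
print strip_tags('<h1>hello</h1><script>var x = 1;</script><p>world</p>'), "\n";
```

Note that this only drops the markup. It does not insert line breaks or spacing between block elements the way lynx -dump or HTML::FormatText would, so the text of adjacent elements runs together. No temp file and no external program are needed.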