sugarkannan has asked for the wisdom of the Perl Monks concerning the following question:

How can I eliminate all HTML tags from a given string?
For example: $string = " <html><head> i am sugar <br> smartest guy in the world <br> </head></html>";
From this I need to get the output:

i am sugar smartest guy in the world

Replies are listed 'Best First'.
Re: how to eliminate all html tags in a given string ??
by GrandFather (Saint) on Nov 24, 2005 at 03:48 UTC

    Take a look at HTML::TreeBuilder and the element method as_text.

    use warnings;
    use strict;
    use HTML::TreeBuilder;

    my $string = " <html><head> i am sugar <br> smartest guy in the world <br> </head></html>";
    my $tree = HTML::TreeBuilder->new;
    $tree->parse($string);
    $tree->eof;    # signal end of input so the tree is finalized
    print $tree->as_text();

    Prints:

    i am sugar smartest guy in the world

    DWIM is Perl's answer to Gödel
      print HTML::TreeBuilder->new_from_content( $string )->as_text;
Re: how to eliminate all html tags in a given string ??
by Trix606 (Monk) on Nov 24, 2005 at 03:13 UTC
    Check CPAN for a bunch of modules that can help you out. HTML::Parser should do what you need.
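    For what it's worth, here is a minimal sketch of how HTML::Parser might be used for this, assuming the version 3 event API. The helper name strip_tags and the whitespace cleanup at the end are my own additions for illustration, not something from this thread.

```perl
use strict;
use warnings;
use HTML::Parser;

# strip_tags is a hypothetical helper name, not part of HTML::Parser.
sub strip_tags {
    my ($html) = @_;
    my $text = '';

    # Only a text handler is registered, so tags, comments and
    # declarations are never copied into $text.
    my $parser = HTML::Parser->new(
        api_version => 3,
        text_h      => [ sub { $text .= $_[0] }, 'dtext' ],
    );
    $parser->ignore_elements(qw(script style));    # drop their contents too

    $parser->parse($html);
    $parser->eof;

    # Assumption: the caller wants runs of whitespace collapsed and trimmed.
    $text =~ s/\s+/ /g;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

my $string = " <html><head> i am sugar <br> smartest guy in the world <br> </head></html>";
print strip_tags($string), "\n";    # i am sugar smartest guy in the world
```

    The 'dtext' argspec asks for text with entities already decoded, so something like &amp; should come back as a plain ampersand.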
Re: how to eliminate all html tags in a given string ??
by Aristotle (Chancellor) on Nov 24, 2005 at 07:25 UTC
    use HTML::TokeParser::Simple;

    my $text;
    my $parser = HTML::TokeParser::Simple->new( \$string );
    while ( my $token = $parser->get_token ) {
        $text .= $token->as_is if $token->is_text;
    }

    Makeshifts last the longest.

Re: how to eliminate all html tags in a given string ??
by EvanCarroll (Chaplain) on Nov 24, 2005 at 09:24 UTC
    The other suggestions seem off to me, though they all work. I would highly suggest you use HTML::Strip.
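    A rough sketch of what that could look like, assuming HTML::Strip is installed from CPAN. The wrapper name strip_with_html_strip and the whitespace cleanup are assumptions added for illustration. HTML::Strip puts spaces where tags used to be, so the cleanup is likely wanted.

```perl
use strict;
use warnings;
use HTML::Strip;

# strip_with_html_strip is a hypothetical wrapper, not part of HTML::Strip.
sub strip_with_html_strip {
    my ($html) = @_;

    my $hs   = HTML::Strip->new();
    my $text = $hs->parse($html);
    $hs->eof;    # reset parser state before any further reuse

    # Assumption: collapse the leftover whitespace and trim the ends.
    $text =~ s/\s+/ /g;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

my $string = " <html><head> i am sugar <br> smartest guy in the world <br> </head></html>";
print strip_with_html_strip($string), "\n";
```

    Calling eof matters mostly when the same object is reused for several documents, since the parser otherwise remembers whether it was inside a tag.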


    Evan Carroll
    www.EvanCarroll.com