in reply to How to remove HTML tags from text

Personally I would go with HTML::Parser:

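    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::Parser;

    my $data = 'abcd efgh<img src="http://test.com/image.gif">ijklmn';

    my $parser = HTML::Parser->new(
        # append each decoded text chunk to a buffer kept in the parser object
        text_h           => [ sub { $_[0]->{_data} .= $_[1]; }, "self,dtext" ],
        # reset the buffer when a new document starts
        start_document_h => [ sub { $_[0]->{_data} = ''; }, "self" ],
    );

    $parser->parse($data);
    $parser->eof;    # signal end of input so any buffered text is flushed
    print $parser->{_data};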

/J\

Re^2: How to remove HTML tags from text
by holli (Abbot) on Feb 04, 2005 at 13:01 UTC
    An alternative using HTML::TokeParser:
        use strict;
        use warnings;
        use HTML::TokeParser;

        # from file
        my $p = HTML::TokeParser->new("test.html") or die "Can't open: $!";
        # from string
        #my $p = HTML::TokeParser->new(\"text1 <b> text2 </b> text3");

        my $t = '';
        while (my $token = $p->get_token) {
            # keep only text tokens ("T"); element [1] holds the text itself
            $t .= $token->[1] if $token->[0] eq "T";
        }
        print $t;

    holli, regexed monk