Re: How to remove HTML tags from text

in reply to How to remove HTML tags from text

Personally I would go with HTML::Parser:

#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

my $data = 'abcd efgh<img src="http://test.com/image.gif">ijklmn';

# Collect the decoded text of every text event in $parser->{_data}.
my $parser = HTML::Parser->new(
    start_document_h => [ sub { $_[0]->{_data} = '' }, "self" ],
    text_h           => [ sub { $_[0]->{_data} .= $_[1] }, "self,dtext" ],
);
$parser->parse($data);
$parser->eof;    # signal end of input so any buffered trailing text is flushed

print $parser->{_data};

/J\

Replies are listed 'Best First'.

Re^2: How to remove HTML tags from text
by holli (Abbot) on Feb 04, 2005 at 13:01 UTC

Alternative using HTML::TokeParser:

use strict;
use warnings;
use HTML::TokeParser;

# from file
my $p = HTML::TokeParser->new("test.html") or die "Can't open: $!";
# from string (pass a reference to the string)
#my $p = HTML::TokeParser->new(\"text1 <b> text2 </b> text3");
my $t = '';

# Text tokens have the form ["T", $text, $is_data]; keep only those.
while (my $token = $p->get_token)
{
    $t .= $token->[1] if $token->[0] eq "T";
}

print $t;

holli, regexed monk


In Section Seekers of Perl Wisdom