Re: How to remove HTML tags from text

by gellyfish (Monsignor)
on Feb 04, 2005 at 12:22 UTC (#428033)

in reply to How to remove HTML tags from text

Personally I would go with HTML::Parser:

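#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

my $data = 'abcd efgh<img src="">ijklmn';

my $parser = HTML::Parser->new(
    # append each run of entity-decoded text to a buffer kept on the parser object
    text_h           => [ sub { $_[0]->{_data} .= $_[1]; }, "self,dtext" ],
    # reset the buffer whenever a new document starts
    start_document_h => [ sub { $_[0]->{_data} = ''; }, "self" ],
);
$parser->parse($data);
$parser->eof;    # flush any text the parser is still buffering

print $parser->{_data};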


Re^2: How to remove HTML tags from text
by holli (Abbot) on Feb 04, 2005 at 13:01 UTC
    Alternative using HTML::TokeParser:
    use strict;
    use warnings;
    use HTML::TokeParser;

    # from file
    my $p = HTML::TokeParser->new("test.html") or die "Can't open: $!";

    # from string
    #my $p = HTML::TokeParser->new(\"text1 <b> text2 </b> text3");

    my $t = '';
    while (my $token = $p->get_token) {
        # text tokens have the form ["T", $text, $is_data]
        $t .= $token->[1] if $token->[0] eq "T";
    }
    print $t;

    holli, regexed monk
