Re: Strip HTML tags

in reply to Strip HTML tags

While I generally agree that HTML::Parser is a pain (its great flexibility leads to great complexity), HTML::TreeBuilder is just the ticket for something like this. Three simple lines, once the module is loaded:

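use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;      # empty tree, ready to parse
$tree->parse_file('foo.html');          # build the tree from the file
my $non_html = $tree->as_text();        # text content with the markup removed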

Should do the trick. This quarter's Perl Journal has a good article on HTML::TreeBuilder (the docs included with the module need work).
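If the HTML is already in a string rather than a file, the same idea can be sketched roughly as follows. This is only a sketch: html_to_text is a hypothetical helper name, not something from the module, and it assumes the input is a complete chunk of markup. It uses parse and eof to feed the string in, then as_text as above.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not part of HTML::TreeBuilder): return the text
# content of an HTML string with the markup removed.
sub html_to_text {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);    # feed the markup to the parser
    $tree->eof;             # signal end of input so the tree is finalized
    my $text = $tree->as_text;
    $tree->delete;          # release the tree explicitly when done
    return $text;
}

print html_to_text('<p>Hello <b>world</b></p>'), "\n";
```

Calling delete when finished is a habit worth keeping with these trees, since older versions of the module do not clean them up on their own.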


Re: Re: Strip HTML tags by dvergin (Monsignor) on Feb 19, 2004 at 02:01 UTC
Warning: This code strips out <anything> that is surrounded by <angle> <brackets>. It does not limit its action to true <html tags>.

------------------------------------------------------------
"Perl is a mess and that's good because the problem space is also a mess." - Larry Wall
