in reply to Strip HTML tags

While I generally agree that HTML::Parser is a pain (its great flexibility leads to great complexity), for something like this HTML::TreeBuilder is just the ticket. Three simple lines, plus the use line, should do the trick:

    use HTML::TreeBuilder;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse_file('foo.html');
    my $non_html = $tree->as_text();

This quarter's issue of The Perl Journal has a good article on it (the module's bundled docs need work).

Re: Re: Strip HTML tags
by dvergin (Monsignor) on Feb 19, 2004 at 02:01 UTC
    Warning: This code strips out <anything> that is surrounded by <angle> <brackets>. It does not limit its action to true <html tags>.

    ------------------------------------------------------------
    "Perl is a mess and that's good because the
    problem space is also a mess." - Larry Wall
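The effect dvergin warns about is easy to see with a quick experiment. The sketch below is not the TreeBuilder code from the parent post. It uses only core Perl regexes to compare a catch-all stripper with one limited to a whitelist of tag names. The names @known_tags, strip_all_brackets and strip_known_tags are hypothetical, made up for illustration, and the tag list is deliberately incomplete. Regexes are no substitute for a real parser such as HTML::TreeBuilder: they will trip over comments, script bodies, and a > inside attribute values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, deliberately incomplete whitelist of real HTML tag names.
my @known_tags = qw(a b i p br em strong div span html head body title);
my $tag_alt    = join '|', @known_tags;

# Naive approach: delete every <...> run, whatever it contains.
sub strip_all_brackets {
    my ($text) = @_;
    $text =~ s/<[^>]*>//g;
    return $text;
}

# Limited approach: delete only opening/closing tags whose name is whitelisted.
# The \b stops "<b" from matching the start of "<brackets>".
sub strip_known_tags {
    my ($text) = @_;
    $text =~ s{</?(?:$tag_alt)\b[^>]*>}{}gi;
    return $text;
}

my $input = '<p>Use <angle> <brackets> for <b>placeholders</b>.</p>';
print strip_all_brackets($input), "\n";
print strip_known_tags($input), "\n";
```

The first line of output loses <angle> and <brackets> along with the real tags. The second keeps them and removes only <p> and <b>.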