
Hi Molt,

Thanks for the reply, but as I stated in my first post, I don't really mind that your line of code would leave me with >"> in my output right now, as long as no valid tags are left that could alter formatting, run scripts, etc. I'm working on learning this from the ground up :)

But the book sounds good - I'll look into it for future reference :)

Thanks!
Neil

Re: Re: Re: Stripping of HTML content
by mp (Deacon) on Sep 12, 2002 at 16:31 UTC
    Depending on how much inaccuracy you can tolerate, you can get a reasonable facsimile of stripping all HTML by doing:
    $page =~ s/<[^<>]*>//g; # Note the added < inside []
    This assumes the entire page content is in $page. A line-by-line approach like the one in your original post will fail on tags that span multiple lines. The regexp above will also break if there is an unbalanced < or > inside an HTML tag (for example, a > inside a quoted attribute value), but it may be good enough for your use.
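
To make the two caveats concrete, here is a minimal sketch. The strip_tags helper and the sample strings are made up for illustration and are not from the original post; only the substitution itself is the one given above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): remove anything that
# looks like a tag, using the same substitution as above.
sub strip_tags {
    my ($page) = @_;
    $page =~ s/<[^<>]*>//g;
    return $page;
}

# A tag that spans two lines is still removed, because the whole page
# is in one string and [^<>] also matches newlines.
my $multi = "<a\n  href=\"x.html\">link</a> text";
print strip_tags($multi), "\n";

# A ">" inside a quoted attribute value ends the match early, so part
# of the tag is left behind in the output.
my $broken = '<img alt="a > b" src="x.png">done';
print strip_tags($broken), "\n";
```

The first call prints "link text". The second leaves ` b" src="x.png">done` behind, which is the kind of leftover debris discussed earlier in the thread. If the page comes from a file, reading it with $/ undefined (slurp mode) is one common way to get everything into $page at once.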