in reply to Parsing/Extracting Data from HTML.

Perl can convert HTML to text too...

$htmltext =~ s/<(.*)>//g;

...will replace all tags with emptiness.

If you wish to convert br's and p's to newlines before they are stripped, add:
$htmltext =~ s/<(br|p)>/\n\n/ig;
before the first command.
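
A minimal sketch of that ordering, using a made-up input string. Note that the stripping step here swaps in a negated character class (`<[^>]*>`) in place of `<(.*)>`, so that several tags on one line are removed one at a time; that swap is an assumption of this sketch, not one of the commands above.

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the original post.
my $htmltext = "<b>First</b> line<br>Second line<p>Third line";

# Step 1: turn bare <br> and <p> tags into blank lines while they still exist.
$htmltext =~ s/<(br|p)>/\n\n/ig;

# Step 2: strip whatever tags remain (the negated class stops at the first '>').
$htmltext =~ s/<[^>]*>//g;

print $htmltext, "\n";
```

If the two steps ran in the opposite order, the br and p tags would already be gone by the time the newline substitution looked for them. The first pattern only matches bare tags, so something like a p tag with attributes would be stripped without leaving a newline.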

Of course, you'll lose all formatting. This method is not guaranteed to properly strip comments.

RE: Re: Parsing/Extracting Data from HTML.
by chromatic (Archbishop) on Mar 23, 2000 at 20:48 UTC
    No, don't do that. It's too greedy:
        my $string = "<first><second>blahblah<third>\n";
        $string =~ s/<(.*)>//g;
        print $string;
    Result: (Hey, it's blank!)

    If you really want to do it this way, use:
        $string =~ s/<[^>]*?>//g;
    The question mark makes the asterisk non-greedy, which keeps it from slurping up any character -- including angle brackets -- to the end of the line, and then backtracking to pick up that last angle bracket. Of course, so does the negated character class, so either one alone is enough here. Just be more specific.
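
    To see the difference side by side, here is a small sketch that runs the greedy pattern, a non-greedy one, and the negated character class over the same string. The variable names and the multi-line sample in $split are made up for illustration.

```perl
use strict;
use warnings;

my $input = "<first><second>blahblah<third>\n";

# Greedy: .* runs to the last '>' on the line, so everything but the newline goes.
(my $greedy = $input) =~ s/<(.*)>//g;

# Non-greedy: .*? stops at the first '>' it can, so each tag is removed on its own.
(my $lazy = $input) =~ s/<.*?>//g;

# Negated class: [^>]* cannot cross a '>' at all, greedy or not.
(my $negated = $input) =~ s/<[^>]*>//g;

print "greedy:  [$greedy]";
print "lazy:    [$lazy]";
print "negated: [$negated]";

# One place the two fixes differ: a tag split across lines (hypothetical input).
# Without /s the dot does not match a newline, but [^>] does.
my $split = "<a\nhref='x'>link</a>";
(my $lazy_split    = $split) =~ s/<.*?>//g;
(my $negated_split = $split) =~ s/<[^>]*>//g;
```

    For the one-line string both fixes leave "blahblah" behind, while the greedy version leaves only the newline. For the split tag, the non-greedy dot version leaves the opening tag in place unless /s is added; the negated class removes it.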