in reply to Extracting raw text from a website
An XML::LibXML version:
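    use strict;
    use warnings;
    use XML::LibXML;
    use LWP::Simple;

    my $url  = "http://www.perlmonks.org";
    my $html = get($url);
    die "Could not fetch $url\n" unless defined $html;   # get() returns undef on failure

    my $parser = XML::LibXML->new;
    $parser->recover_silently(1);   # tolerate real-world, non-well-formed HTML
    $parser->keep_blanks(1);        # preserve whitespace-only text nodes
    my $doc = $parser->parse_html_string($html);

    binmode STDOUT, ':encoding(UTF-8)';   # avoid "Wide character" warnings
    print $doc->textContent;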
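One caveat: textContent concatenates every text node in the document, so the contents of script and style elements end up in the output too. A minimal sketch of one way around that, assuming you want them dropped, is to unbind those nodes before reading the text. The helper name visible_text, the whitespace collapsing, and the sample HTML are my own additions for illustration, not part of the script above.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not from the original post): return the text of an
# HTML string with <script> and <style> contents left out.
sub visible_text {
    my ($html) = @_;

    my $parser = XML::LibXML->new;
    $parser->recover_silently(1);              # tolerate sloppy markup
    my $doc = $parser->parse_html_string($html);

    # Detach nodes whose text is code or CSS rather than page content.
    for my $node ($doc->findnodes('//script | //style')) {
        $node->unbindNode;
    }

    my $text = $doc->textContent;
    $text =~ s/\s+/ /g;                        # collapse runs of whitespace
    $text =~ s/^\s+|\s+$//g;                   # trim both ends
    return $text;
}

# Made-up sample input, just to exercise the helper.
my $sample = <<'HTML';
<html>
  <head><style>p { color: red }</style></head>
  <body>
    <p>Hello</p>
    <script>var x = 1;</script>
    <p>world</p>
  </body>
</html>
HTML

print visible_text($sample), "\n";
```

To use it with the fetch above, pass the result of get($url) to visible_text instead of the sample string.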
See also: Re: Strip HTML tags again and the follow-up.