XML::LibXML version:
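use strict;
use warnings;
use XML::LibXML;
use LWP::Simple;

my $url  = "http://www.perlmonks.org";
my $html = get($url);
defined $html or die "Could not fetch $url\n";    # get() returns undef on failure

# Parse leniently: real-world HTML is rarely well-formed.
my $parser = XML::LibXML->new;
$parser->recover_silently(1);
$parser->keep_blanks(1);

my $doc = $parser->parse_html_string($html);

# textContent concatenates the text of every node in the document.
print $doc->textContent;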
See also: Re: Strip HTML tags again and the follow-up.
In reply to Re: Extracting raw text from a website
by Your Mother
in thread Extracting raw text from a website
by Anonymous Monk