in reply to Re: extracting data from HTML
in thread extracting data from HTML
Am I the only one who runs into this?
I tried to get it all working from the example with my $xml = HTML::HTML5::Parser->load_html..., but of course my test website had to come back with an error. As far as I could tell, load_html doesn't accept options, so I had to use $parser->parse_html_file($URL, { ignore_http_response_code => 1 }) instead. And then, of course, this happens to me: the user agent was not accepted and the site returned an HTTP 406 error.
After tweaking around for a few hours, I managed to get it working:
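    use feature 'say';
    use LWP::UserAgent;
    use HTML::HTML5::Parser;

    # $URL holds the address of the page to fetch (set elsewhere).

    # Build our own user agent so we control the User-Agent header.
    my $user_agent = LWP::UserAgent->new;
    $user_agent->agent("HTML::HTML5::Parser/" . '0.110' . " ");
    $user_agent->parse_head(0);

    my $parser = HTML::HTML5::Parser->new;
    my $xml    = $parser->parse_html_file(
        $URL,
        {
            ignore_http_response_code => 1,
            user_agent                => $user_agent,
        }
    );

    # Match on local-name() because the parsed elements are namespaced.
    my $nodes = $xml->findnodes('//*[local-name()="title"]');
    say $nodes->get_node(1)->textContent;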
I'm proud I did it, but I don't like having to remove some sort of security check from LWP::UserAgent; somehow, though, it was necessary for this website.
Question: does it conflict with an HTTP 301 (Moved Permanently) status?
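One way to look into the redirect question without touching the network is to inspect the user agent's own redirect settings. The sketch below mirrors the user agent setup from the snippet and prints the values it reports; it only uses accessors documented for LWP::UserAgent (agent, parse_head, max_redirect, requests_redirectable). As far as I understand it, LWP follows a 301 for GET requests by itself before the parser ever sees a response code, so ignore_http_response_code should not get in the way, but that is an assumption worth checking against the actual site. Also, per the LWP documentation, parse_head only controls whether response headers are initialized from the HTML head section, so switching it off may be less of a security matter than it looks.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Same user agent setup as in the snippet: custom agent string, head parsing off.
my $user_agent = LWP::UserAgent->new;
$user_agent->agent('HTML::HTML5::Parser/0.110 ');   # trailing space: LWP appends its own token
$user_agent->parse_head(0);

# Redirect-related settings, read back from the object (no request is made).
my $max_redirect = $user_agent->max_redirect;
my @redirectable = @{ $user_agent->requests_redirectable };

printf "agent:                %s\n", $user_agent->agent;
printf "max_redirect:         %d\n", $max_redirect;
printf "redirectable methods: %s\n", join(', ', @redirectable);
```

If max_redirect is greater than zero and GET is among the redirectable methods, a 301 on a plain page fetch should be followed automatically with this user agent.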
Re^3: extracting data from HTML
by bitingduck (Deacon) on Jun 04, 2012 at 04:24 UTC | |
by Jurassic Monk (Acolyte) on Jun 04, 2012 at 18:01 UTC |