Is it only me that has this?
I tried to get it all working, starting from the example with my $xml = HTML::HTML5::Parser->load_html..., but of course my test website had to come back with an error. As far as I could figure out, load_html doesn't handle options, so I had to use $parser->parse_html_file($URL, { ignore_http_response_code => 1 }) instead. And then, of course, this happens to me: the user agent was not accepted and the site returned an HTTP 406 error.
After tweaking around for a few hours, I managed to get it working:
use feature 'say';
use LWP::UserAgent;
use HTML::HTML5::Parser;

my $user_agent = LWP::UserAgent->new;
# agent() is the setter; the trailing space makes LWP append its own "libwww-perl/x.y" token
$user_agent->agent('HTML::HTML5::Parser/0.110 ');
# do not let LWP parse the HTML <head> of the response
$user_agent->parse_head(0);

my $parser = HTML::HTML5::Parser->new;
my $xml = $parser->parse_html_file($URL, {
    ignore_http_response_code => 1,
    user_agent                => $user_agent,
});

my $nodes = $xml->findnodes('//*[local-name()="title"]');
say $nodes->get_node(1)->textContent;
I'm proud I did it, but I don't like having to remove what looks like some sort of security check from LWP::UserAgent. Somehow, though, it was necessary for this website.
Question: does it conflict with an HTTP 301 (Moved Permanently) status?
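Not a definitive answer, but here is a small sketch for checking what the user agent will do with redirects before handing it to the parser. It makes no network request and only reads back settings from LWP::UserAgent. As far as I understand it, LWP follows a 301 for GET requests by itself, so the parser should only ever see the final response; that part is my assumption and not something I verified against HTML::HTML5::Parser.

```perl
use strict;
use warnings;
use feature 'say';
use LWP::UserAgent;

# Same setup as in the working example above.
my $user_agent = LWP::UserAgent->new;
$user_agent->agent('HTML::HTML5::Parser/0.110 ');
$user_agent->parse_head(0);

# Read the settings back; nothing is fetched here.
say 'agent:        ', $user_agent->agent;
say 'max_redirect: ', $user_agent->max_redirect;
say 'redirectable: ', join ', ', @{ $user_agent->requests_redirectable };
```

If max_redirect is greater than 0 and GET is in the redirectable list, the 301 should be followed before the status code is ever looked at, so ignore_http_response_code would not get in the way.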
In reply to Re^2: extracting data from HTML by Jurassic Monk, in thread extracting data from HTML by Jurassic Monk