in reply to Web Scraper for search site
Personally I'd do something like this:
    use URI::QueryParam;
    use Web::Magic 0.008;
    use XML::LibXML 1.94;

    Web::Magic
       -> new('http://www.google.co.uk/search', q => 'kittens')
       -> assert_success
       -> assert_content_type('text/html')
       -> make_absolute_urls
       -> findnodes('//*[@class="r"]/*[local-name()="a"]')
       -> foreach(sub {
              my $google_munged_url = URI->new($_->{href});
              my $fixed_url = $google_munged_url->query_param('sa') eq 'U'
                  ? $google_munged_url->query_param('q')
                  : $google_munged_url;
              printf "%s <%s>\n", $_->textContent, $fixed_url;
          });
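The fiddly part is un-munging Google's redirect links: result anchors point at an intermediate /url?q=REAL_URL&sa=U&... address rather than the target page. Here is that logic in isolation, as a minimal sketch using URI and URI::QueryParam (the example URL and the unmunge name are mine, not part of Web::Magic):

    use strict;
    use warnings;
    use URI;
    use URI::QueryParam;

    # When the 'sa' parameter marks a Google redirect, the real target
    # is carried in the 'q' parameter; otherwise pass the URL through.
    sub unmunge {
        my ($href) = @_;
        my $uri = URI->new($href);
        return ($uri->query_param('sa') // '') eq 'U'
            ? $uri->query_param('q')
            : "$uri";
    }

    print unmunge('http://www.google.co.uk/url?q=http://example.com/&sa=U'), "\n";
    # prints http://example.com/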
Obviously you need to make sure that whatever scraping you're doing is allowed by the search engine's terms of service.
Re^2: Web Scraper for search site
by mailmeakhila (Sexton) on Mar 16, 2012 at 20:10 UTC