mailmeakhila has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks, I am writing a web scraper for a search site, using WWW::Mechanize and Web::Scraper. I am stuck on how to "process" $mech->content to extract the desired data. I am planning to write "process" to be as generic to the site as possible. Any suggestions? Or is that at least possible?
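
For context, here is roughly where I am (a minimal sketch; the URL is a placeholder, not the real site, and process() is the stub I cannot fill in):

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new;
    $mech->get('http://www.example.com/search?q=perl');  # placeholder URL

    # This is the part I am stuck on: turning the raw HTML in
    # $mech->content into structured results, ideally in a way
    # that is not hard-wired to one site's markup.
    my @results = process($mech->content);

    sub process {
        my ($html) = @_;
        # ... extract titles and links from $html here ...
        return;
    }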

Replies are listed 'Best First'.
Re: Web Scraper for search site
by tobyink (Canon) on Mar 12, 2012 at 21:56 UTC

    Personally I'd do something like this:

    use URI::QueryParam;
    use Web::Magic 0.008;
    use XML::LibXML 1.94;

    Web::Magic
        -> new('http://www.google.co.uk/search', q => 'kittens')
        -> assert_success
        -> assert_content_type('text/html')
        -> make_absolute_urls
        -> findnodes('//*[@class="r"]/*[local-name()="a"]')
        -> foreach(sub {
            # Google wraps result links in a redirect URL; unwrap
            # it via the 'q' query parameter when present.
            my $google_munged_url = URI->new($_->{href});
            my $fixed_url = $google_munged_url->query_param('sa') eq 'U'
                ? $google_munged_url->query_param('q')
                : $google_munged_url;
            printf "%s <%s>\n", $_->textContent, $fixed_url;
        });

    Obviously you need to make sure that whatever scraping you're doing is allowed by the search engine's terms of service.
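
    If you would rather stay with the WWW::Mechanize and Web::Scraper combination you mentioned, the same extraction looks roughly like this (an untested sketch; the XPath is the same guess at Google's result markup as above, and passing $mech->uri as the base lets Web::Scraper absolutify the hrefs):

    use strict;
    use warnings;
    use WWW::Mechanize;
    use Web::Scraper;

    my $mech = WWW::Mechanize->new;
    $mech->get('http://www.google.co.uk/search?q=kittens');

    # Grab the title text and href of every result link.
    my $results = scraper {
        process '//*[@class="r"]/a', 'links[]' => {
            title => 'TEXT',
            url   => '@href',
        };
    };

    my $data = $results->scrape($mech->content, $mech->uri);
    for my $link (@{ $data->{links} || [] }) {
        printf "%s <%s>\n", $link->{title}, $link->{url};
    }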

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

      Thank you, that was helpful.