in reply to LWP and Google
All the other caveats about Google not liking scraping still apply. Please also take a look at Spidering Hacks by Kevin Hemenway and Tara Calishain.

    $ cat ./langley
    #!/usr/bin/perl

    use strict;
    use WWW::Mechanize;

    # autocheck => 1 makes Mechanize die on any failed request
    my $browser = WWW::Mechanize->new( autocheck=>1 );
    $browser->get('http://www.google.com');

    # Fill in and submit the search form (named 'f' on Google's front page)
    $browser->submit_form(
        form_name => 'f',
        fields => {
            q => 'langley public library',
        },
    );

    # Walk every link on the results page, skipping Google's own
    my @links = $browser->links();
    for my $link ( @links ) {
        my $abs = $link->url_abs;
        next if $abs =~ m[^http://.+google\.com/]; # Google links
        next if $abs =~ m[^http://\Q64.233.167.104/]; # Cache
        print $link->text, "\n\t", $link->url, "\n";
    }

    $ ./langley
    Fraser Valley Regional Library
        http://www.fvrl.bc.ca/comm_branch_langleycity.asp
    Sno-Isle Libraries
        http://www.sno-isle.org/
    Public Visitor's Page for NASA Langley Technical Library
        http://library.larc.nasa.gov/Public/
    Notice for NASA Langley Employees visiting the Technical Library ...
        http://library.larc.nasa.gov/Public/nasalangley.htm
    Canadian library Web sites and catalogues by region: British ...
        http://www.collectionscanada.ca/gateway/s22-221-e.html
    LANGLEY PUBLIC LIBRARY in LANGLEY, Oklahoma Library Data / Profile
        http://www.librarybug.org/library-OK0111.html
    1st Services Squadron - Langley Air Force Base, Virginia
        http://www.langley.af.mil/1msg/1svs/Library.shtml
    Public Libraries, Oklahoma (Books)
        http://www.ohwy.com/ok/l/library.htm
    Library - Langley High School
        http://www.fcps.k12.va.us/LangleyHS/library/
    Kings Langley Public School Library
        http://members.ozemail.com.au/~stewil/fivew.html

xoxo,
Andy