Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

Re^4: screen scraping google

by belden (Friar)
on Jul 15, 2004 at 19:30 UTC ( [id://374799]=note: print w/replies, xml ) Need Help??


in reply to Re^3: screen scraping google
in thread screen scraping google

Spiders which obey robots.txt though. All they're asking is that others do the same

# http://www.google.com/robots.txt User-agent: * Disallow: /search Disallow: /groups Disallow: /images Disallow: /catalogs Disallow: /catalog_list Disallow: /news Disallow: /pagead/ Disallow: /relpage/ Disallow: /imgres Disallow: /keyword/ Disallow: /u/ Disallow: /univ/ Disallow: /cobrand Disallow: /custom Disallow: /advanced_group_search Disallow: /advanced_search Disallow: /googlesite Disallow: /preferences Disallow: /setprefs Disallow: /swr Disallow: /url Disallow: /wml Disallow: /hws Disallow: /bsd? Disallow: /linux? Disallow: /mac? Disallow: /microsoft? Disallow: /unclesam? Disallow: /answers/search?q= Disallow: /local Disallow: /froogle? Disallow: /froogle_
Belden

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://374799]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (8)
As of 2024-04-18 14:27 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found