I found the site The Web Robots Pages useful for sysadmins and would-be Web robot programmers who don't yet know what robots.txt is.
Does anyone know how common or uncommon robots.txt is?
Or would anyone like to share any dos and don'ts about writing a Web robot (or anything that programmatically fetches something for you via the Web)? It seems rather common for people not to set the "agent" attribute (whose default value is "libwww-perl/#.##") when using LWP::UserAgent. That may or may not matter, depending on the sites your script or robot is visiting.
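For anyone who hasn't set it before, here is a minimal sketch of identifying a script through the "agent" attribute, plus the LWP::RobotUA variant that consults robots.txt for you. The robot name `ExampleBot/0.1` and the address `webmaster@example.com` are made up for illustration; substitute your own. Nothing here touches the network until you actually issue a request.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use LWP::RobotUA;

# Hypothetical robot name and contact address, for illustration only.
my $name = 'ExampleBot/0.1';
my $from = 'webmaster@example.com';

# Plain user agent: replace the default "libwww-perl/#.##" identification
# and send a From header so site admins know whom to contact.
my $ua = LWP::UserAgent->new(agent => $name);
$ua->from($from);

# Robot user agent: a subclass of LWP::UserAgent that fetches and obeys
# each site's /robots.txt and waits between requests to the same server.
my $robot = LWP::RobotUA->new(agent => $name, from => $from);
$robot->delay(1);    # delay is given in minutes

print $ua->agent, "\n";
```

Either object is then used the usual way, e.g. `$robot->get($url)`. Whether a plain LWP::UserAgent with a sensible agent string is enough, or the robots.txt handling of LWP::RobotUA is worth having, probably depends on how many pages you fetch and how often.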
Re: Web Robot
by Abigail-II (Bishop) on Jul 16, 2003 at 22:22 UTC
by Anonymous Monk on Jul 17, 2003 at 01:58 UTC
by schumi (Hermit) on Jul 17, 2003 at 08:53 UTC
by Anonymous Monk on Jul 17, 2003 at 09:10 UTC
by schumi (Hermit) on Jul 17, 2003 at 09:45 UTC
by Jenda (Abbot) on Jul 17, 2003 at 11:23 UTC

Re: Web Robot Exclusion
by simonm (Vicar) on Jul 17, 2003 at 03:15 UTC