
Thanks for your input. I've heard of robots.txt files, but I've never had to deal with one before.

Unlike that monster on the White House web site, this one is rather small:

User-agent: *
Disallow: /cgi-bin/
Disallow: /journals/EJDE/Monographs/
Disallow: /journals/EJDE/Volumes/

From what I read yesterday about robots.txt files, I'm OK, since the search results page I'm scraping lives in a directory that isn't listed above.
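One way to double-check that reading is to feed the robots.txt above to a parser and ask it about specific URLs. The sketch below uses Python's standard `urllib.robotparser` rather than Perl, and the host `example.org` and the `/search/` path are hypothetical stand-ins, since the real search URL isn't given here.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt contents quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /journals/EJDE/Monographs/
Disallow: /journals/EJDE/Volumes/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(parser: RobotFileParser, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the parsed rules."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # Hypothetical URLs: a search page outside the disallowed directories,
    # and a page inside one of them.
    for url in ("https://example.org/search/results?q=ode",
                "https://example.org/journals/EJDE/Volumes/2004/"):
        print(url, allowed(rp, url))
```

Since the file only has a `User-agent: *` record, the same rules apply whatever agent name the scraper sends.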

Your advice to ask the webmaster about an appropriate delay is well taken; I'll see if I can contact him. I'm sure the server is quite capable, since it's a service of the European Mathematical Society, and there are several mirrors as well.

In general, though, are you saying that I should use a delay of at least two seconds even when I'm accessing high-bandwidth servers?
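For reference, the kind of delay being discussed amounts to sleeping between consecutive requests. A minimal sketch in Python (not WWW::Mechanize): `fetch_politely` and the caller-supplied `fetch` function are hypothetical names, and the 2-second default simply mirrors the figure in the question rather than any official rule.

```python
import time
from typing import Callable, Iterable, List


def fetch_politely(urls: Iterable[str],
                   fetch: Callable[[str], str],
                   delay: float = 2.0) -> List[str]:
    """Fetch each URL in order, sleeping `delay` seconds between requests.

    `fetch` is any function that takes a URL and returns the page body,
    e.g. a thin wrapper around urllib.request.urlopen (hypothetical here).
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            # Pause before every request except the first one.
            time.sleep(delay)
        results.append(fetch(url))
    return results
```

Keeping the delay as a parameter makes it easy to adopt whatever interval the webmaster suggests.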

TheEnigma

In reply to Re^4: Ethical issues with screen scraping by TheEnigma
in thread Use WWW::Mechanize to Download Pictures of Sayuri Anzu by Anonymous Monk
