in reply to speed up download for LWP::Simple

Why are you ignoring their specific request and having a robot engage in exactly the kind of bulk download that they don't want you to do?

It is called netiquette. If you have the knowledge to write the robot, you should also know when not to turn it loose. And if you choose to ignore that, expect them to do things like block access from your IP address in self-defence. If you are doing this for an employer, please do some research on what robots.txt files are for, and then tell your employer that there is a real risk of being banned from accessing PubMed. Should you really continue?

Update: I don't mean to imply that you are intentionally breaking the rules. Usually people just never realize that what they are doing is covered by robots.txt, which is why it is important to be proactive when the issue arises.
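
As a starting point, here is a rough sketch of how a script can consult robots.txt before fetching anything, using the WWW::RobotRules module; the agent name and URLs below are only placeholders, so substitute your own:

    use strict;
    use warnings;
    use LWP::Simple qw(get);
    use WWW::RobotRules;

    # Placeholder agent name -- substitute your own.
    my $rules = WWW::RobotRules->new('MyFetcher/0.1');

    # Fetch and parse the site's robots.txt once.
    my $robots_url = 'http://www.example.com/robots.txt';
    my $robots_txt = get($robots_url);
    $rules->parse($robots_url, $robots_txt) if defined $robots_txt;

    # Only download URLs that robots.txt allows for this agent.
    for my $url ('http://www.example.com/some/page.html') {
        if ($rules->allowed($url)) {
            my $content = get($url);
            # ... process $content here ...
        }
        else {
            warn "robots.txt disallows $url, skipping\n";
        }
    }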

Re: Re: speed up download for LWP::Simple
by dws (Chancellor) on Jul 09, 2003 at 04:10 UTC
    Along these lines, you might find that using WWW::Robot decreases your runtime by returning only those pages that the site wishes to have spidered.
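
    WWW::Robot is hook-driven and takes a bit of setup; as a rough sketch of the same idea with the simpler LWP::RobotUA, which transparently honours robots.txt and enforces a polite delay between requests (the agent name, contact address, and URL here are placeholders):

        use strict;
        use warnings;
        use LWP::RobotUA;

        # Placeholder agent name and contact address -- substitute your own.
        my $ua = LWP::RobotUA->new('MyFetcher/0.1', 'me@example.com');
        $ua->delay(1);    # wait at least 1 minute between requests to the same host

        my $response = $ua->get('http://www.example.com/some/page.html');
        if ($response->is_success) {
            print $response->content;
        }
        else {
            # URLs disallowed by robots.txt come back as an error response
            warn "Could not fetch: ", $response->status_line, "\n";
        }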

Re: Re: speed up download for LWP::Simple
by dannoura (Pilgrim) on Jul 09, 2003 at 04:38 UTC

    I read their robots.txt and, although it forbids what I'm doing now, it's done in coordination with one of their directors, so it's OK.

      In that case, arranging access to the local files that the website is backed by would be much faster, and would avoid undue load on the publicly used web servers.

        OK, thanks, I'll try to arrange that. Any other way?