More PDF Download

Blue_eyed_son has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone, I am trying to download (not parse, just download) a large number of PDFs. I have been using the mirror function in WWW::Mechanize, and I keep hitting the same problem: the link points to a PDF, but when I mirror the link, I get HTML telling me "To access this document, wait a moment or click on a link", and that link is the SAME address I just tried to mirror! The content type I get back from mirror is text/html. When I click on the link in a browser, the PDF opens with no problem. Does anyone know what kind of referral process or whatever is going on here that it's not giving me the file? Thanks.

Replies are listed 'Best First'.
Re: More PDF Download by Joost (Canon) on Sep 20, 2007 at 23:34 UTC
Maybe the server expects the Referer header to match the address. Sounds like a stupid setup to me, but then, I've seen a lot of stupid stuff. In that case, calling get() on the URL first and then mirror()ing the same URL might work.
"What should it profit a man, if he should win a flame war, yet lose his cool?"
Re^2: More PDF Download by Blue_eyed_son (Sexton) on Sep 21, 2007 at 01:49 UTC
That's it!!! I just had to get() beforehand. Thanks a million!
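For anyone landing here later, the get()-then-mirror() approach confirmed above can be sketched roughly as follows. This is only a sketch under assumptions: fetch_pdf is a made-up helper name, the URL and file name come from the command line, and the text/html check simply mirrors the symptom described in the question. Why the second request succeeds (Referer, cookies, or something else) depends on the server and is not established in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize): request the URL once
# with get(), then mirror() the same URL to a local file.
sub fetch_pdf {
    my ($mech, $url, $file) = @_;

    # First request. On the site in question this returned the HTML
    # "wait a moment or click on a link" page rather than the PDF.
    $mech->get($url);

    # Second request for the same URL, saved to $file. mirror() is
    # inherited from LWP::UserAgent and returns an HTTP::Response.
    my $res  = $mech->mirror($url, $file);
    my $type = $res->content_type // '';

    # Same symptom as in the question: HTML came back instead of a PDF.
    warn "$url still returned HTML\n" if $type eq 'text/html';

    return $type;
}

# Example run (needs WWW::Mechanize from CPAN):
#   perl fetch_pdf.pl URL FILE
if (@ARGV == 2) {
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new;
    my $type = fetch_pdf($mech, @ARGV);
    print "saved $ARGV[1] ($type)\n";
}
```

For a large batch, reusing one WWW::Mechanize object across all URLs keeps whatever session state the server hands out, which is likely what you want here.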
Re: More PDF Download by shmem (Chancellor) on Sep 20, 2007 at 23:29 UTC
Sounds like a JavaScript function in that page which implements a counter and does some URL munging. Have a look at the document "source code".
--shmem
_($_=" "x(1<<5)."?\n".q·/)Oo. G°\ / /\_¯/(q / ---------------------------- \__(m.====·.(_("always off the crowd"))."· ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}