I assume that you don't want just one PDF from that site, but lots of them. In that case, my first step would be to find out whether there is another way of getting the information. Could you buy a CD-ROM, for example?
Also, how many of these documents do you actually plan to read? The site is not behind a paywall, so there is nothing stopping you from downloading the files as you need them and sharing links to them with anyone else who needs them.
Assuming you can't get the PDF files via another route, then the way I see it there are two approaches you could use to download them:
You could use a GUI HTML tree inspector such as Firebug to dissect and understand the structure of those pages, then use HTML::TreeBuilder to pull them apart, extract the links to the files you need, and download them. (I made a similar suggestion in answer to another question. For that site you might also need to learn a bit of JavaScript to understand how the links are generated.)
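To give you a feel for the first approach, here is a minimal sketch. The index URL and the assumption that the page's `<a>` tags point straight at `.pdf` files are made up for illustration; for the real site you would substitute whatever structure Firebug actually shows you.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder;
use URI;

# Hypothetical index page -- replace with the real one.
my $index_url = 'http://example.com/documents/index.html';

my $ua       = LWP::UserAgent->new;
my $response = $ua->get($index_url);
die "Could not fetch $index_url: ", $response->status_line
    unless $response->is_success;

# Parse the page and pull out every <a> element whose href ends in .pdf.
my $tree  = HTML::TreeBuilder->new_from_content($response->decoded_content);
my @links = $tree->look_down(
    _tag => 'a',
    sub { ( $_[0]->attr('href') // '' ) =~ /\.pdf$/i },
);

for my $link (@links) {
    # Resolve relative hrefs against the index page's URL.
    my $pdf_url = URI->new_abs( $link->attr('href'), $index_url );
    ( my $filename = $pdf_url->path ) =~ s{.*/}{};

    # mirror() skips the download if the local copy is already up to date.
    print "Fetching $pdf_url -> $filename\n";
    $ua->mirror( $pdf_url, $filename );
}

$tree->delete;    # free the parse tree
```

Using `mirror()` rather than `get()` for the files themselves means you can rerun the script later and only pick up documents you don't already have.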
Alternatively, you could start by looking at the download links for some of the files you want, look for patterns in how they are encoded, and try to spot those same patterns in the index page. From that you can write a script that gives you the download links for everything referenced from the front page.
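A rough sketch of that second approach, again against an invented URL scheme (`download.php?doc=ABC123` and codes that appear verbatim in the index HTML): once you have worked out the real pattern, substitute it into the regex and the URL template.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use URI;

# Hypothetical: suppose inspection shows the download links all look like
# /download.php?doc=ABC123 and those codes appear in the index page's HTML.
my $index_url = 'http://example.com/documents/index.html';

my $ua       = LWP::UserAgent->new;
my $response = $ua->get($index_url);
die "Could not fetch $index_url: ", $response->status_line
    unless $response->is_success;

my $html = $response->decoded_content;

# Pull out every document code matching the pattern we spotted,
# then rebuild the download URL for each one.
my %seen;
while ( $html =~ /doc=([A-Z0-9]+)/g ) {
    my $code = $1;
    next if $seen{$code}++;    # skip duplicates

    my $pdf_url = URI->new_abs( "/download.php?doc=$code", $index_url );
    print "Fetching $pdf_url -> $code.pdf\n";
    $ua->mirror( $pdf_url, "$code.pdf" );
}
```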
When I was a younger and more sinful monk (before I entered this monastery), I used the second technique in perl scripts I wrote to download images from pr0n web sites.