I assume that you don't want just one PDF from that site, but want to download lots of them. In that case, I would first try to find out whether there is another way of getting the information. Could you buy a CD-ROM, for example?

Also, how many of these documents do you plan to read? Could you just download them as you need them? The site is not behind a paywall, so there is nothing stopping you from downloading the files as you need them and sharing links to them with anyone else who needs them.

Assuming you can't get the PDF files via another route, then the way I see it, there are two approaches you could use to download the files:

You could use a GUI HTML tree inspector such as Firebug to dissect and understand the structure of those pages, and then use HTML::TreeBuilder to pull them apart, extract the links to the files you need, and download them. (I made similar suggestions in answer to another similar question.) For that site you might also need to learn a bit of JavaScript to understand how the links are generated.
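To give a feel for the "pull the page apart and extract the links" step, here is a minimal sketch. It is written in Python with the standard-library `html.parser` rather than Perl's HTML::TreeBuilder, purely so that it runs without installing anything. It assumes the PDF links appear as ordinary `<a href="...">` attributes ending in `.pdf`; the names `PdfLinkCollector` and `extract_pdf_links` are made up for illustration. If the site builds its links in JavaScript, this will not find them.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> links that end in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # used to resolve relative links
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; names arrive lower-cased
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".pdf"):
            self.links.append(urljoin(self.base_url, href))


def extract_pdf_links(html, base_url):
    """Return the PDF links found in an HTML string, in document order."""
    parser = PdfLinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

You would feed it the text of each index page you have fetched, then hand the resulting URLs to whatever you use to download files.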

Alternatively, you could start by looking at the download links for some of the files you want, look for patterns in the coded parts, and try to spot those patterns in the index page. From that you can write a script that will give you the download links for everything referenced from the front page.
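A sketch of that pattern-matching approach, again in Python rather than Perl so it needs only the standard library. Everything site-specific here is hypothetical: I am pretending that the index page mentions each document as `showDoc('SOMEID')` and that the matching download link is `http://example.com/get.pdf?doc=SOMEID`. You would replace both the regular expression and the URL template with whatever pattern you actually spot on the real site.

```python
import re

# Hypothetical pattern: how a document ID appears in the index page source.
DOC_ID_PATTERN = re.compile(r"showDoc\('([A-Za-z0-9]+)'\)")

# Hypothetical template: how a download link is built from a document ID.
DOWNLOAD_TEMPLATE = "http://example.com/get.pdf?doc={doc_id}"


def build_download_links(index_html):
    """Return one download URL per distinct document ID, in page order."""
    seen = set()
    links = []
    for doc_id in DOC_ID_PATTERN.findall(index_html):
        if doc_id in seen:
            continue  # the same document may be referenced more than once
        seen.add(doc_id)
        links.append(DOWNLOAD_TEMPLATE.format(doc_id=doc_id))
    return links
```

The advantage over a full HTML parse is that it does not care whether the ID sits in an attribute, an inline script, or plain text; the drawback is that it breaks as soon as the site changes its pattern.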

When I was a younger and more sinful monk (before I entered this monastery), I used the second technique in Perl scripts I wrote to download images from pr0n web sites.


In reply to Re: web download help by chrestomanci
in thread web download help by Anonymous Monk
