Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,
I am trying to download a PDF file from the site below:

http://www.sgx.com/wps/portal/marketplace/mp-en/listed_companies_info/ipos/ipo_prospectus

Clicking a company link on that page pops up a new window; choose the Prospectus tab there to reach the download link.
If I view the HTML source, I cannot find the PDF link.
Can anyone please help me download the PDF from this HTML page using Perl?

Thanks

Replies are listed 'Best First'.
Re: web download help
by ww (Archbishop) on Nov 19, 2010 at 21:40 UTC
    When you hear the word "popup" in relation to rendering by a browser, what often-used scripting language jumps to mind?

    Did I hear "javascript" ... or maybe "ECMAScript"?

    So perhaps the next thing you should do is check whether there is some JS in the source HTML... and if there is, use Super Search to find the many closely related questions and answers here.
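    For instance, a quick way to check (a minimal sketch; the URL is the one from the question, everything else is illustrative) is to fetch the raw HTML with LWP::Simple and grep it for PDF links and script blocks:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use LWP::Simple qw(get);

        # Fetch the raw HTML the browser receives; the URL is the one
        # from the original question.
        my $url  = 'http://www.sgx.com/wps/portal/marketplace/mp-en/listed_companies_info/ipos/ipo_prospectus';
        my $html = get($url) or die "Could not fetch $url\n";

        # Print the lines that mention .pdf or <script>, to see whether
        # the links are in the markup at all or are built by JS.
        for my $line ( split /\n/, $html ) {
            print "$line\n" if $line =~ /\.pdf|<script/i;
        }

    If nothing matching .pdf turns up, the links are almost certainly being assembled by JavaScript at page-load time.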

    That's one aspect of the self-help the Monastery's norms expect from a SOPW. Another norm, well covered in On asking for help and How do I post a question effectively?, is that we expect to see (enough of) your code and data (both formatted inside <c>...</c>) so that we need not use our psi powers, crystal balls or mind-reading techniques to help you.

Re: web download help
by chrestomanci (Priest) on Nov 19, 2010 at 21:04 UTC

    I assume that you don't want just one PDF from that site, you want to download lots of them. In that case, I would in the first instance try to find out if there is another way of getting the information. Could you buy a CD-ROM, for example?

    Also, how many of these documents do you plan to read? Could you just download them as you need them? The site is not behind a pay wall, so there is nothing stopping you downloading the files as you need them, and sharing links to them with anyone else who needs them.

    Assuming you can't get the PDF files via another route, then the way I see it, there are two approaches you could use to download the files:

    You could use a GUI HTML tree inspector such as Firebug to dissect and understand the structure of those pages, and then use HTML::TreeBuilder to pull them apart, extract the links to the files you need, and download them. (I made similar suggestions in answer to another similar question.) For that site you might also need to learn a bit of JavaScript to understand how the links are generated.
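    A minimal sketch of that first approach, assuming the popup page, once fetched directly, contains ordinary <a href="...pdf"> anchors (the page URL is passed on the command line; if the links are built by JavaScript this will find nothing, which is itself informative):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use LWP::UserAgent;
        use HTML::TreeBuilder;
        use URI;

        my $page = shift @ARGV or die "Usage: $0 <url>\n";
        my $ua   = LWP::UserAgent->new;

        my $resp = $ua->get($page);
        die "Fetch failed: ", $resp->status_line, "\n" unless $resp->is_success;

        # Parse the page and pull out every anchor whose href ends in .pdf
        my $tree = HTML::TreeBuilder->new_from_content( $resp->decoded_content );

        for my $a ( $tree->look_down( _tag => 'a', href => qr/\.pdf$/i ) ) {
            my $abs = URI->new_abs( $a->attr('href'), $page );    # resolve relative links
            ( my $file = $abs->path ) =~ s{.*/}{};                # basename for the saved copy
            print "Saving $abs as $file\n";
            $ua->mirror( $abs, $file );
        }

        $tree->delete;    # HTML::TreeBuilder trees must be freed explicitly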

    Alternatively, you could start by looking at the download links for some of the files you want, look for patterns in the coded parts, and try to spot those patterns in the index page. From that you can write a script that will give you the download links for everything referenced from the front page.
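    A sketch of that second, pattern-based approach. The URL template and the IDs below are hypothetical placeholders, to be replaced with whatever pattern you spot by comparing a few real download links:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use LWP::UserAgent;

        my $ua = LWP::UserAgent->new;

        # Placeholder IDs -- in practice you would scrape these from the
        # index page (or from the JavaScript that builds the links).
        my @company_ids = qw(12345 12346 12347);

        for my $id (@company_ids) {
            # Hypothetical URL template; substitute the real pattern.
            my $url  = "http://www.example.com/prospectus/$id.pdf";
            my $file = "prospectus_$id.pdf";
            my $resp = $ua->mirror( $url, $file );
            print $resp->is_success
                ? "Saved $file\n"
                : "Failed $url: " . $resp->status_line . "\n";
        }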

    When I was a younger and more sinful monk (before I entered this monastery), I used the second technique in perl scripts I wrote to download images from pr0n web sites.