Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello dear monks,

I need some help in order to download a file. I am not sure if it is possible at all.

To see the page I mean, do the following manually:

site: http://cs3-hq.oecd.org/scripts/hpv/
search by CAS Number: 50000
click the link

This should open a page containing a link to a PDF called 50000.pdf. This is what I want to download. I never see the PDF in the content of $mech, and the link's URL shows up as "#". Why? Could you please point me to my mistake?

Thank you very much.

my $oecd = URI->new('http://cs3-hq.oecd.org/scripts/hpv/DetailProduit_Contenu.asp?CASNUM='."$_");
$mech->get($oecd);
#$mech->follow_link(url_regex => qr/\.pdf/);
#my $url = $mech->find_link(text_regex => qr/\.pdf/);
my @links = $mech->find_all_links();
for (@links) {
    print $_->url(), "\n";
    print $_->text(), "\n";
}

Replies are listed 'Best First'.
Re: Mechanize, Links and Downloads
by almut (Canon) on Aug 16, 2007 at 14:55 UTC

    The link makes use of JavaScript:

    <a href="#" onclick="javascript: ouverturePopup('Status/DownloadFile.ASP?CASNUM=50000&StatusCode=SIARC&DataNo=1'); return false;">50000.pdf</a>

    In other words, you should be able to retrieve this particular PDF via the URL:

    http://cs3-hq.oecd.org/scripts/hpv/Status/DownloadFile.ASP?CASNUM=50000&StatusCode=SIARC&DataNo=1

    As this doesn't look too complicated to extract, you might get around having to execute the JavaScript in this case...
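
    A rough sketch of that extraction, in case it helps. It assumes every PDF link on the page uses the same ouverturePopup('...') pattern in its onclick attribute. The names popup_targets and download_popup_files, and the output file names, are made up for illustration. The second sub is untested against the live site; it only runs when called and needs WWW::Mechanize and URI.

```perl
use strict;
use warnings;

# Pull the argument of every ouverturePopup('...') call out of the onclick
# attributes in a chunk of HTML. Returns the paths as written in the page.
sub popup_targets {
    my ($html) = @_;
    my @targets;
    while ($html =~ /onclick\s*=\s*"[^"]*?ouverturePopup\(\s*'([^']+)'\s*\)/gi) {
        my $path = $1;
        $path =~ s/&amp;/&/g;    # undo HTML escaping in the query string
        push @targets, $path;
    }
    return @targets;
}

# Hypothetical helper: fetch the detail page for one CAS number and save
# each popup target to a local file. Not called below, so nothing here
# touches the network unless you invoke it yourself.
sub download_popup_files {
    my ($casnum) = @_;
    require WWW::Mechanize;
    require URI;

    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get("http://cs3-hq.oecd.org/scripts/hpv/DetailProduit_Contenu.asp?CASNUM=$casnum");
    my $base = $mech->uri;    # remember the page URL before further requests

    my $n = 0;
    for my $path ( popup_targets( $mech->content ) ) {
        my $url = URI->new_abs( $path, $base );    # resolve relative to the page
        $n++;
        # Assumed naming scheme: <CAS number>-<counter>.pdf
        $mech->get( $url, ':content_file' => "$casnum-$n.pdf" );
    }
    return $n;    # number of files requested
}

# Offline demo using the link quoted above.
my $sample = q{<a href="#" onclick="javascript: ouverturePopup('Status/DownloadFile.ASP?CASNUM=50000&StatusCode=SIARC&DataNo=1'); return false;">50000.pdf</a>};
print "$_\n" for popup_targets($sample);
```

    To try it against the site you would call download_popup_files(50000). Resolving each path with URI->new_abs against the page URL avoids hard-coding the /scripts/hpv/ prefix.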

      Thank you very much. This was exactly what I wanted. I forgot about JavaScript.
Re: Mechanize, Links and Downloads
by marto (Cardinal) on Aug 16, 2007 at 14:51 UTC
    This sort of question gets asked frequently here. The site uses JavaScript to open a window containing the PDF you are looking for. WWW::Mechanize does not handle JavaScript, as stated in its documentation. You could use Win32::IE::Mechanize or Mozilla::Mechanize to get around this or, even better, write some Perl code to work around the problem.

    Hope this helps.

    Martin