tak_hot has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,

I am trying to download a PDF file from a web page.
I have used the following code:

    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->agent_alias( 'Windows IE 6' );
    $mech->get("https://ecf.nynd.uscourts.gov/doc1/12501815060?pdf_toggle_possible=1&de_seq_num=1773285&caseid=29430&got_receipt=1");
    $mech->form_name('GetPass');
    $mech->field(login => "xxx");    # Username
    $mech->field(key   => "xxx");    # Password
    $mech->click();
    print $mech->content;

The following content is printed:
    <html><head><title>CM/ECF LIVE - U.S. District Court - NYND</title>
    <script language="javascript" src="/lib/dls_url.js"></script></head><body BGCOLOR=F9F9F9 TEXT=000000 ><div id="cmecfMainContent"><input type="hidden" id="cmecfMainContentScroll" value="0"><SCRIPT LANGUAGE="JavaScript">
    document.cookie="PacerUser=\"li093301258064693 aDlYK1Zk9Vo\"; path=/; domain=.uscourts.gov;";
    if ("PacerPref=receipt=Y; path=/ ; domain=.uscourts.gov".length > 0) {
        document.cookie="PacerPref=receipt=Y; path=/ ; domain=.uscourts.gov;";
    }
    if ("PacerClient=\"\"; path=/ ; domain=.uscourts.gov".length > 0) {
        document.cookie="PacerClient=\"\"; path=/ ; domain=.uscourts.gov;";
    }
    if ("ClientDesc=\"\"; path=/ ; domain=.uscourts.gov".length > 0) {
        document.cookie="ClientDesc=\"\"; path=/ ; domain=.uscourts.gov;";
    }
    if ("https://ecf.nynd.uscourts.gov/doc1/12501815060?pdf_toggle_possible=1&de_seq_num=1773285&caseid=29430&got_receipt=1".length > 0) {
        location.assign("https://ecf.nynd.uscourts.gov/doc1/12501815060?pdf_toggle_possible=1&de_seq_num=1773285&caseid=29430&got_receipt=1");
    }
    </SCRIPT><SCRIPT LANGUAGE="JavaScript">
    var IsForm = false;
    var FirstField;
    function SetFocus() {
        if(IsForm) {
            if(FirstField) {
                var ind = FirstField.indexOf('document.',0);
                if(ind == 0) {
                    eval(FirstField);
                } else {
                    var Code = "document.forms[0]."+FirstField+".focus();";
                    eval(Code);
                }
            } else {
                var Cnt = 0;
                while(document.forms[0].elements[Cnt] != null) {
                    if(document.forms[0].elements[Cnt].type != "hidden") {
                        document.forms[0].elements[Cnt].focus();
                        break;
                    }
                    Cnt += 1;
                }
            }
        }
        return(true);
    }
    </SCRIPT>
    </div></body></html>
However, the actual page, as seen in the browser, contains an <iframe> tag whose src attribute links to the PDF:
    id="cmecfMainContent"><input type="hidden" id="cmecfMainContentScroll" value="0">
    <iframe src="/cgi-bin/show_temp.pl?file=1086820-0-.pdf&type=application/pdf" height="100%" width="100%" frameborder="0" scrolling="no">
    <a href="/cgi-bin/show_temp.pl?file=1086820-0-.pdf&type=application/pdf">click here to view this file</a>
    </iframe>
    </body></html>
I have tried both FramesReady and Mechanize, without success. Please suggest how to get the PDF link.
Any help would be appreciated.

Thank you

Re: Perl iframe problem
by Anonymous Monk on Nov 13, 2009 at 01:58 UTC
    Get something that interprets JavaScript (WWW::Mechanize::Firefox), or parse the JS yourself:
    my( $url ) = $mech->content =~ /assign\("(.+?)"\)/;
    Parsing can be problematic, so a better idea would be to figure out what the JS does and reimplement that in Perl.
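
A minimal sketch of "reimplementing what the JS does", assuming the inline script only sets cookies through document.cookie and then redirects through location.assign, as in the output posted in the question. The helper name parse_js_redirect is hypothetical, and the sample string is a shortened stand-in for the real page.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize): extract the cookies and
# the redirect target from the page's inline JavaScript.
sub parse_js_redirect {
    my ($html) = @_;
    my %cookies;

    # Matches document.cookie="..."; the string may contain \" escapes.
    while ( $html =~ /document\.cookie\s*=\s*"((?:[^"\\]|\\.)*)"/g ) {
        my $raw = $1;
        $raw =~ s/\\(.)/$1/g;                   # undo the JavaScript escaping
        my ($pair) = split /;/, $raw;           # first field is NAME=VALUE
        next unless defined $pair && $pair =~ /=/;
        my ( $name, $value ) = split /=/, $pair, 2;
        $cookies{$name} = $value;
    }

    # Matches location.assign("URL")
    my ($url) = $html =~ /location\.assign\(\s*"([^"]+)"\s*\)/;
    return ( $url, \%cookies );
}

# Shortened sample modelled on the output in the question.
my $sample = <<'HTML';
<SCRIPT LANGUAGE="JavaScript">
document.cookie="PacerPref=receipt=Y; path=/ ; domain=.uscourts.gov;";
location.assign("https://ecf.nynd.uscourts.gov/doc1/12501815060?pdf_toggle_possible=1&de_seq_num=1773285&caseid=29430&got_receipt=1");
</SCRIPT>
HTML

my ( $url, $cookies ) = parse_js_redirect($sample);
print "redirect: $url\n";
print "cookie:   $_=$cookies->{$_}\n" for sort keys %$cookies;
```

Untested against the live site: the extracted cookies could then be added to $mech->cookie_jar (an HTTP::Cookies object) with set_cookie, followed by $mech->get($url), which is roughly what the browser does when it runs the script.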
Re: Perl iframe problem
by 7stud (Deacon) on Nov 13, 2009 at 03:32 UTC

    Hi,

    Your problem is that you are searching for something that doesn't exist. The content you printed clearly shows there is no "iframe" string in the text. So you can search that text all you want for "iframe"--using Mechanize or anything else--and you are never going to find it.

    The reason the actual page's text has "iframe" in it is presumably that the browser reads the text you have, sees some JavaScript code in there, and executes it. Somewhere in that JavaScript there are instructions that lead the browser to a page containing "iframe", so the browser ends up with a new page of text.

    Therefore, in order to find the "iframe" text you want, you need to give the text you currently have to some kind of program that can read the text, recognize the javascript, execute it, and produce a new page of text according to the instructions in the javascript. A few years ago there were no such programs that could do that because they were too difficult to write. Now apparently there are some.
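
A small sketch of the point above, assuming a regex is good enough for this one tag: the same search finds nothing in the text Mechanize fetched and finds the link in the text the browser ends up with. The helper name iframe_src and both sample strings are illustrative, shortened from the markup in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the src of the first <iframe> in $html,
# or undef when the text contains no iframe with a quoted src attribute.
sub iframe_src {
    my ($html) = @_;
    # First capture is the quote character, second is the attribute value.
    my ( undef, $src ) = $html =~ /<iframe\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1/is;
    return $src;
}

# Text as fetched by the script: JavaScript only, no iframe.
my $fetched = '<div id="cmecfMainContent"><SCRIPT>location.assign("https://ecf.nynd.uscourts.gov/doc1/12501815060");</SCRIPT></div>';

# Text as the browser shows it after running the JavaScript.
my $rendered = '<iframe src="/cgi-bin/show_temp.pl?file=1086820-0-.pdf&type=application/pdf" height="100%"></iframe>';

for my $page ( $fetched, $rendered ) {
    my $src = iframe_src($page);
    print defined $src ? "iframe src: $src\n" : "no iframe in this text\n";
}
```

The src is relative, so once a page with the iframe has been obtained, it could be made absolute with URI->new_abs( $src, $mech->uri ) before fetching the PDF.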

Re: Perl iframe problem
by tak_hot (Initiate) on Nov 13, 2009 at 18:43 UTC

    This is the complete content I get from $mech->content:

    <html><head><title>CM/ECF LIVE - U.S. District Court - NYND</title>
    <script language="javascript" src="/lib/dls_url.js"></script></head><body BGCOLOR=F9F9F9 TEXT=000000 ><div id="cmecfMainContent"><input type="hidden" id="cmecfMainContentScroll" value="0"><SCRIPT LANGUAGE="JavaScript">
    document.cookie="PacerUser=\"li093301258137197 Owug. uQP1gU\"; path=/; domain=.uscourts.gov;";
    if ("PacerPref=receipt=Y; path=/ ; domain=.uscourts.gov".length > 0) {
        document.cookie="PacerPref=receipt=Y; path=/ ; domain=.uscourts.gov;";
    }
    if ("PacerClient=\"\"; path=/ ; domain=.uscourts.gov".length > 0) {
        document.cookie="PacerClient=\"\"; path=/ ; domain=.uscourts.gov;";
    }
    if ("ClientDesc=\"\"; path=/ ; domain=.uscourts.gov".length > 0) {
        document.cookie="ClientDesc=\"\"; path=/ ; domain=.uscourts.gov;";
    }
    if ("https://ecf.nynd.uscourts.gov/doc1/12501815060?pdf_toggle_possible=1&de_seq_num=1773285&caseid=29430&got_receipt=1".length > 0) {
        location.assign("https://ecf.nynd.uscourts.gov/doc1/12501815060?pdf_toggle_possible=1&de_seq_num=1773285&caseid=29430&got_receipt=1");
    }
    </SCRIPT><SCRIPT LANGUAGE="JavaScript">
    var IsForm = false;
    var FirstField;
    function SetFocus() {
        if(IsForm) {
            if(FirstField) {
                var ind = FirstField.indexOf('document.',0);
                if(ind == 0) {
                    eval(FirstField);
                } else {
                    var Code = "document.forms[0]."+FirstField+".focus();";
                    eval(Code);
                }
            } else {
                var Cnt = 0;
                while(document.forms[0].elements[Cnt] != null) {
                    if(document.forms[0].elements[Cnt].type != "hidden") {
                        document.forms[0].elements[Cnt].focus();
                        break;
                    }
                    Cnt += 1;
                }
            }
        }
        return(true);
    }
    </SCRIPT>
    </div></body></html>
    I tried using the debugger in Mozilla but can't get through to the <iframe>.

    Please see if you can get anything from the above content.
    Thank you