Ninth Prince has asked for the wisdom of the Perl Monks concerning the following question:

I don't code for a living, but I do code to collect data from the web. I'm relatively new to Perl.

Problem. I want to fill out a form and then submit it so that I can scrape the resulting screen. My basic issue is that while I can get the relevant data into the form, I can't then get the form submitted. There is JavaScript involved on the page I'm dealing with and I'll have to admit that I don't know much about JS (if anything).

Here is the relevant HTML:

<form name="Search2" method="Post" onSubmit="return JSubmitForm();">
  <tr>
    <td colspan="5">
      <table border="0" cellPadding="3" cellSpacing="0" width="100%">
        <tbody>
          <tr>
            <td width="30%" class="BoldTD" colspan="2">Firm IARD/CRD Number:</td>
            <td colspan="3">
              <input type="text" name="CrdNumber" value="" size="16" maxlength="12">
              <a HREF="JavaScript:JSubmitForm()" onMouseOver="status='Perform Search'; return true" onMouseOut="status='';">
                <img SRC="/IAPD/Images/go_off.gif" alt="Go" name="go2" onMouseover="JImgAct('go2')" onMouseout="JImgInact('go2')" BORDER="0" align="top">
              </a>
            </td>
          </tr>
        </tbody>
      </table>
    </td>
  </tr>
</form>

Okay, here is the code that I have.

#!/usr/bin/perl -w
use strict;
use Win32::IE::Mechanize;
use URI;

my $agent = Win32::IE::Mechanize->new( visible => 1 );
my $url   = URI->new( 'http://www.adviserinfo.sec.gov/IAPD/Content/Search/iapd_OrgSearch.aspx' );

$agent->get($url);
$agent->form_name("Search2");
$agent->field( "CrdNumber", "144549" );

# Attempt 1: treat "go2" as a button.
$agent->click_button( name => "go2" );

# Attempt 2: treat it as a link.
my $ie = $agent->follow_link( text => "JavaScript:JSubmitForm()" );

exit;

Originally I thought I had to "click" the "go2" button, but that didn't work. An earlier thread (that I read) suggested to me that what I thought was a button wasn't a button at all, but rather, a link. So, that is why I've also tried following the link. Neither works.

What can I do to get this form submitted? Thanks.

Replies are listed 'Best First'.
Re: WIN32::IE::Mechanize - can't follow link - JavaScript involved
by psini (Deacon) on May 22, 2008 at 16:19 UTC

    You have not posted the JavaScript code linked to the page, so there is very little we can do.

    The original HTML page should contain one or more <script> tags, each holding either JavaScript code or a URL from which the code can be downloaded (follow the link with your browser and then "view page source").

    Updated: Win32::IE::Mechanize has some limits regarding HTML pages containing frames and JS popups. You should check the entire HTML page and the associated JS code.

    Are you sure that this couldn't be better done with CGI?

    Rule One: Do not act incautiously when confronting a little bald wrinkly smiling man.

      Oops, sorry about that. I don't know enough about JavaScript to even know that I should have given you all of the HTML. Anyway, here it is.

      <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
      <html>
      <head>
      <META content="text/html; charset=windows-1252" http-equiv=Content-Type>
      <title>Investment Adviser Search</title>
      <LINK href='/IAPD/Stylesheets/iapd.css' rel=stylesheet type=text/css media='screen'/>
      <LINK href='/IAPD/Stylesheets/iapd_print.css' rel=stylesheet type=text/css media='print'/>
      <script type="text/javascript" language='Javascript' src='/IAPD/Includes/Validation/Search/iapd_OrgSearch.js'></script>
      <script type="text/javascript" language='Javascript' src='/Iapd/Includes/iapd_WindowManagement.js'></script>
      <script type="text/javascript" language='Javascript' src='/Iapd/Includes/iapd_SetAndSub.js'></script>
      <script language=javascript>
      function init() {
        if(document.Content.Name != null)
          document.Content.Name.value = "";
        if(document.Content.CrdNumber != null)
          document.Content.CrdNumber.value = "";
        if(document.Content.SecNumber != null)
          document.Content.SecNumber.value = "";
      }
      window.onload = init;
      </script>
      </head>
      <body bgColor=#ffffff leftmargin="0" topmargin="0">
      <a NAME="PageTop"></a>
      <table border="0" cellpadding="0" cellspacing="0" width="100%" class="noprint">
      <tbody>
      <tr valign="top">
      <td height="92" rowSpan="5" nowrap><img alt="SEC Seal" border="0" height="92" src="/IAPD/Images/SEC_bannerSealTop2.gif" width="95"><img alt="SEC Seal" border="0" height="92" src="/IAPD/Images/SEC_bannerSealTopRt.gif" width="26"></td>
      <td background="/IAPD/Images/SEC_stripedbgMain.gif" height="39" width="171"><img alt="" border="0" height="39" src="/IAPD/Images/SEC_bannerFlagMain.gif" width="171"></td>
      <td align="right" background="/IAPD/Images/SEC_stripedbgMain.gif" class="gray" valign="bottom" width="100%"></td>
      <td rowspan="4" width="126">
      <table border="0" cellpadding="0" cellspacing="0">
      <tbody>
      <tr>
      <td rowspan="3"><img alt="" height="90" src="/IAPD/Images/flag_lt.gif" width="7"></td>
      <td><img alt="" height=12 src="/IAPD/Images/flag_top.gif" width="112"></td>
      <td rowspan=3><img alt="" height="90" src="/IAPD/Images/flag_rt.gif" width="7"></td>
      </tr>
      <tr>
      <td><img id="_ctl0_imgFlag" src="/IAPD/Images/State_Regulation.gif" alt="State Regulation" border="0" style="height:72px;width:112px;" /></td>
      </tr>
      <tr>
      <td><img alt="" height="6" src="/IAPD/Images/flag_bot.gif" width="112"></td>
      </tr>
      </tbody>
      </table>
      </td>
      </tr>
      <tr valign="top">
      <td bgcolor="black"><img alt="" height="1" src="/IAPD/Images/pixel.gif" width="1"></td>
      <td bgcolor="black" height="34"><img alt="Investment Adviser Public Disclosure" border="0" height="34" src="/IAPD/Images/IAPDtitle.gif" width="391"></td>
      </tr>
      <tr>
      <td bgcolor="white" height="2"><img alt="" height="2" src="/IAPD/Images/pixel.gif"></td>
      <td bgcolor="white" height="2"><img alt="" height="1" src="/IAPD/Images/pixel.gif" width="1"></td>
      </tr>
      <tr>
      <td bgcolor="#324395" colspan="2" height="15"><img alt="" border="0" height="15" src="/IAPD/Images/SEC_bannerFlagBot.gif" width="50"></td>
      </tr>
      <tr>
      <td bgcolor="white" colspan="5" height="2"><img alt="" height="2" src="/IAPD/Images/pixel.gif"></td>
      <td bgcolor="white" height="2"></td>
      </tr>
      </tbody>
      </table>
      <table id="_ctl0_tblMain" border="0" cellpadding="0" height="100%" cellspacing="0" width="100%">
      <tr valign="top">
      <td bgcolor="#e1e1d6" colspan="3" valign="top" width="1%">
      <table border="0" cellpadding="0" cellspacing="0" width="150" class="noprint">
      <tbody>
      <tr>
      <td><img src="/IAPD/Images/SEC_SealBotFrontB.gif" alt="SEC Seal"></td>
      </tr>
      <tr>
      <td><img alt="" border="0" height="1" src="/IAPD/Images/pixel.gif" width="5"></td>
      </tr>
      </tbody>
      </table>
      <table border="0" cellpadding="0" cellspacing="5" width="130" class="noprint">
      <tbody>
      <tr>
      <td>
      <!-- Left Navigation -->
      <table border="0" width="130" cellspacing="5" cellpadding="0">
      <tr>
      <td>
      <img src="/IAPD/images/IAPDhome2.jpg" alt="IAPD Main">
      </td>
      </tr>
      <tr>
      <td>
      <table border="0" width="130" cellspacing="3" cellpadding="0">
      <tr>
      <td>
      <a onMouseOver="status='Link To the SEC';return true" onMouseOut="status='';" class="Nav" href="javascript: JSub('/IAPD/Content/IAPDMain/iapd_SECRedirect.aspx');">Return to the SEC</a>
      </td>
      </tr>
      <tr>
      <td>
      <a onMouseOver="status='Link To the SEC IARD Page';return true" onMouseOut="status='';" class="Nav" href="javascript: JSub('/IAPD/Content/IAPDMain/iapd_SECRedirect.aspx?dir=IARD');">Return to SEC IARD Page</a>
      </td>
      </tr>
      <tr>
      <td>
      <a onMouseOver="status='Link To Sitemap';return true" onMouseOut="status='';" class="Nav" href="javascript: JSub('/IAPD/Content/IAPDMain/iapd_SiteMap.aspx');">Return to Sitemap</a>
      </td>
      </tr>
      <tr>
      <td>
      <a onMouseOver="status='Link To Investment Adviser Search';return true" onMouseOut="status='';" class="NavSelected" href="javascript: JSub('/IAPD/Content/Search/iapd_OrgSearch.aspx');">Investment Adviser Search</a>
      </td>
      </tr>
      </table>
      </td>
      </tr>
      </table>
      <BR>
      <BR>
      <!-- Left Nav Control Goes Here -->
      </td>
      </tr>
      </tbody>
      </table>
      </td>
      <td>
      <table border="0" cellpadding="0" cellspacing="0" width="100%">
      <tr>
      <td id="_ctl0_tdBackground" background="/IAPD/Images/iapd-wmk2.jpg">
      <img alt="" height="10" src="/IAPD/Images/pixel.gif" width="540"><br />
      <table border="0" cellPadding="0" cellSpacing="0" width="100%">
      <tbody>
      <tr>
      <td>
      <!-- Main Content -->
      <script language="Javascript">
      var bDoneLoading = "N"
      </script>
      <table Border="0" cellspacing="0" cellpadding="0" height="100%" width="100%">
      <tr valign="top" height="96%">
      <td width="100%" colspan="3">
      <DIV id="content">
      <div class="PageTitle">Investment Adviser Search</div>
      <input name="Save" type="Hidden" value="IAPDSearch">
      <table width="100%" border="0" cellpadding="0" cellspacing="0">
      <tr>
      <td class="QueryTableTitle" colspan="5">Search for an Investment Adviser Firm:</td>
      </tr>
      <form name="Search1" method="Post" onSubmit="return JSubmitForm();">
      <tr>
      <td colspan="5">
      <table border="0" cellPadding="3" cellSpacing="0" width="100%">
      <tbody>
      <tr>
      <td class="QueryTableLabel" colspan="1" width="30%">Firm Name:</td>
      <td class="QueryTableLabel" colspan="1">
      <input type="text" name="Name" value="" size="30" maxlength="64">
      <a HREF="JavaScript:JSubmitForm()" onMouseOver="status='Perform Search'; return true" onMouseOut="status='';">
      <img SRC="/IAPD/Images/go_off.gif" alt="Go" name="go" onMouseover="JImgAct('go')" onMouseout="JImgInact('go')" BORDER="0" align="top">
      </a>
      </td>
      </tr>
      <tr>
      <td class="QueryTableLabel" width="30%" valign="top">Type of Firm Search:</td>
      <td class="QueryTableLabel">
      <input type="radio" name="SearchType" value="1" checked="true">Starts With<br><input type="radio" name="SearchType" value="2">Contains<br><input type="radio" name="SearchType" value="3">Sounds Like</td>
      </tr>
      </tbody>
      </table>
      </td>
      </tr>
      </form>
      <tr>
      <td colSpan="5">
      <hr>
      </td>
      </tr>
      <form name="Search2" method="Post" onSubmit="return JSubmitForm();">
      <tr>
      <td colspan="5">
      <table border="0" cellPadding="3" cellSpacing="0" width="100%">
      <tbody>
      <tr>
      <td width="30%" class="BoldTD" colspan="2">Firm IARD/CRD Number:</td>
      <td colspan="3">
      <input type="text" name="CrdNumber" value="" size="16" maxlength="12">
      <a HREF="JavaScript:JSubmitForm()" onMouseOver="status='Perform Search'; return true" onMouseOut="status='';">
      <img SRC="/IAPD/Images/go_off.gif" alt="Go" name="go2" onMouseover="JImgAct('go2')" onMouseout="JImgInact('go2')" BORDER="0" align="top">
      </a>
      </td>
      </tr>
      </tbody>
      </table>
      </td>
      </tr>
      </form>
      <tr>
      <td colSpan="5">
      <hr>
      </td>
      </tr>
      <form name="Search3" method="Post" onSubmit="return JSubmitForm();">
      <tr>
      <td colspan="5">
      <table border="0" cellPadding="3" cellSpacing="0" width="100%">
      <tbody>
      <tr>
      <td width="20%" class="BoldTD" colspan="1">Firm SEC Number:</td>
      <td class="BoldRightTD" width="10%">801-</td>
      <td colspan="3">
      <input type="text" name="SecNumber" value="" size="16" maxlength="16">
      <a HREF="JavaScript:JSubmitForm()" onMouseOver="status='Perform Search'; return true" onMouseOut="status='';">
      <img SRC="/IAPD/Images/go_off.gif" alt="Go" name="go3" onMouseover="JImgAct('go3')" onMouseout="JImgInact('go3')" BORDER="0" align="top">
      </a>
      </td>
      </tr>
      </tbody>
      </table>
      </td>
      </tr>
      </form>
      <tr>
      <td colSpan="5">
      <hr>
      </td>
      </tr>
      <form name="Search4">
      <tr>
      <td colspan="5">
      <table border="0" cellPadding="3" cellSpacing="0" width="100%">
      <tbody>
      <tr>
      <td width="30%" colspan="2" class="BoldTD">Records displayed per Page:</td>
      <td colspan="3">
      <select name="NumRows">
      <option value="25">25</option>
      <option value="50">50</option>
      <option value="75">75</option>
      <option value="100">100</option>
      </select>
      </td>
      </tr>
      </tbody>
      </table>
      </td>
      </tr>
      </form>
      </table>
      <br>
      <br>
      <br>
      <br>
      <form name="Content" method="Post">
      <input type="hidden" name="LinkPage" value="">
      <input type="hidden" name="PageType" value="Search">
      <input type="hidden" name="ORG_PK" value="000">
      <input type="hidden" name="STATE_CD" value="">
      <input type="hidden" name="Name" value="">
      <input type="hidden" name="SearchType" value="">
      <input type="hidden" name="CrdNumber" value="">
      <input type="hidden" name="SecNumber" value="">
      <input type="hidden" name="NumRows" value="">
      <input type="hidden" name="Save" value="IAPDSearch">
      </form>
      </DIV>
      </td>
      </tr>
      </table>
      <script language="Javascript">
      bDoneLoading = "Y"
      </script>
      </td>
      </tr>
      </tbody>
      </table>
      </td>
      </tr>
      </table>
      </td>
      </tr>
      <tr>
      <td colspan="4" class="FooterMenuItem noprint">
      <!-- Navigation Footer -->
      <table border="0" cellpadding="0" cellspacing="0" width="100%">
      <tr class="noprint" align="top">
      <TD colspan="3" align="center" valign="top" width="150" Class="BackToTop">
      <a HREF="#PageTop" Class="FooterMenuItem" onMouseOver="status='Scroll Back to Top of this Page';return true" onMouseOut="status=''">Back to Top</a>
      </TD>
      <td Class="FooterMenuItem" align="Center">
      <div align="center">
      <b>IAPD Main</b>
      </div>
      <a class="FooterMenuItem" href="javascript: JSub('/IAPD/Content/IAPDMain/iapd_SECRedirect.aspx');" onMouseOver="status='Link To the SEC'; return true" onMouseOut="status=''">Return to the SEC</a> |
      <a class="FooterMenuItem" href="javascript: JSub('/IAPD/Content/IAPDMain/iapd_SECRedirect.aspx?dir=IARD');" onMouseOver="status='Link To the SEC IARD Page'; return true" onMouseOut="status=''">Return to SEC IARD Page</a> |
      <a class="FooterMenuItem" href="javascript: JSub('/IAPD/Content/IAPDMain/iapd_SiteMap.aspx');" onMouseOver="status='Link To Sitemap'; return true" onMouseOut="status=''">Return to Sitemap</a> |
      <a class="FooterItemSelected" href="javascript: JSub('/IAPD/Content/Search/iapd_OrgSearch.aspx');" onMouseOver="status='Link To Investment Adviser Search'; return true" onMouseOut="status=''">Investment Adviser Search</a>
      </td>
      </tr>
      </table>
      </td>
      </tr>
      </table>
      </body>
      </html>

      What is it that I'm looking for? Thanks.

        Near the beginning of the file, there are these lines:

        <script type="text/javascript" language='Javascript' src='/IAPD/Includes/Validation/Search/iapd_OrgSearch.js'></script>
        <script type="text/javascript" language='Javascript' src='/Iapd/Includes/iapd_WindowManagement.js'></script>
        <script type="text/javascript" language='Javascript' src='/Iapd/Includes/iapd_SetAndSub.js'></script>
        <script language=javascript>

        that tell the browser to load the corresponding scripts (e.g. /IAPD/Includes/Validation/Search/iapd_OrgSearch.js and the other two). If you follow the links (they are relative to the URL from which you got the HTML page), you can get the scripts.
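
        As a rough illustration of "follow the links", here is a minimal sketch that pulls the src values out of the <script> tags and turns them into full URLs. The helper name script_urls is made up for this example, the regex is only good enough for tags shaped like the ones above, and it handles only paths that start with "/" (which is all this page uses). A real HTML parser, or URI->new_abs for the URL part, would be the more general route.

```perl
use strict;
use warnings;

# Page URL taken from the original post.
my $page_url = 'http://www.adviserinfo.sec.gov/IAPD/Content/Search/iapd_OrgSearch.aspx';

# The external <script> tags quoted above.
my $head = <<'HTML';
<script type="text/javascript" language='Javascript' src='/IAPD/Includes/Validation/Search/iapd_OrgSearch.js'></script>
<script type="text/javascript" language='Javascript' src='/Iapd/Includes/iapd_WindowManagement.js'></script>
<script type="text/javascript" language='Javascript' src='/Iapd/Includes/iapd_SetAndSub.js'></script>
HTML

# script_urls() is a hypothetical helper, not part of any module.
# It returns the absolute URL of every <script src=...> whose path
# starts with "/", resolved against the scheme and host of the page.
sub script_urls {
    my ($html, $base) = @_;

    # Scheme and host of the page, e.g. "http://www.adviserinfo.sec.gov".
    my ($origin) = $base =~ m{^(\w+://[^/]+)}
        or die "Cannot find scheme and host in $base";

    my @urls;
    while ($html =~ /<script\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1/gis) {
        my $src = $2;                 # copy before running another match
        next unless $src =~ m{^/};    # only root-relative paths handled here
        push @urls, $origin . $src;
    }
    return @urls;
}

print "$_\n" for script_urls($head, $page_url);
```

        Each printed URL can then be opened in a browser, or fetched with whatever client you already use, to read the script source.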

        Then, in one of the three scripts, you will find the JavaScript code that lies behind the page.

        Reading the HTML more carefully, I notice something odd:
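
        Once a script is saved locally, finding the relevant code mostly means searching for the function name used in the form, JSubmitForm. A minimal sketch, assuming the function is declared in one of the two usual ways. The helper name find_js_function and the sample script text are invented for illustration; the real contents of the three scripts are not shown in this thread.

```perl
use strict;
use warnings;

# find_js_function() is a hypothetical helper: it returns the 1-based line
# numbers on which a function with the given name appears to be defined,
# either as "function NAME(" or as "NAME = function".
sub find_js_function {
    my ($js_source, $name) = @_;
    my @line_numbers;
    my $line_no = 0;
    for my $line (split /\n/, $js_source) {
        $line_no++;
        push @line_numbers, $line_no
            if $line =~ /\bfunction\s+\Q$name\E\s*\(/
            || $line =~ /\b\Q$name\E\s*=\s*function\b/;
    }
    return @line_numbers;
}

# Made-up script text, for illustration only.
my $sample_js = <<'JS';
function JImgAct(name) { /* ... */ }
function JSubmitForm() {
    /* ... */
}
JS

print "JSubmitForm appears to be defined on line $_\n"
    for find_js_function($sample_js, 'JSubmitForm');
```

        Running the same search over each downloaded file should show which of the three scripts holds the submit logic.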

        <img SRC="/IAPD/Images/go_off.gif" alt="Go" name="go2" onMouseover="JImgAct('go2')" onMouseout="JImgInact('go2')" BORDER="0" align="top">

        is the definition of the object that you call "Button GO2", but it is not a button; it is an image, and I see handlers only for onMouseover and onMouseout (called when the mouse passes over the image and when it leaves the image). For this to act as a button, it should have a handler for onClick, too.

        Are you sure that this is the right object?
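
        One way to check which object carries which handler is to list the attributes of each tag. The sketch below does that for the image and for the <a> tag that encloses it in the posted HTML. The helper name tag_attributes is made up, and the regex-based parsing is only an assumption that holds for simple tags like these two.

```perl
use strict;
use warnings;

# The link and image as they appear in the posted HTML.
my $snippet = <<'HTML';
<a HREF="JavaScript:JSubmitForm()" onMouseOver="status='Perform Search'; return true" onMouseOut="status='';">
<img SRC="/IAPD/Images/go_off.gif" alt="Go" name="go2" onMouseover="JImgAct('go2')" onMouseout="JImgInact('go2')" BORDER="0" align="top">
</a>
HTML

# tag_attributes() is a hypothetical helper: given the text of one opening
# tag, it returns a hash mapping lower-cased attribute names to values.
sub tag_attributes {
    my ($tag) = @_;
    my %attr;
    while ($tag =~ /([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/g) {
        my $name  = lc $1;
        my $value = defined $2 ? $2 : defined $3 ? $3 : $4;
        $attr{$name} = $value;
    }
    return %attr;
}

my ($img_tag)  = $snippet =~ /(<img\b[^>]*>)/i;
my ($link_tag) = $snippet =~ /(<a\b[^>]*>)/i;

my %img  = tag_attributes($img_tag);
my %link = tag_attributes($link_tag);

# Event handlers are the attributes whose names start with "on".
my @img_handlers = sort grep { /^on/ } keys %img;

print "img handlers: @img_handlers\n";
print "img has onclick: ", (exists $img{onclick} ? "yes" : "no"), "\n";
print "enclosing link href: $link{href}\n";
```

        For the HTML in this thread, the image itself has no onClick handler, while the enclosing link is the element whose HREF names JSubmitForm().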
