ShayShay has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I've been searching and trying to figure this out for a few days, and I've come to the conclusion that I don't know what the heck I'm doing. Can a girl get a little help?

I need to access the "Students" > "Search for Sections" page of this website: https://admin8.gtc.edu/wa/wa

I'm just on the first step: returning the contents of the page. I'll worry about following links once this step works. Instead of the page, I get back the message "Javascript is currently disabled." I thought JavaScript::SpiderMonkey would take care of that, but I guess I'm doing something wrong. Help?
#!/usr/bin/perl
use strict;
use warnings;

################################################################################
# these modules must be installed
################################################################################
use JavaScript::SpiderMonkey;
use LWP::UserAgent;
use HTTP::Request;

################################################################################
# HTML headers included to make page show in browser
################################################################################
print "HTTP/1.0 200 OK\n";
print "Content-Type: text/html\n\n";

################################################################################
# Enable javascript engine
################################################################################
my $js = JavaScript::SpiderMonkey->new();
$js->init();    # Initialize Runtime/Context

# Define a perl callback for a new JavaScript function
$js->function_set("print_to_perl", sub { print "@_\n"; });

# Create a new (nested) object and a property
$js->property_by_path("document.location.href");

# Execute some code
my $rc = $js->eval(q!
    document.location.href = append("https://",
        "admin8.gtc.edu/wa/wa?&TYPE=M&PID=CORE-WBMAIN&TOKENIDX=3292319802");
    print_to_perl("URL is ", document.location.href);
    function append(first, second) {
        return first + second;
    }
!);

# Get the value of a property set in JS
my $url = $js->property_get("document.location.href");

################################################################################
# Get page contents
################################################################################
my $req = HTTP::Request->new('GET', $url);
my $ua  = LWP::UserAgent->new;
my $res = $ua->request($req);
print $res->code . "\n";
print "\n\n";
print $res->content;

################################################################################
# Cleanup
################################################################################
$js->destroy();

Re: JavaScript and https page and contents
by pc88mxer (Vicar) on Mar 26, 2008 at 15:58 UTC
    As far as I can tell, JavaScript::SpiderMonkey is simply a JavaScript interpreter; it is not hooked into LWP::UserAgent or WWW::Mechanize to run the JavaScript found in fetched HTML pages. It does not create a DOM (document object model), and unlike in a real browser, setting a variable like document.location.href doesn't actually do anything except set that variable.
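    Since the JavaScript in the question only concatenates two strings, the SpiderMonkey step can be dropped entirely: Perl can build the same URL, and the GET request is the only part that talks to the server. A minimal sketch, using the core module HTTP::Tiny in place of LWP::UserAgent so it runs without extra installs (HTTPS also needs IO::Socket::SSL). The sub names page_url and fetch_page are hypothetical helpers, not part of any library:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;   # core module; HTTPS support additionally needs IO::Socket::SSL

# The JavaScript in the question only glues two strings together and stores
# the result in document.location.href. SpiderMonkey does nothing else with
# it, so plain Perl string concatenation yields the same URL.
sub page_url {
    return 'https://'
         . 'admin8.gtc.edu/wa/wa?&TYPE=M&PID=CORE-WBMAIN&TOKENIDX=3292319802';
}

# The GET request is the only step that talks to the server. Whatever HTML
# comes back is returned as-is; none of its <script> code is executed.
sub fetch_page {
    my ($url) = @_;
    my $res = HTTP::Tiny->new->get($url);
    die "GET $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{content};
}
```

    Calling print fetch_page(page_url()) should give back essentially the same HTML the original script printed, because no JavaScript in the response gets run either way.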

    In your case, the page you get from the above URL contains this <script> tag:

    <script language="Javascript" src="./javascript/WebAdvisor_scripts.js"></script>
    That .js file contains the code that builds the user interface you see when you load the page in a browser. LWP never downloads or runs it.
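    To see which external scripts a page pulls in (none of which LWP will fetch or run), a rough scan of the fetched HTML for <script src="..."> tags is enough to find the files worth reading. A minimal regex-based sketch; script_sources is a hypothetical helper, and a real parser such as HTML::TokeParser would be more robust:

```perl
use strict;
use warnings;

# Return the src attribute of every <script ... src="..."> tag in $html.
# This is a rough regex scan for illustration, not a real HTML parser:
# it ignores comments, CDATA, and other edge cases.
sub script_sources {
    my ($html) = @_;
    my @srcs;
    while ($html =~ /<script\b([^>]*)>/gi) {
        my $attrs = $1;    # everything between "<script" and ">"
        push @srcs, $2 if $attrs =~ /\bsrc\s*=\s*(["'])(.*?)\1/i;
    }
    return @srcs;
}
```

    The src values are usually relative; URI->new_abs($src, $page_url) from the URI module (installed along with LWP) turns one into an absolute URL you can fetch and read to see what the page's JavaScript actually does.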
Re: JavaScript and https page and contents
by Anonymous Monk on Mar 26, 2008 at 16:59 UTC
    Bypass the JavaScript. Use Firefox + Firebug, or IE + Wireshark, to see what gets POSTed after the JavaScript has finished its processing. Note that down, then fill in the forms with WWW::Mechanize. The web server can't tell the difference.
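    If the form turns out to be assembled by JavaScript, so there is nothing in the raw HTML for WWW::Mechanize to fill in, the POST noted down in Firebug can also be replayed directly. A minimal sketch with the core module HTTP::Tiny, substituted here for WWW::Mechanize; the helper names encoded_body and replay_post are hypothetical, and the field names you pass should be whatever the captured request actually contains:

```perl
use strict;
use warnings;
use HTTP::Tiny;   # core module; www_form_urlencode/post_form need a reasonably recent version

# Encode name/value pairs the way a browser encodes a submitted form
# (application/x-www-form-urlencoded). Passing an array ref keeps the
# pairs in the given order; a hash ref would be sorted by key.
sub encoded_body {
    my (@pairs) = @_;
    return HTTP::Tiny->new->www_form_urlencode(\@pairs);
}

# Send that same POST. @pairs are the field names and values copied from
# the captured request. Returns HTTP::Tiny's response hash ref
# (status, reason, content, success, ...).
sub replay_post {
    my ($url, @pairs) = @_;
    return HTTP::Tiny->new->post_form($url, \@pairs);
}

# Example (HYPOTHETICAL URL variable and field names, for illustration only):
# my $res = replay_post($form_action_url, FIELD_A => 'value', FIELD_B => 'value');
# print $res->{content} if $res->{success};
```

    For a form that does exist in the fetched HTML, WWW::Mechanize's submit_form(form_number => 1, fields => { ... }) does the same encoding and submission for you.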