JavaScript and https page and contents

ShayShay has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I've been searching and trying to figure this out for a few days and have come to the conclusion that I don't know what the heck I'm doing. Can a girl get a little help?

I need to access the "Students" > "Search for Sections" page of this website: https://admin8.gtc.edu/wa/wa

I'm just on the first step: returning the contents of the page. I'll worry about following links after I've got this step done. I'm getting the error "Javascript is currently disabled." I thought JavaScript::SpiderMonkey would take care of that, but I guess I'm doing something wrong. Help?

#!/usr/bin/perl
################################################################################
#these modules must be installed
################################################################################
use JavaScript::SpiderMonkey;
use LWP::UserAgent;
################################################################################
#HTML headers included to make page show in browser
################################################################################
print "HTTP/1.0 200 OK\n";
print "Content-Type: text/html\n\n\n";
################################################################################
#Enable javascript engine
################################################################################
my $js = JavaScript::SpiderMonkey->new();
$js->init();  # Initialize Runtime/Context
#Define a perl callback for a new JavaScript function
$js->function_set("print_to_perl", sub { print "@_\n"; });
# Create a new (nested) object and a property
$js->property_by_path("document.location.href");
# Execute some code
my $rc = $js->eval(q!
    document.location.href = append("https://", "admin8.gtc.edu/wa/wa?&TYPE=M&PID=CORE-WBMAIN&TOKENIDX=3292319802");
        print_to_perl("URL is ", document.location.href);
        function append(first, second) {
             return first + second;
        }
!);
# Get the value of a property set in JS
my $url = $js->property_get("document.location.href");
################################################################################
#Get page contents
################################################################################
require HTTP::Request;
my $req = new HTTP::Request('GET', $url);
my $ua = new LWP::UserAgent;
my $res = $ua->request($req);
print $res->code."\n"; 
print "\n\n";
print $res->content;
################################################################################
#Cleanup
################################################################################
$js->destroy();


Replies are listed 'Best First'.
Re: JavaScript and https page and contents by pc88mxer (Vicar) on Mar 26, 2008 at 15:58 UTC
As far as I can tell, `JavaScript::SpiderMonkey` is simply a JavaScript interpreter; it is not integrated with `LWP::UserAgent` or `WWW::Mechanize`, so it does not execute the JavaScript found in the HTML pages they fetch. It does not create a DOM (document object model), and unlike in a real browser, setting a variable like `document.location.href` doesn't actually do anything except set that variable. In your case, the page you get from the above URL contains this `<script>` tag: `<script language="Javascript" src="./javascript/WebAdvisor_scripts.js"></script>` That `.js` file has the code that creates the user interface you see when you load the page in a browser.
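To see which external scripts a fetched page depends on, something like the following rough sketch may help. It lists the `src` of every `<script>` tag in an HTML string. The helper name `script_sources` and the sample page body (including the `<noscript>` text) are made up for illustration; only the `<script>` tag itself comes from the page discussed above. A regex is used to keep the example dependency-free; a real parser such as `HTML::TokeParser` would be more robust.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the src attribute of every external <script> tag in an HTML string.
# Quick diagnostic only: a regex will miss unusual markup that a real HTML
# parser would handle.
sub script_sources {
    my ($html) = @_;
    my @srcs;
    while ($html =~ /<script\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1/gis) {
        push @srcs, $2;
    }
    return @srcs;
}

# Hypothetical page body built around the <script> tag quoted above.
my $html = <<'HTML';
<html><head>
<script language="Javascript" src="./javascript/WebAdvisor_scripts.js"></script>
</head><body><noscript>Javascript is currently disabled.</noscript></body></html>
HTML

print "$_\n" for script_sources($html);
```

In the original script, the same function could be applied to `$res->content` to find out which `.js` files build the interface that LWP never runs.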
Re: JavaScript and https page and contents by Anonymous Monk on Mar 26, 2008 at 16:59 UTC
Bypass the JavaScript. Use Firefox + Firebug or IE + Wireshark to see what gets POSTed after the JS is done processing. Then note that down and fill in the forms with WWW::Mechanize. The web server doesn't know the difference.
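A minimal sketch of that approach, assuming the captured request turns out to be a plain form POST back to the same URL. The field names and values in `%captured_fields` are placeholders, not the real WebAdvisor fields, and `replay_post` is a hypothetical helper name; substitute whatever the browser capture actually shows.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Field names/values as captured from the browser's POST.
# PLACEHOLDERS ONLY (hypothetical): replace with the captured data.
my %captured_fields = (
    'SOME.FIELD'  => 'some value',
    'OTHER.FIELD' => 'other value',
);

# Fetch the page once (to pick up cookies/session), then send the same
# POST the browser sent and return the resulting page body.
sub replay_post {
    my ($url, $fields) = @_;
    require WWW::Mechanize;                          # loaded only when used
    my $mech = WWW::Mechanize->new(autocheck => 1);  # die on HTTP errors
    $mech->get($url);
    $mech->post($url, $fields);                      # form-encoded POST
    return $mech->content;
}

if (@ARGV) {
    print replay_post($ARGV[0], \%captured_fields);
} else {
    print "usage: $0 URL\n";
}
```

If the raw HTML does contain a real `<form>`, `$mech->submit_form(with_fields => {...})` may be a better fit than a bare `post`, since it also sends any hidden fields already in the form.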