
Handling Javascript with LWP::UserAgent

by mrguy123 (Hermit)
on Jul 09, 2006 at 14:55 UTC ( [id://559996]=perlquestion )

mrguy123 has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,
I'm trying to retrieve this web page
http://www.GIDEONonline.net/loginx.php?user=metalib
with LWP::UserAgent. The code goes like this:

    #!/usr/bin/perl
    use strict;
    use LWP::UserAgent;

    {
        my $ua = new LWP::UserAgent();
        my $search_address = "http://www.GIDEONonline.net/loginx.php?user=metalib";

        #creating the request object
        my $req = new HTTP::Request ('POST', $search_address);

        #sending the request
        my $res = $ua->request($req);
        if (!($res->is_success)){
            warn "Warning:".$res->message."\n";
        }

        # declare $response once, then append the body to the headers
        my $response = $res->headers_as_string();
        $response .= $res->content();
        print "$response\n";
    }

The response page looks like this:

    <HTML>
    <HEAD>
    <SCRIPT LANGUAGE="JavaScript" SRC="js_lib/browser_check.js"></SCRIPT>
    <SCRIPT LANGUAGE="JavaScript">
    function goThere() {
        if ( (is_nav && (is_major >= 6)) || (is_ie && (is_major >=5)) ) {
            document.location.href = "/authx.php?user=metalib&browser_ok=" + 1;
        } else {
            document.location.href = "/authx.php?user=metalib&browser_ok=" + 0;
        }
    }
    </SCRIPT>
    </HEAD>
    <NOSCRIPT>
    <BR>
    <CENTER><B>Javascript must be turned on to access Gideon Online.</B></CENTER>
    </NOSCRIPT>
    <BODY BGCOLOR="#FFFFFF" onLoad="goThere()">
    </BODY>
    </HTML>

As far as I know, JavaScript is not turned off (I have retrieved similar pages that use JavaScript).
The page that I should receive is Gideon Online's homepage. Because I use IP authentication, I might expect authentication problems on the server (although I have not experienced such problems before), but that doesn't explain the message about JavaScript being turned off. Does anybody have any idea why this isn't working?
Thanks,
Guy Naamati

2006-07-09 Retitled by Corion, as per Monastery guidelines
Original title: 'javascript'

Replies are listed 'Best First'.
Re: Handling Javascript with LWP::UserAgent
by Ieronim (Friar) on Jul 09, 2006 at 16:19 UTC
    The simplest solution is to go directly to
    http://www.GIDEONonline.net/authx.php?user=metalib&browser_ok=1
    instead of
    http://www.GIDEONonline.net/loginx.php?user=metalib
    The Javascript on the page you retrieved is simply redirecting you to this location. And LWP::UserAgent does not understand this, as it has no embedded Javascript engine :)
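
    A minimal sketch of that shortcut, assuming the server accepts a plain GET on the authx.php URL and sets whatever session cookie it needs there. The helper names auth_url and fetch_with_cookies are made up for this illustration; the host, path, and parameters come from the page quoted in the question. LWP::UserAgent is loaded lazily, and the network request only happens when the script is run with --fetch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the URL that the page's goThere() function would redirect to.
# browser_ok=1 is the branch taken for a "supported" browser.
# (auth_url is a hypothetical helper name.)
sub auth_url {
    my ($user) = @_;
    return "http://www.GIDEONonline.net/authx.php?user=$user&browser_ok=1";
}

# Fetch a URL with an in-memory cookie jar, so any cookie the server sets
# is kept on the user agent for later requests.
# (fetch_with_cookies is a hypothetical helper name.)
sub fetch_with_cookies {
    my ($url) = @_;
    require LWP::UserAgent;    # loaded lazily; auth_url() works without it
    my $ua  = LWP::UserAgent->new( cookie_jar => {} );
    my $res = $ua->get($url);
    warn "Warning: ", $res->status_line, "\n" unless $res->is_success;
    return ( $ua, $res );
}

# Only touch the network when explicitly asked to.
if ( @ARGV && $ARGV[0] eq '--fetch' ) {
    my ( $ua, $res ) = fetch_with_cookies( auth_url('metalib') );
    print $res->headers_as_string, "\n", $res->decoded_content // '', "\n";
}
```

    Reusing the returned $ua for the follow-up search request keeps the cookie in play, which seems to be what mattered in this case.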

    Generally, if you want to process pages containing JavaScript with LWP::UserAgent, you must carefully read all the JavaScript on the pages you retrieve and decide which workaround to use in each case.

    In the worst cases, the JavaScript module is the only solution.

      Thanks, it worked like a charm. I didn't exactly get the retrieved page, but I did get a cookie I needed for the search page. Problem solved.
Re: Handling Javascript with LWP::UserAgent
by Corion (Patriarch) on Jul 09, 2006 at 15:04 UTC

    The problem is simply that LWP::UserAgent does not know about and does not handle Javascript. You need to filter out the interesting parts of the Javascript yourself and react accordingly.
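
    One way to "filter out the interesting parts" is a regular expression over the script text. The sketch below is only an illustration: the function name js_redirect_targets is made up, and the pattern only covers simple assignments of a quoted string (optionally followed by "+ digits") to location or location.href, as in the page from the question. Anything more dynamic needs a real JavaScript engine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every string literal assigned to location / location.href in the
# given HTML or JavaScript text. A trailing '+ <digits>' is appended to the
# string, which is enough for the goThere() function quoted in the question.
# (js_redirect_targets is a hypothetical helper name.)
sub js_redirect_targets {
    my ($html) = @_;
    my @targets;
    while ( $html =~ m{(?:document\.|window\.)?location(?:\.href)?\s*=\s*(["'])(.*?)\1(?:\s*\+\s*(\d+))?}g ) {
        my ( $url, $suffix ) = ( $2, $3 );
        $url .= $suffix if defined $suffix;
        push @targets, $url;
    }
    return @targets;
}

# The script portion of the page from the question.
my $page = <<'HTML';
<SCRIPT LANGUAGE="JavaScript">
function goThere() {
    if ( (is_nav && (is_major >= 6)) || (is_ie && (is_major >=5)) ) {
        document.location.href = "/authx.php?user=metalib&browser_ok=" + 1;
    } else {
        document.location.href = "/authx.php?user=metalib&browser_ok=" + 0;
    }
}
</SCRIPT>
HTML

print "$_\n" for js_redirect_targets($page);
```

    This prints both candidate targets; deciding which branch to follow (here, browser_ok=1) is still up to you.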

      Hi. This is an example of a page I retrieved that uses JavaScript. The code looks like this:

          #!/usr/bin/perl
          use strict;
          use LWP::UserAgent;

          {
              my $ua = new LWP::UserAgent();
              my $search_address = "http://online.wsj.com/search/full.html?";

              #creating the request object
              my $req = new HTTP::Request ('GET', $search_address);

              #sending the request
              my $res = $ua->request($req);
              if (!($res->is_success)){
                  warn "Warning:".$res->message."\n";
              }

              # declare $response once, then append the body to the headers
              my $response = $res->headers_as_string();
              $response .= $res->content();
              print "$response\n";
          }

      If you run this code you should get a response that contains JavaScript. As you can see, the code is basically the same except for the URL and the request method.
        The JavaScript on a page can do many different things. In some cases it can be ignored; in others it cannot.

        In general, if you can use the page in your favorite browser with JavaScript turned off, without losing any essential functionality, you can easily work with it using LWP::UserAgent.
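
        A rough way to spot the troublesome pages from Perl is to look for signs that the page navigates via script. The sketch below is a heuristic only, not a reliable test; the function name js_navigation_hints and the three checks are assumptions chosen to match the page in the question. An empty result suggests, but does not prove, that the page is usable without JavaScript.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a list of hints that a page relies on JavaScript for navigation.
# (js_navigation_hints is a hypothetical helper name; the checks are
# heuristics, so expect both false positives and false negatives.)
sub js_navigation_hints {
    my ($html) = @_;
    my @hints;
    push @hints, 'has a <noscript> fallback'
        if $html =~ /<noscript\b/i;
    push @hints, 'runs a handler when the body loads'
        if $html =~ /<body\b[^>]*\bonload\s*=/i;
    push @hints, 'assigns to location in a script'
        if $html =~ /\blocation(?:\.href)?\s*=[^=]/i;
    return @hints;
}

# A cut-down version of the page from the question.
my $redirect_page = <<'HTML';
<HTML><HEAD><SCRIPT LANGUAGE="JavaScript">
function goThere() { document.location.href = "/authx.php?user=metalib&browser_ok=" + 1; }
</SCRIPT></HEAD>
<NOSCRIPT>Javascript must be turned on to access Gideon Online.</NOSCRIPT>
<BODY BGCOLOR="#FFFFFF" onLoad="goThere()"></BODY></HTML>
HTML

print "- $_\n" for js_navigation_hints($redirect_page);
```

        A page that merely includes scripts for menus or tracking would typically trigger none of these checks, which matches the earlier pages that worked fine.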

      I have been able to retrieve pages with Javascript in the past. What's different now?

        Since you do not tell us what those past pages looked like (in terms of Javascript), how could we possibly tell you what has changed? We can guess -- perhaps those pages you speak of did not use Javascript for navigational purposes. You could try the Javascript CPAN module. I used this successfully several years ago to decrypt an encrypted web page, but something tells me you might need to parse out the href locations yourself and feed those back to LWP.

        jeffa

        L-LL-L--L-LL-L--L-LL-L--
        -R--R-RR-R--R-RR-R--R-RR
        B--B--B--B--B--B--B--B--
        H---H---H---H---H---H---
        (the triplet paradiddle with high-hat)
        
Re: Handling Javascript with LWP::UserAgent
by bart (Canon) on Jul 09, 2006 at 15:12 UTC
Re: Handling Javascript with LWP::UserAgent
by shmem (Chancellor) on Jul 09, 2006 at 23:14 UTC
    Just to show how it can be done...

    Some time ago I installed JavaScript::SpiderMonkey, but had not played with it so far - so your question is an opportunity ;-)

    As others pointed out, LWP::UserAgent doesn't parse or evaluate JavaScript. The code below uses JavaScript::SpiderMonkey for that and extracts the JavaScript stuff with HTML::Parser.

    For obvious reasons, this cruft works only for the link you provided.

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
