tonyday has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,

I am trying to learn about all things webby, in particular LWP::UserAgent and HTML::Parser. I need help, though, as my attempts at meaningful functionality are being frustrated by JavaScript. Before I explain my problem, I should say that I am starting with zero net knowledge: I have never used CGI, coded JavaScript or designed a web page. OK, so let me explain what's happening with an example...

The Project

As a learning project, I decided to try and write some code that would perform the following actions:

1. log on to my net banking site,

2. locate my account balances and recent transactions,

3. reconcile these with my budget and personal records, and finally

4. send a chatty email to my wife based on the results thus proving to her that my interest in perl can lead to practical benefits.

Initial Problem

I haven't been able to get past step 1. First, I learnt about LWP and HTML via the PODs and some basic code, enough to be able to get pages and extract HTML. The future seemed bright and clear. Then I tried to apply this knowledge to the project. The first thing I noticed was that the start page was JavaScript. No matter, I'll just scan the source and extract what I need.

I then plugged the referenced URL into my code and got the following...

META HTTP-EQUIV="Pragma" CONTENT="no-cache" META HTTP-EQUIV="REFRESH" REFRESH: 0; CONTENT="0; URL=https://www.xxxbank.com//main.htm"

Looking good... so after finding out what https stands for I was ready. Alas, all I got for my trouble was a 500: Unknown Error error. Here is my modest code, pretty much identical to the POD example.

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

my $start_page = 'https://www.xxxbank.com//main.htm';

my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/8.0");    # pretend we are very capable browser
$ua->cookie_jar(HTTP::Cookies->new(file => "lwpcookies.txt", autosave => 1));

my $req = HTTP::Request->new(GET => $start_page);
my $res = $ua->request($req);

print "Error: " . $res->status_line . "\n" unless $res->is_success;
print $res->content;
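
A 500 status does not always come from the server. LWP marks responses it generated itself (for example when a connection could not be set up) with a `Client-Warning: Internal response` header. The sketch below is one way to tell the two cases apart; the helper name `is_internal_response` and the hand-built responses are illustrative assumptions, not part of the original script.

```perl
use strict;
use warnings;
use HTTP::Response;

# Returns true when LWP built the response itself rather than
# receiving it from the remote server.
sub is_internal_response {
    my ($res) = @_;
    my $warning = $res->header('Client-Warning') // '';
    return $warning eq 'Internal response';
}

# Hand-built responses standing in for what $ua->request() might return.
my $from_lwp = HTTP::Response->new(500, "Can't connect");
$from_lwp->header('Client-Warning' => 'Internal response');
my $from_server = HTTP::Response->new(500, 'Internal Server Error');

for my $res ($from_lwp, $from_server) {
    printf "%s => %s\n", $res->status_line,
        is_internal_response($res) ? 'generated by LWP' : 'sent by the server';
}
```

Printing `$res->as_string` instead of only the status line also shows every header, which is often where the real reason for a failure is recorded.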

The Question

I'm not sure where to go now. I have tried going to the next stage of the login, where I found form data that I put in a POST request, but it led to the same error. My feeling is that pages with lots of JavaScript are inaccessible to a humble Perl hacker and will end in tears. Is the wide world out there a Java one, or can someone counsel Perl perseverance and eventual enlightenment? I'm happy to do the hard yards of understanding HTML, CGI and so on, but I get a chill in my spine thinking about hacking through JavaScript.

Replies are listed 'Best First'.
(crazyinsomniac: strategy) Re: LWP and Javascript
by crazyinsomniac (Prior) on Feb 07, 2002 at 08:45 UTC
    All is not lost. The first thing I'd ask of you is your bank account and password, so that I may help you out ;D (ok, just kidding). But seriously, as a regular user of this site, you have to know whether JavaScript is mandatory for access. If it is, well, then you need to start dumping everything to disk (which should be your strategy anyway). What you are attempting is by no means impossible, nor horribly complex, but there is so much HTTP/HTML/JavaScript to weed through to figure out what to do that it might not be worth it.

    So before you go any further, dump to disk as much of the site as your browser usually gets. You can use LWP to attempt this, or simply employ a network sniffer to capture a session where you log in and log out (for starters), and then decipher how to do it using LWP.
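
As a rough sketch of the "dump everything to disk" strategy on the LWP side, the following writes each request/response pair to a numbered file. The helper name `save_exchange`, the example URL and the stand-in response are assumptions made for illustration; a temporary directory is used only so the demo cleans up after itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use HTTP::Request;
use HTTP::Response;

# Temporary directory for the demo; use a permanent path in practice.
my $dump_dir = tempdir(CLEANUP => 1);
my $counter  = 0;

# Write the request and the full response (status line, headers, body)
# to a numbered file so the session can be studied later.
sub save_exchange {
    my ($res) = @_;
    my $file = sprintf '%s/%03d.txt', $dump_dir, ++$counter;
    open my $fh, '>', $file or die "Cannot write $file: $!";
    binmode $fh;
    print {$fh} $res->request->as_string, "\n" if $res->request;
    print {$fh} $res->as_string;
    close $fh or die "Cannot close $file: $!";
    return $file;
}

# Stand-in for the result of a real $ua->request($req) call.
my $req = HTTP::Request->new(GET => 'https://www.example.com/main.htm');
my $res = HTTP::Response->new(200, 'OK', ['Content-Type' => 'text/html'],
                              '<html><body>login form here</body></html>');
$res->request($req);

my $saved = save_exchange($res);
print "Saved to $saved\n";
```

With a live user agent, calling `save_exchange($res)` after every `$ua->request(...)` gives a file-by-file record that can be compared against what the browser receives.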

    Until you do this, nobody can even begin to help you (all the advice will probably be good, but will usually be as vague as this, and mostly involve technique ... and this is a tried and true way).

    And on a final note, my <shameless plug> HTML::TokeParser Tutorial </shameless plug> may help you in parsing that HTML (there is probably some module or other for dealing with the JavaScript).
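
To give a flavour of HTML::TokeParser on the page from the question, here is a minimal sketch that pulls the target out of a meta refresh tag. The sample HTML is reconstructed from the output quoted in the question (the angle brackets are an assumption), and `refresh_url` is a hypothetical helper name.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Sample page modelled on the redirect shown in the question.
my $html = <<'HTML';
<html><head>
<META HTTP-EQUIV="Pragma" CONTENT="no-cache">
<META HTTP-EQUIV="REFRESH" CONTENT="0; URL=https://www.xxxbank.com//main.htm">
</head></html>
HTML

# Return the target of the first <meta http-equiv="refresh"> tag, if any.
sub refresh_url {
    my ($doc) = @_;
    my $p = HTML::TokeParser->new(\$doc) or die "Cannot parse document";
    while (my $tag = $p->get_tag('meta')) {
        my $attr = $tag->[1];    # hash ref; attribute names are lower-cased
        next unless lc($attr->{'http-equiv'} // '') eq 'refresh';
        return $1 if ($attr->{content} // '') =~ /\bURL\s*=\s*(\S+)/i;
    }
    return;
}

print refresh_url($html), "\n";
```

The returned URL can then be fed to the next LWP request, which is the step a browser performs automatically when it honours the refresh.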

    Happy coding!

     
    ______crazyinsomniac_____________________________
    Of all the things I've lost, I miss my mind the most.
    perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"

Re: LWP and Javascript
by gav^ (Curate) on Feb 07, 2002 at 05:31 UTC
    Have you installed Crypt::SSLeay which will enable you to make HTTPS requests?

    gav^

      Yes, I installed Crypt::SSLeay and have verified that I can get https requests from other sites.
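
A quick way to confirm this from Perl is to ask the user agent itself. This is only a sketch: which SSL module has to be installed for https to be reported as supported depends on the LWP version in use.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# is_protocol_supported() reports whether this LWP installation has a
# handler available for the given URL scheme.
for my $scheme (qw(http https)) {
    printf "%-5s %s\n", $scheme,
        $ua->is_protocol_supported($scheme) ? 'supported' : 'NOT supported';
}
```
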
Re: LWP and Javascript
by dws (Chancellor) on Feb 07, 2002 at 06:37 UTC
    Are you double extra certain that any cookies you received from the http: request are getting passed back via the https: request? I see that you're hooking up the cookie jar, but you don't show code that populates it.
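
When a cookie jar is attached with `$ua->cookie_jar(...)`, LWP fills it from each response and consults it before each request. The sketch below performs those two steps by hand on made-up data, which is one way to check what would be sent back; the host, cookie name and value are illustrative assumptions.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;
use HTTP::Response;

my $jar = HTTP::Cookies->new;    # in-memory jar; no file needed for the demo

# Simulated reply to the initial http: request, setting a session cookie.
my $first_req = HTTP::Request->new(GET => 'http://www.example.com/');
my $first_res = HTTP::Response->new(200, 'OK',
    ['Set-Cookie' => 'sid=abc123; path=/']);
$first_res->request($first_req);

# Store any cookies found in the response.
$jar->extract_cookies($first_res);

# Add matching cookies to the follow-up https: request.
my $next_req = HTTP::Request->new(GET => 'https://www.example.com/main.htm');
$jar->add_cookie_header($next_req);

print "Jar contents:\n", $jar->as_string;
print "Cookie header on next request: ",
      ($next_req->header('Cookie') // '(none)'), "\n";
```

Printing `$ua->cookie_jar->as_string` after the first real request is a similar check against the live site.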

    You might need to forge a Referer string. Some sites like to check that what's on the client end at least pretends to be a browser.
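
Setting a Referer is a single header call on the request. In this sketch both URLs are placeholders; the real values would have to come from a captured browser session.

```perl
use strict;
use warnings;
use HTTP::Request;

# Placeholder URLs, not taken from the actual banking site.
my $login_page = 'https://www.example.com/login.htm';
my $target     = 'https://www.example.com/main.htm';

my $req = HTTP::Request->new(GET => $target);
$req->header(Referer => $login_page);    # claim we arrived from the login page

print $req->as_string;
```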

      You might need to fake the browser too. My banking site more or less only lets me in with Internet Explorer, but if I tell Opera to pretend to be "Internet Explorer", it all works well. I see you do this:

      $ua->agent("Mozilla/8.0"); # pretend we are very capable browser

      This might very well be overdoing it: by pretending to be this "very capable browser", you are also pretending to be a non-existent browser, at least as far as the site is concerned. Lots of poorly written checks, both in JavaScript and in server-side code, don't really test whether your browser can handle what it needs to, only whether it is one of "the most common one(s)". This is, of course, the easy way out for lazy (false laziness) programmers, and it will almost always come back to haunt them or whoever gets to maintain the code.

      I'd suggest that you copy the User-Agent header verbatim from a browser that you know works on that site, such as (most likely) Mozilla/4.0 (compatible; MSIE 5.5; Windows 98), even if you would rather be the meanest browser on the block :)
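
A minimal sketch of that suggestion, using the string quoted above. The target URL is a placeholder, and `prepare_request` is used here only to show the headers LWP would add without sending anything over the network.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Identification string copied verbatim from a browser known to work.
my $browser_id = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)';

my $ua = LWP::UserAgent->new(agent => $browser_id);

# Apply the agent's default headers to a request without sending it.
my $req = $ua->prepare_request(
    HTTP::Request->new(GET => 'https://www.example.com/main.htm'));

print $req->as_string;
```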

Re: LWP and Javascript
by screamingeagle (Curate) on Feb 07, 2002 at 06:33 UTC
    You mention JavaScript twice in your post and then, towards the end, you talk about Java... I'm not sure which language you have in mind. It's important to realize that they have almost nothing in common (except the first 4 letters of their names :) )
    I'm going to assume that it was JavaScript you had in mind. Well, I took the block of code you posted and ran it against a secure URL which also contained a lot of JavaScript, and the script ran just fine.
    It's highly unlikely that JavaScript is the culprit here (since it's a client-side scripting language, not a server-side one, and does not get executed when you fetch the source code using LWP).
    It might be that the website you're posting to was down at that time...
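
To illustrate the point that LWP only returns the script as text, here is a sketch that scans such text for a script-driven redirect. The page content is invented, and the pattern covers only the simplest `location = "..."` form; real pages may build the URL in ways a regex cannot follow.

```perl
use strict;
use warnings;

# What LWP hands back for a script-driven start page: plain text, unexecuted.
my $content = <<'HTML';
<html><head><script language="JavaScript">
  window.location.href = "https://www.example.com/login.htm";
</script></head></html>
HTML

# Pull out the first URL assigned to location or location.href.
my ($next_url) = $content =~ /location(?:\.href)?\s*=\s*["']([^"']+)["']/;

print defined $next_url ? "Next URL: $next_url\n" : "No redirect found\n";
```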