in thread Getting result page using WWW::Mechanize
First of all, please take a look at the Writeup Formatting Tips. They explain how to make your node readable instead of an unstructured mess.
Second, I told you to try WWW::Mechanize::Shell, a tool that generates scripts for WWW::Mechanize. Did you try it? If so, what problems did you run into?
If you did try it but did not get far, my solution below will not be of much help, because you will have to learn about JavaScript and HTML before you can automate this website.
Using WWW::Mechanize::Shell, I went to that website and navigated through the pages, emulating the JavaScript by hand. Here is the full transcript of that session, with comments:
```
# Fake IE
ua "Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)"
get http://www.attachmail.com/
# Navigate to login frame
open bott
# Login
value UserId test01
value Passwd s
click imageField
# Got empty page. Why?
content
# Aaah - it is a META refresh page
# Extract the target url:
eval $self->agent->content =~ /URL=(.*)'>/; qq{http://www.attachmail.com/cg-bin/$1}
# Jump to that page
get http://www.attachmail.com/cg-bin/userpagedisplay.cgi?user_no=1057348162&domain=attachmail.com&dtexpiry=19-Jul-2003&status=0&spacebal=4.998&bounce=&flmove=
# Multi-frame page, go to the content frame
open mailatt
open /AddressBook/
open /AddressBook/
# Still no action - must be JavaScript.
# Looking at the JavaScript, I know now that I need the
# user number:
eval $self->agent->uri
eval $self->agent->uri=~/userno=(\d+)/; $1
# Go to the page referenced from the JavaScript
get http://www.attachmail.com//cg-bin/am_bring_Addbook.cgi?userno=1057348162&flg=Disp
# Find the correct frame:
open mainFrame
back
open bottomFrame
# More JavaScript interaction:
get http://www.attachmail.com/cg-bin/add_entry.cgi?username=1057348162
# Fill in the values and submit the form
value nam foo
value email bar@example.com
submit
```
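The `eval` step above finds the redirect target with `/URL=(.*)'>/`, which only works as long as the tag ends in a single quote followed by `>`. As a sketch, assuming a conventional `<meta http-equiv="refresh" content="N; URL=...">` tag with either quote style, a slightly more tolerant extractor could look like this. The sub name `meta_refresh_url` is made up for illustration, and it does not decode HTML entities such as `&amp;`:

```perl
use strict;
use warnings;

# Hypothetical helper: return the target URL of the first META refresh
# tag in $html, or undef if there is none. Assumes the usual form
#   <meta http-equiv="refresh" content="0; URL=target">
# with single or double quotes around the attribute values.
sub meta_refresh_url {
    my ($html) = @_;
    # Walk through every <meta ...> tag on the page
    while ( $html =~ /<meta\b([^>]*)>/gi ) {
        my $attrs = $1;
        # Skip meta tags that are not refresh instructions
        next unless $attrs =~ /http-equiv\s*=\s*["']?refresh["']?/i;
        # Grab the content attribute, honouring its quote character
        next unless $attrs =~ /content\s*=\s*(["'])(.*?)\1/is;
        my $content = $2;
        # The target follows "URL=" after the delay and a semicolon
        return $1 if $content =~ /URL\s*=\s*['"]?([^'"\s]+)/i;
    }
    return;
}

# Usage with a made-up sample tag; the real redirect is relative to /cg-bin/
my $html   = q{<meta http-equiv="Refresh" content="0; URL=userpagedisplay.cgi?user_no=42">};
my $target = meta_refresh_url($html);
print "http://www.attachmail.com/cg-bin/$target\n" if defined $target;
```

For the sample tag this prints `http://www.attachmail.com/cg-bin/userpagedisplay.cgi?user_no=42`.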
Then I looked at the code that was generated from my actions and cleaned it up using what I know about this website:
```perl
#!D:\Programme\indigoperl-5.6\bin\perl.exe -w
use strict;
use WWW::Mechanize;

my $agent = WWW::Mechanize->new();
$agent->env_proxy();

# Fake IE; set the user agent before the first request is sent
$agent->agent('Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)');

$agent->get('http://www.attachmail.com/');
$agent->form(1) if $agent->forms and scalar @{$agent->forms};

# navigate to the login frame
$agent->follow('bott');
{
    # temporarily turn off -w warnings while filling in the fields
    local $^W;
    $agent->current_form->value('UserId', 'test01');
    $agent->current_form->value('Passwd', 's');
}
$agent->click('imageField');

# The login page is a META refresh redirect; extract its target
die "Unknown page received" unless $agent->content =~ /URL=(.*)'>/;
my $redirect = qq{http://www.attachmail.com/cg-bin/$1};
$agent->get($redirect);
$agent->form(1) if $agent->forms and scalar @{$agent->forms};

# multi-frame page: the content frame's URL carries the user number
$agent->follow('mailatt');
die "Couldn't retrieve user number" unless $agent->uri =~ /userno=(\d+)/;
my $userno = $1;

# the address book page that the JavaScript would load
$agent->get("http://www.attachmail.com//cg-bin/am_bring_Addbook.cgi?userno=$userno&flg=Disp");
$agent->form(1) if $agent->forms and scalar @{$agent->forms};
$agent->follow('bottomFrame');

# the "add entry" form that the JavaScript would open
$agent->get("http://www.attachmail.com/cg-bin/add_entry.cgi?username=$userno");
$agent->form(1) if $agent->forms and scalar @{$agent->forms};
{
    local $^W;
    $agent->current_form->value('nam', 'foo');
    $agent->current_form->value('email', 'bar@example.com');
}
$agent->submit();
```
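The script finds the user number with `$agent->uri =~ /userno=(\d+)/`, which would also accept a parameter named, say, `xuserno`. If that ever matters, looking the parameter up by its exact name is more precise. Below is a core-Perl sketch; `query_param` is a hypothetical name, it only handles plain `key=value` pairs with `%XX` and `+` escapes, and in real code the `query_form` method of the `URI` module (the class of the object `$agent->uri` returns) does the same job:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first value of query parameter $name
# in $url, or undef if it is absent. Handles only "key=value" pairs
# separated by "&" or ";", with "+" and %XX form encoding.
sub query_param {
    my ( $url, $name ) = @_;
    # Everything after the first "?" up to an optional "#fragment"
    my ($query) = $url =~ /\?([^#]*)/ or return;
    for my $pair ( split /[&;]/, $query ) {
        my ( $key, $value ) = split /=/, $pair, 2;
        next unless defined $key;    # empty pair, e.g. "a=1&&b=2"
        $value = '' unless defined $value;
        # Undo form encoding on both key and value
        for ( $key, $value ) {
            tr/+/ /;
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
        }
        return $value if $key eq $name;
    }
    return;
}

# Usage with the address-book URL from the transcript
my $url    = 'http://www.attachmail.com//cg-bin/am_bring_Addbook.cgi?userno=1057348162&flg=Disp';
my $userno = query_param( $url, 'userno' );
die "Couldn't retrieve user number" unless defined $userno and $userno =~ /^\d+$/;
print "$userno\n";
```

With that URL, the sketch prints `1057348162`.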
This is the complete code and should work as is, but it will not help you much unless you actually try to understand the website and how this code interacts with it.
```
perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
$d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
($c = $d->accept())->get_request(); $c->send_response( new #in the
HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
```