initself has asked for the wisdom of the Perl Monks concerning the following question:
This is a continuation of HTML::Form Submit Issue. My attempt now is, after the first form submission (see first code chunk), to go to the last page of results, again using a form submit.
Update: I have reason to believe that my cookie implementation is fundamentally flawed to begin with. Once I go more than one page deep, my cookie jar doesn't follow me. I hope to find a solution forthwith.
Update 2: I am now sending cookies with my header.
$filled_out_request = $forms[1]->click;
$cookie_jar->add_cookie_header($filled_out_request);
The cookie does not seem to change between the first form submit and the second form submit in my code. Can anyone verify whether this is probably correct or not? This much we learned: even though we are now sending headers with the request, the server still doesn't like what we're sending. See reply for complete code.
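For reference, here is a minimal sketch of a cookie jar round trip with HTTP::Cookies, run offline against a hand-built response. The cookie name `session`, its value, and the `Set-Cookie` header are made up for illustration; the real site's cookies will differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;
use HTTP::Response;

my $url = 'http://browseusers.myspace.com/Browse/Browse.aspx';
my $jar = HTTP::Cookies->new;

# Hand-built response standing in for the server's first reply.
# The cookie below is hypothetical.
my $first_request = HTTP::Request->new(GET => $url);
my $response      = HTTP::Response->new(200, 'OK');
$response->header('Set-Cookie' => 'session=abc123; path=/');
$response->request($first_request);

# Step 1: store whatever cookies the response set.
$jar->extract_cookies($response);

# Step 2: attach the stored cookies to the next request.
my $next_request = HTTP::Request->new(POST => $url);
$jar->add_cookie_header($next_request);

my $cookie_header = $next_request->header('Cookie') // '';
print "Cookie: $cookie_header\n";
```

In a live script, passing `cookie_jar => {}` to `LWP::UserAgent->new` should make the user agent perform both steps on every request, so the manual `add_cookie_header` call would not be needed.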
Update 3 (Final): WWW::Mechanize really is easier than doing all the grunt work yourself! I learned a lot trying to do it manually, but WWW::Mechanize handles the cookies transparently and lets me set the Page field directly instead of emulating the JavaScript. That way I can populate my form, get my info, and be on my merry way. I'd still like to know why my 'manual' solution didn't work, so if you have time, take a look!
Thanks to tphyahoo for spurring me to simplify, and to 'uri', who was also a big help.
Here's the code:
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
my $url = 'http://browseusers.myspace.com/Browse/Browse.aspx';
$mech->get($url);

my $form = $mech->form_name("frmBrowse");
$mech->set_fields(
    zipRadius => 5,
    zipCode   => 92630,
    Page      => 75,
);

my $response = $mech->submit();
print $response->content;
This is the code I have so far:
#!/usr/bin/perl
use strict;
use warnings;
use CGI ':standard';
use LWP::UserAgent;
use HTML::Form;
use HTML::LinkExtor;

my $browser = LWP::UserAgent->new;
my $browse_url = 'http://browseusers.myspace.com/Browse/Browse.aspx';
my $response = $browser->get($browse_url);
my @forms = HTML::Form->parse($response);

# Pull ACTION out of JavaScript function
my $content = $response->content;
$content =~ m{document\.frmBrowse\.action = "(.*?)"};
my $action_url = "http://browseusers.myspace.com/Browse/" . "$1";
$forms[1]->action($action_url);
my $action = $forms[1]->action;

# Get Form Elements
my $zipRadius = $forms[1]->find_input("zipRadius", "option");
my $zipCode = $forms[1]->find_input("zipCode", "text");
my $minAge = $forms[1]->find_input("minAge", "option");
my $maxAge = $forms[1]->find_input("maxAge", "option");
my $showHasPhotoOnly = $forms[1]->find_input("showHasPhotoOnly", "checkbox");
my $showNamePhotoOnly = $forms[1]->find_input("showNamePhotoOnly", "checkbox");

# Get Hidden Values
my $update = $forms[1]->find_input("update", "submit");
my $__EVENTTARGET = $forms[1]->find_input("__EVENTTARGET");
my $Page = $forms[1]->find_input("Page");

# Assign Values
$zipRadius->value("Any");
$zipCode->value("");
$minAge->value("18");
$maxAge->value("100");
$showHasPhotoOnly->value("on");
$showNamePhotoOnly->value("on");

# Assign Hidden Values
$update->value("");
$__EVENTTARGET->value("update");
$Page->value("1");

# Update Form
my $filled_out_request = $forms[1]->click;
# print $filled_out_request->as_string;
$response = $browser->request($filled_out_request);

# Parse Content For Links
my $p = HTML::LinkExtor->new;
$p->parse($response->content);
my @links = $p->links;
push my @urls, map {$_->[2]} @links;

# Remove Duplicates
my %saw;
@saw{@urls} = ();
my @unique_urls = sort keys %saw;

# Parse Urls
my @pages;
foreach my $sorted_url (@unique_urls) {
    # Friend Urls
    if ($sorted_url =~ m{http://profile.myspace.com/index.cfm\?fuseaction=user.viewProfile&friendID=(.*?)&}) {
        my $friend_id = $1;
        #print "$friend_id\n";
    }
    # Page Urls
    if ($sorted_url =~ m{javascript:GotoPage\((.*?)\)\;}) {
        my $page = $1;
        push(@pages, $page);
    }
}

# Get Last Page
my $lastpage = pop @pages;
#print "$lastpage\n";
Everything up to here has been tested and works great. My initial form submission was a success. But now I need to submit the form again, this time to get the results of $lastpage. My Live HTTP Headers look great. $filled_out_request looks like what I would expect. Still, no page turn. It stays on page 1.
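One thing to watch in the `$lastpage` step: `sort keys %saw` sorts as strings, so a link such as `GotoPage(9)` would sort after `GotoPage(75)`, and `pop` may not return the highest page. Below is a minimal sketch that takes the numeric maximum instead. The sample link list is made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical link list, standing in for @unique_urls.
my @unique_urls = (
    'javascript:GotoPage(2);',
    'javascript:GotoPage(75);',
    'javascript:GotoPage(9);',
    'http://example.com/other',
);

# Keep only the numeric argument of each GotoPage link.
my @pages = map { /^javascript:GotoPage\((\d+)\)/ ? $1 : () } @unique_urls;

# String order puts "9" after "75"; numeric max does not.
my $string_last = (sort @pages)[-1];
my $lastpage    = max(@pages);

print "string sort picks $string_last, numeric max picks $lastpage\n";
```

`max` returns undef on an empty list, so a real script would want to check that `@pages` is non-empty before using `$lastpage`.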
# Attempt to emulate GotoPage Javascript:
#function GotoPage(page) {
#    document.frmBrowse.Page.value = page;
#    document.frmBrowse.action = "Browse.aspx?MyToken=632697194522788021";
#    document.frmBrowse.submit();
#    return true;
#}

# Parse Content On Current Page To Get New Token
$content = $response->content;
$content =~ m{document\.frmBrowse\.action = "(.*?)"};
$action_url = "http://browseusers.myspace.com/Browse/" . "$1";
$forms[1]->action($action_url);
$action = $forms[1]->action;

# Set New Hidden Values
$__EVENTTARGET->value("update");
$Page->value("$lastpage");

# Submit New Form
$filled_out_request = $forms[1]->click;
# print $filled_out_request->as_string;
$response = $browser->request($filled_out_request);

# Print Content
$content = $response->content;
print $content;
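A side note on the token extraction: when a match fails, Perl leaves `$1` holding the capture from the last successful match, so an unmatched second page would silently reuse the first page's token. Here is a minimal sketch that checks the match before using the capture. The helper name `extract_action_url` and the sample page text are made up for illustration.

```perl
use strict;
use warnings;

my $base = 'http://browseusers.myspace.com/Browse/';

# Hypothetical helper: returns the full action URL, or undef if the
# page does not contain the expected JavaScript assignment.
sub extract_action_url {
    my ($content) = @_;
    if ($content =~ m{document\.frmBrowse\.action\s*=\s*"(.*?)"}) {
        return $base . $1;
    }
    return;
}

# Sample text modelled on the GotoPage function quoted in the script.
my $page = 'document.frmBrowse.action = "Browse.aspx?MyToken=632697194522788021";';

my $action_url = extract_action_url($page);
my $missing    = extract_action_url('<html>no script here</html>');

print defined $action_url ? "found: $action_url\n" : "no token\n";
print defined $missing    ? "found: $missing\n"    : "no token on second page\n";
```

Dying or warning when the helper returns undef would make a stale token visible instead of letting the second submit go out with the old action URL.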
Discussion of headers below...
Here are the headers I am passing through for reference. The first is what Firefox looks like when it clicks on 'Page 75'. The second is what my code looks like doing the same.
Firefox:
POST /browse/Browse.aspx?MyToken=632697383226956664 HTTP/1.1
__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=&Page=75&Gender=genderWomen&minAge=18&maxAge=100&country=US&zipRadius=Any&zipCode=&showHasPhotoOnly=on&showNamePhotoOnly=on&SortBy=sortByLastLogin

My Code:
POST http://browseusers.myspace.com/Browse/Browse.aspx?MyToken=632697383003013250
__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=&Page=75&Gender=genderWomen&minAge=18&maxAge=100&country=US&zipRadius=Any&zipCode=&showHasPhotoOnly=on&showNamePhotoOnly=on&SortBy=sortByLastLogin&update=
Tokens are not going to match, but I am sure I am getting the right tokens in my code.
Anyone see what I might be missing?
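One way to hunt for the difference is to compare the two captures field by field rather than by eye. Below is a minimal sketch using the two POST bodies quoted above. `parse_body` is a made-up helper, and it ignores URL-decoding and repeated field names for brevity.

```perl
use strict;
use warnings;

# Hypothetical helper: split a form-encoded body into a name => value hash.
sub parse_body {
    my ($body) = @_;
    my %fields;
    for my $pair (split /&/, $body) {
        my ($name, $value) = split /=/, $pair, 2;
        $fields{$name} = defined $value ? $value : '';
    }
    return \%fields;
}

# Body as captured from Firefox; my code sends the same plus "update=".
my $common = '__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=&Page=75'
           . '&Gender=genderWomen&minAge=18&maxAge=100&country=US'
           . '&zipRadius=Any&zipCode=&showHasPhotoOnly=on'
           . '&showNamePhotoOnly=on&SortBy=sortByLastLogin';

my $firefox = parse_body($common);
my $mine    = parse_body($common . '&update=');

# Fields present on one side only, and fields whose values differ.
my @only_mine    = sort grep { !exists $firefox->{$_} } keys %$mine;
my @only_firefox = sort grep { !exists $mine->{$_} } keys %$firefox;
my @changed      = sort grep { exists $mine->{$_} && $firefox->{$_} ne $mine->{$_} }
                   keys %$firefox;

print "only in my request: @only_mine\n";
print "only in Firefox:    @only_firefox\n";
print "different values:   @changed\n";
```

On these captures the bodies differ only in the trailing `update=` field. The request lines also differ in path case (`/browse/` versus `/Browse/`). Whether either difference matters to the server is not established here.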
Replies are listed 'Best First'.
Re: HTML::Form Submit Issue - Part II
by initself (Monk) on Dec 09, 2005 at 23:24 UTC
Re: HTML::Form Submit Issue - Part II
by tphyahoo (Vicar) on Dec 10, 2005 at 07:14 UTC