This is a continuation of HTML::Form Submit Issue. My attempt now is, after the first form submission (see first code chunk), to go to the last page of results, again using a form submit.


Update: I have reason to believe that my cookie implementation is fundamentally flawed to begin with. Once I go more than one page deep, my cookie jar doesn't follow me. I hope to find a solution shortly.

Update 2: I am now sending cookies with my header.

    $filled_out_request = $forms[1]->click;
    $cookie_jar->add_cookie_header($filled_out_request);

The cookie does not seem to change between the first form submit and the second form submit in my code. Can anyone verify whether this is probably correct or not? This much we learned: even though we are now sending headers with the request, the server still doesn't like what we're sending. See reply for complete code.
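For comparison with the manual add_cookie_header call above, here is a minimal sketch of letting LWP::UserAgent manage the jar itself by passing cookie_jar => {} to the constructor. It runs without network access by faking the first response; the cookie name and value (SessionToken=abc123) are made up for illustration and are not what the real server sends.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;

# With a cookie jar attached, the user agent stores cookies from each
# response and adds them to later requests it sends.
my $browser = LWP::UserAgent->new( cookie_jar => {} );
my $jar     = $browser->cookie_jar;    # an HTTP::Cookies object

my $url = 'http://browseusers.myspace.com/Browse/Browse.aspx';

# Fake a first response that sets a cookie (hypothetical name and value).
my $first_request  = HTTP::Request->new( GET => $url );
my $first_response = HTTP::Response->new( 200, 'OK' );
$first_response->header( 'Set-Cookie' => 'SessionToken=abc123; path=/' );
$first_response->request($first_request);
$jar->extract_cookies($first_response);

# This is the step the user agent performs for every request it sends;
# it is called by hand here only because nothing goes over the network.
my $second_request = HTTP::Request->new( POST => $url );
$jar->add_cookie_header($second_request);

print 'Cookie header: ', $second_request->header('Cookie'), "\n";
```

If the jar is attached this way, requests sent through $browser->request(...) should not need a separate add_cookie_header call.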

Update 3 (Final): WWW::Mechanize really is easier than doing all the grunt work yourself! I learned a lot trying to do it manually, but WWW::Mechanize handles the cookies transparently and lets me skip the JavaScript mess (it does not run JavaScript; submitting the form directly was enough here). That way I can populate my form, get my info, and be on my merry way. I'd still like to know why my 'manual' solution didn't work, so if you have time, take a look!

Thanks to tphyahoo for spurring me to simplify, and to 'uri', who was also a big help.

Here's the code:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();
    my $url = 'http://browseusers.myspace.com/Browse/Browse.aspx';
    $mech->get($url);

    my $form = $mech->form_name("frmBrowse");
    $mech->set_fields(
        zipRadius => 5,
        zipCode   => 92630,
        Page      => 75,
    );

    my $response = $mech->submit();
    print $response->content;

This is the code I have so far:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI ':standard';
    use LWP::UserAgent;
    use HTML::Form;
    use HTML::LinkExtor;

    my $browser = LWP::UserAgent->new;
    my $browse_url = 'http://browseusers.myspace.com/Browse/Browse.aspx';
    my $response = $browser->get($browse_url);
    my @forms = HTML::Form->parse($response);

    # Pull ACTION out of JavaScript function
    my $content = $response->content;
    $content =~ m{document\.frmBrowse\.action = "(.*?)"};
    my $action_url = "http://browseusers.myspace.com/Browse/" . "$1";
    $forms[1]->action($action_url);
    my $action = $forms[1]->action;

    # Get Form Elements
    my $zipRadius = $forms[1]->find_input("zipRadius", "option");
    my $zipCode = $forms[1]->find_input("zipCode", "text");
    my $minAge = $forms[1]->find_input("minAge", "option");
    my $maxAge = $forms[1]->find_input("maxAge", "option");
    my $showHasPhotoOnly = $forms[1]->find_input("showHasPhotoOnly", "checkbox");
    my $showNamePhotoOnly = $forms[1]->find_input("showNamePhotoOnly", "checkbox");

    # Get Hidden Values
    my $update = $forms[1]->find_input("update", "submit");
    my $__EVENTTARGET = $forms[1]->find_input("__EVENTTARGET");
    my $Page = $forms[1]->find_input("Page");

    # Assign Values
    $zipRadius->value("Any");
    $zipCode->value("");
    $minAge->value("18");
    $maxAge->value("100");
    $showHasPhotoOnly->value("on");
    $showNamePhotoOnly->value("on");

    # Assign Hidden Values
    $update->value("");
    $__EVENTTARGET->value("update");
    $Page->value("1");

    # Update Form
    my $filled_out_request = $forms[1]->click;
    # print $filled_out_request->as_string;
    $response = $browser->request($filled_out_request);

    # Parse Content For Links
    my $p = HTML::LinkExtor->new;
    $p->parse($response->content);
    my @links = $p->links;
    push my @urls, map {$_->[2]} @links;

    # Remove Duplicates
    my %saw;
    @saw{@urls} = ();
    my @unique_urls = sort keys %saw;

    # Parse Urls
    my @pages;
    foreach my $sorted_url (@unique_urls) {
        # Friend Urls
        if ($sorted_url =~ m{http://profile.myspace.com/index.cfm\?fuseaction=user.viewProfile&friendID=(.*?)&}) {
            my $friend_id = $1;
            #print "$friend_id\n";
        }
        # Page Urls
        if ($sorted_url =~ m{javascript:GotoPage\((.*?)\)\;}) {
            my $page = $1;
            push(@pages, $page);
        }
    }

    # Get Last Page
    my $lastpage = pop @pages;
    #print "$lastpage\n";

Everything up to here has been tested and works great. My initial form submission was a success. But now I need to submit the form again, this time to get the results of $lastpage. My Live HTTP Headers look great. $filled_out_request looks like what I would expect. Still, no page turn. It stays on page 1.
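A side note on the "Get Last Page" step: the URLs are sorted as strings, so pop returns the page link that sorts last alphabetically, which is not necessarily the highest page number. The sketch below uses made-up hrefs (the page numbers 2, 9, 10 and 75 are assumptions, not taken from the real results page) to show the difference, with List::Util's max as one possible alternative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical hrefs of the kind collected from the results page.
my @hrefs = (
    'javascript:GotoPage(2);',
    'javascript:GotoPage(9);',
    'javascript:GotoPage(10);',
    'javascript:GotoPage(75);',
);
my @unique_urls = sort @hrefs;    # string sort, as in the code above

my @pages;
for my $url (@unique_urls) {
    push @pages, $1 if $url =~ m{javascript:GotoPage\((\d+)\);};
}

my $string_last  = $pages[-1];    # the value pop @pages would return
my $numeric_last = max @pages;    # highest page number

print "last by string order: $string_last\n";
print "highest page number:  $numeric_last\n";
```

With these sample links the two values differ, so it may be worth printing $lastpage to confirm it is the page you expect.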

    # Attempt to emulate GotoPage Javascript:
    #function GotoPage(page) {
    #    document.frmBrowse.Page.value = page;
    #    document.frmBrowse.action = "Browse.aspx?MyToken=632697194522788021";
    #    document.frmBrowse.submit();
    #    return true;
    #}

    # Parse Content On Current Page To Get New Token
    $content = $response->content;
    $content =~ m{document\.frmBrowse\.action = "(.*?)"};
    $action_url = "http://browseusers.myspace.com/Browse/" . "$1";
    $forms[1]->action($action_url);
    $action = $forms[1]->action;

    # Set New Hidden Values
    $__EVENTTARGET->value("update");
    $Page->value("$lastpage");

    # Submit New Form
    $filled_out_request = $forms[1]->click;
    # print $filled_out_request->as_string;
    $response = $browser->request($filled_out_request);

    # Print Content
    $content = $response->content;
    print $content;
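The token extraction above uses $1 without checking that the match succeeded. If the second page did not contain the expected JavaScript, $1 would still hold the token from the first match and the form would be posted with a stale action. Here is a minimal sketch of a checked version, run against the GotoPage function quoted above; the helper name extract_action_url is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample page content, copied from the GotoPage function quoted in the post.
my $content = <<'HTML';
function GotoPage(page) {
    document.frmBrowse.Page.value = page;
    document.frmBrowse.action = "Browse.aspx?MyToken=632697194522788021";
    document.frmBrowse.submit();
    return true;
}
HTML

my $base = 'http://browseusers.myspace.com/Browse/';

# Hypothetical helper: returns the absolute action URL, or nothing when
# the page does not contain the expected assignment.
sub extract_action_url {
    my ($html) = @_;
    # Only use $1 when the match succeeded; after a failed match it
    # still holds the capture from the previous successful one.
    if ( $html =~ m{document\.frmBrowse\.action\s*=\s*"(.*?)"} ) {
        return $base . $1;
    }
    return;
}

my $action_url = extract_action_url($content)
    or die "Could not find the form action in the page\n";
print "$action_url\n";
```

This only guards against a missing token. It does not by itself explain why the page stays on page 1.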

Discussion of headers below...

Here are the headers I am passing through for reference. The first is what Firefox looks like when it clicks on 'Page 75'. The second is what my code looks like doing the same.

Firefox:
POST /browse/Browse.aspx?MyToken=632697383226956664 HTTP/1.1
__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=&Page=75&Gender=genderWomen&minAge=18&maxAge=100&country=US&zipRadius=Any&zipCode=&showHasPhotoOnly=on&showNamePhotoOnly=on&SortBy=sortByLastLogin
My Code:
POST http://browseusers.myspace.com/Browse/Browse.aspx?MyToken=632697383003013250
__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=&Page=75&Gender=genderWomen&minAge=18&maxAge=100&country=US&zipRadius=Any&zipCode=&showHasPhotoOnly=on&showNamePhotoOnly=on&SortBy=sortByLastLogin&update=

Tokens are not going to match, but I am sure I am getting the right tokens in my code.
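To make sure no difference between the two request bodies is overlooked, they can be compared field by field. The sketch below does that for the two bodies quoted above. The helper names parse_body and diff_bodies are made up for this example, and the parsing is simplified: it does no URL-decoding and assumes no repeated field names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $firefox = '__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=&Page=75'
    . '&Gender=genderWomen&minAge=18&maxAge=100&country=US&zipRadius=Any'
    . '&zipCode=&showHasPhotoOnly=on&showNamePhotoOnly=on&SortBy=sortByLastLogin';
my $mine = $firefox . '&update=';

# Split a form-encoded body into a hash of name => value.
sub parse_body {
    my ($body) = @_;
    my %fields;
    for my $pair ( split /&/, $body ) {
        my ( $name, $value ) = split /=/, $pair, 2;
        $fields{$name} = defined $value ? $value : '';
    }
    return \%fields;
}

# Report fields that are missing on one side or have different values.
sub diff_bodies {
    my ( $first, $second ) = @_;
    my ( $f, $s ) = ( parse_body($first), parse_body($second) );
    my %names = map { $_ => 1 } keys %$f, keys %$s;
    my @report;
    for my $name ( sort keys %names ) {
        if ( !exists $f->{$name} ) {
            push @report, "only in second: $name=$s->{$name}";
        }
        elsif ( !exists $s->{$name} ) {
            push @report, "only in first: $name=$f->{$name}";
        }
        elsif ( $f->{$name} ne $s->{$name} ) {
            push @report, "differs: $name ($f->{$name} vs $s->{$name})";
        }
    }
    return @report;
}

print "$_\n" for diff_bodies( $firefox, $mine );
```

For the bodies as quoted, the only field-level difference is the extra empty update field in my request. Whether the server cares about that field is not something this comparison can tell.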

Anyone see what I might be missing?


In reply to HTML::Form Submit Issue - Part II by initself
