Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello,
I'm writing a utility for myself that works with the WWW-Myspace module, and I've run into the following issue using WWW-Mechanize:

When the script encounters a CAPTCHA code the module returns the URL so you can retrieve the image and enter the code to continue. After several hours of tweaking my script I decided to use Ethereal to capture the packets from both a manual submission and my script. See below:

    # MANUAL SUBMISSION

    Hypertext Transfer Protocol
        GET /CAPTCHA/CAPTCHA.aspx?SecurityToken=8C75EEB0D4964AE99A9786283BAE95BA HTTP/1.1\r\n
            Request Method: GET
            Request URI: /CAPTCHA/CAPTCHA.aspx?SecurityToken=8C75EEB0D4964AE99A9786283BAE95BA
            Request Version: HTTP/1.1
        Accept: */*\r\n
        Referer: http://collect.myspace.com/index.cfm?fuseaction=invite.addfriend_verify&friendID=107828070\r\n
        Accept-Language: en-us\r\n
        Accept-Encoding: gzip, deflate\r\n
        User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)\r\n
        Host: security.myspace.com\r\n
        Connection: Keep-Alive\r\n
        Cookie: NGUserID=a28255d-528-1162008931-2; AUTOSONGPLAY=0; MYSPACE=myspace; DERDB=ZG9tYWluPW1ldHJvY2FzdCZ0bGQ9bmV0JnNtb2tlcj0wJnNleHByZWY9MSZ1dHlwZT0yJnJlbGlnaW9uaWQ9MCZyZWdpb249JnBvc3RhbGNvZGU9Mzg2OCZtYXJpdGFsc3RhdHVzPU0maW5jb21laWQ9MCZoZ
        \r\n

    Hypertext Transfer Protocol
        POST /index.cfm?fuseaction=invite.addFriendsProcess&Mytoken=D40C5480-735E-4478-87F5593FAE9DF7CB7111271 HTTP/1.1\r\n
            Request Method: POST
            Request URI: /index.cfm?fuseaction=invite.addFriendsProcess&Mytoken=D40C5480-735E-4478-87F5593FAE9DF7CB7111271
            Request Version: HTTP/1.1
        Accept: */*\r\n
        Referer: http://collect.myspace.com/index.cfm?fuseaction=invite.addfriend_verify&friendID=107828070\r\n
        Accept-Language: en-us\r\n
        Content-Type: application/x-www-form-urlencoded\r\n
        Accept-Encoding: gzip, deflate\r\n
        User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)\r\n
        Host: collect.myspace.com\r\n
        Content-Length: 255\r\n
        Connection: Keep-Alive\r\n
        Cache-Control: no-cache\r\n
        Cookie: NGUserID=a28255d-528-1162008931-2; AUTOSONGPLAY=0; MYSPACE=myspace; DERDB=ZG9tYWluPW1ldHJvY2FzdCZ0bGQ9bmV0JnNtb2tlcj0wJnNleHByZWY9MSZ1dHlwZT0yJnJlbGlnaW9uaWQ9MCZyZWdpb249JnBvc3RhbGNvZGU9Mzg2OCZtYXJpdGFsc3RhdHVzPU0maW5jb21laWQ9MCZoZ
        \r\n
    Line-based text data: application/x-www-form-urlencoded
        hashcode=MIGLBgorBgEEAYI3WAOkoH0wewYKKwYBBAGCN1gDAaBtMGsCAwIAAQICZgMCAgDABAiDFtagrDKBJQQQAZ1eoDG30LD3hChb1lyNxwRAYC67osSYAgTw2Azeh2qaiI5qT%2BxhhFRaGMIEnmUNMa1p8cLj%2BS6AZrPW4znnqoecL5fvN3NKkpPP1tB%2BpmtKeg%3D%3D&friendID=107828070&CAPTCHAR

    # SCRIPT SUBMISSION

    Hypertext Transfer Protocol
        GET /CAPTCHA/CAPTCHA.aspx?SecurityToken=3E4F1A84164B4DE2BFEAE763760BEB32 HTTP/1.0\r\n
            Request Method: GET
            Request URI: /CAPTCHA/CAPTCHA.aspx?SecurityToken=3E4F1A84164B4DE2BFEAE763760BEB32
            Request Version: HTTP/1.0
        Host: security.myspace.com\r\n
        User-Agent: lwp-trivial/1.41\r\n
        \r\n

    Hypertext Transfer Protocol
        POST /index.cfm?fuseaction=invite.addFriendsProcess&Mytoken=88136E67-DA9F-4FE6-8EDD5377C3F58AE64165334 HTTP/1.1\r\n
            Request Method: POST
            Request URI: /index.cfm?fuseaction=invite.addFriendsProcess&Mytoken=88136E67-DA9F-4FE6-8EDD5377C3F58AE64165334
            Request Version: HTTP/1.1
        TE: deflate,gzip;q=0.3\r\n
        Connection: TE, close\r\n
        Accept-Encoding: identity\r\n
        Host: collect.myspace.com\r\n
        Referer: http://collect.myspace.com/index.cfm?fuseaction=invite.addfriend_verify&friendID=107828070\r\n
        User-Agent: WWW-Mechanize/1.20\r\n
        Content-Length: 252\r\n
        Content-Type: application/x-www-form-urlencoded\r\n
        Cookie: RBLOCKCNT=0; MSCOUNTRY=US; MYUSERINFO=MIICNQYKKwYBBAGCN1gDpKCCAiUwggIhBgorBgEEAYI3WAMBoIICETCCAg0CAwIAAQICZgMCAgDABAjKZZBb1AJRUQQQ8EidfzgzBaBIkZHhONOPTASCAeDK%2FEVeQ7RG3t82VDMjQgt2nbAcCzmmiiWt29m1wAbDqivIfN11tHVnuaKRxqC64bkVjyCjlNb
        Cookie2: $Version="1"\r\n
        \r\n
    Line-based text data: application/x-www-form-urlencoded
        hashcode=MIGLBgorBgEEAYI3WAOkoH0wewYKKwYBBAGCN1gDAaBtMGsCAwIAAQICZgMCAgDABAgrsahZrvKehgQQanqggVj6eLMO24gaJrq1GwRAF9MwbHHto2HBUAIoZiA3MM9ZQe97Ag6ob3uGi4N8DHmbFDfVs%2FfBCw3FhriQfFRqAZz99e0mDfmqtvjzJG9ejA%3D%3D&friendID=107828070&CAPTCHARespo

Looking at the results, I notice that quite a few headers are missing from the script's submission, in both the GET and the POST requests.

I've configured my mechanize object like so:

    use WWW::Mechanize;
    use HTTP::Cookies;

    my $url = 'http://collect.myspace.com/index.cfm?fuseaction=invite.addfriend_verify&friendID=107828070';
    my $mech = WWW::Mechanize->new();
    $mech->cookie_jar( {} );

    # login code ...

    $mech->get($url);

    # retrieve CAPTCHA image code ...
    # enter $text code ...

    $mech->submit_form(
        form_name => 'addFriend',
        fields    => { CAPTCHAResponse => $text }
    );

I'm assuming it doesn't work due to the fact that I'm not retrieving the cookies when performing the GET. Does my constructor look correct or am I totally missing something?
Thanks

Replies are listed 'Best First'.
Re: WWW::Mechanize issue
by pemungkah (Priest) on Nov 17, 2006 at 21:44 UTC
    Looks to me like your login code is not working properly, because the cookies don't match. You'll need to look into that first.

    My guess is that you're hitting something that's common but not usually documented: the site looks for "browsers" that aren't one of the recognized user agents, and sends them through a different flow that often does something different than the standard one.

    Try setting $mech->agent_alias('Windows IE 6'); before logging in. Then dump your cookie jar with Data::Dumper and check that you have the same cookies as in your captured with-the-browser example.

    You may also want to supply the button argument to the submit_form() call. This supplies the button that was "clicked" in the POST; the backend may respond differently if it doesn't get this data.
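
The suggestions in this reply can be sketched roughly as follows. This is a minimal, offline outline rather than tested MySpace code: the login step is left out, and the button name passed to submit_form() is a hypothetical placeholder, since the thread never shows the form's real button.

```perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

# Set the browser-like User-Agent before logging in, so that every
# request in the session carries it.
$mech->agent_alias('Windows IE 6');
print "User-Agent: ", $mech->agent, "\n";

# ... login code and $mech->get($url) would go here ...

# Dump the cookies collected so far (empty until requests are made).
print "Cookies:\n", $mech->cookie_jar->as_string, "\n";

# Submit the CAPTCHA answer and name the button that was "clicked".
# 'hypothetical_button_name' is an assumption; use the name attribute
# of the real submit button in the addFriend form.
sub submit_captcha {
    my ( $mech, $text ) = @_;
    return $mech->submit_form(
        form_name => 'addFriend',
        fields    => { CAPTCHAResponse => $text },
        button    => 'hypothetical_button_name',
    );
}
```

After logging in, comparing the as_string output with the Cookie header from the browser capture should show whether cookies such as NGUserID are missing.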

Re: WWW::Mechanize issue
by brian_d_foy (Abbot) on Nov 17, 2006 at 16:57 UTC

    Can you show the actual cookies you get from the original form?

    It looks like the CAPTCHA host (security.myspace.com) is different than the host you're accessing (collect.myspace.com). I suspect something is realizing they aren't the same and deciding not to send the cookies to security.myspace.com. That's just my initial suspicion though, so I'd like to see the domain part of the Set-Cookie value. Can you show the responses in your output too?

    Update: If that manual example is you accessing two sites on your own, can you capture the same stuff letting a browser figure it out?

    --
    brian d foy <brian@stonehenge.com>
    Subscribe to The Perl Review
      good find...I think you might be right. I currently don't have access to the script, but I'll post a response this evening.
      Actually, I take that back. The first set of GET and POST captures is the result of me doing it manually via the browser, so they are correct...and my script's attempt does use the same hosts for the GET and POST.
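
The cookie-domain question raised in this sub-thread can be checked offline with HTTP::Cookies alone. The sketch below is only an illustration: SITEWIDE and HOSTONLY are made-up cookie names, and the real domains would have to be read from the site's Set-Cookie headers. It shows that a cookie bound to .myspace.com is offered to both hosts, while one bound to collect.myspace.com is not offered to security.myspace.com.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;

my $jar = HTTP::Cookies->new;

# Arguments: version, name, value, path, domain, port, path_spec,
#            secure, maxage, discard
# SITEWIDE and HOSTONLY are hypothetical names for this demonstration.
$jar->set_cookie( 0, 'SITEWIDE', 'a', '/', '.myspace.com',        undef, 1, 0, 3600, 0 );
$jar->set_cookie( 0, 'HOSTONLY', 'b', '/', 'collect.myspace.com', undef, 1, 0, 3600, 0 );

# Return the Cookie header the jar would attach to a GET for http://$host/.
sub cookie_header_for {
    my ( $jar, $host ) = @_;
    my $req = HTTP::Request->new( GET => "http://$host/" );
    $jar->add_cookie_header($req);
    my $header = $req->header('Cookie');
    return defined $header ? $header : '(none)';
}

# List each stored cookie together with the domain it is bound to.
$jar->scan(
    sub {
        my ( $version, $name, $value, $path, $domain ) = @_;
        print "stored: $name for $domain\n";
    }
);

for my $host (qw(collect.myspace.com security.myspace.com)) {
    printf "%-22s Cookie: %s\n", $host, cookie_header_for( $jar, $host );
}
```

Running the same scan() callback on $mech->cookie_jar after logging in would show which domain each real cookie belongs to.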
Re: WWW::Mechanize issue
by jonsmith1982 (Beadle) on Nov 17, 2006 at 17:27 UTC
    Forgive me if I'm wrong, but I would have thought...
    my $mech = WWW::Mechanize->new(); $mech->cookie_jar( {} );
    is the same as...
    my $mech = WWW::Mechanize->new( cookie_jar => undef );
      No. undef accepts no cookies, while a hashref accepts cookies in an in-memory cookie jar. This is explained in WWW::Mechanize's documentation (the section where new() is explained).
      It is the same
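
The disagreement above can be settled by asking each object for its cookie jar. A small offline check, assuming nothing beyond a stock WWW::Mechanize install:

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Three ways of constructing the object discussed in this sub-thread.
my $default = WWW::Mechanize->new();

my $explicit = WWW::Mechanize->new();
$explicit->cookie_jar( {} );    # what the original script does

my $disabled = WWW::Mechanize->new( cookie_jar => undef );

my @cases = (
    [ 'new()'                      => $default ],
    [ 'new() + cookie_jar({})'     => $explicit ],
    [ 'new( cookie_jar => undef )' => $disabled ],
);

for my $case (@cases) {
    my ( $label, $mech ) = @$case;
    my $jar = $mech->cookie_jar;
    printf "%-28s %s\n", $label, defined $jar ? ref $jar : 'no cookie jar';
}
```

The first two should report an in-memory jar object and the third should report no jar, so calling cookie_jar({}) after new() changes nothing relative to the default, while cookie_jar => undef turns cookies off.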
Re: WWW::Mechanize issue
by Anonymous Monk on Nov 17, 2006 at 15:24 UTC
    I'm assuming it doesn't work due to the fact that I'm not retrieving the cookies when performing the GET.
    You are retrieving the cookies (that's why they are sent with the POST); what you are not doing is sending them when performing the GET.
      hmmm...Any idea why that's not working? I assumed that if I enabled the "cookie_jar" that's a global setting and would work on both methods. Obviously I'm missing something.
        The cookie jar is enabled by default. It has nothing to do with the cookie jar. You need to read a primer on HTTP.
Re: WWW::Mechanize issue
by Anonymous Monk on Nov 17, 2006 at 17:58 UTC
    It looks like the manual GET's Request Version (HTTP/1.1) is different from the script's GET (HTTP/1.0). Would that cause an issue, and if so, is it possible to tell the mech object to use only 1.1?
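
One observation from the capture: the script's GET identifies itself as lwp-trivial/1.41, speaks HTTP/1.0 and carries no cookies, which matches what LWP::Simple's get() sends. The POST comes from WWW-Mechanize/1.20 over HTTP/1.1 with cookies. That suggests the CAPTCHA image is being fetched outside the $mech object. A minimal sketch of fetching it through the same object instead, assuming the image URL is already in hand (fetch_captcha and the file name are illustrative, not part of any module):

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Fetch $captcha_url with the same $mech that holds the session, so the
# request uses the same cookie jar and User-Agent as the form requests,
# and save the response body to $file.
sub fetch_captcha {
    my ( $mech, $captcha_url, $file ) = @_;

    $mech->get( $captcha_url, ':content_file' => $file );
    my $ok = $mech->success;

    # The image is now the "current page"; step back to the form page so
    # that submit_form() can still find the addFriend form.
    $mech->back;

    return $ok;
}

# Usage, after logging in and loading the verification page:
#   fetch_captcha( $mech, $captcha_url, 'captcha.jpg' ) or die "no image";
#   # ... look at captcha.jpg, put the answer in $text ...
#   $mech->submit_form(
#       form_name => 'addFriend',
#       fields    => { CAPTCHAResponse => $text },
#   );
```

Whether the site then accepts the answer still depends on the login cookies being correct, as discussed in the replies above.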