andyholyer has asked for the wisdom of the Perl Monks concerning the following question:

Guys, I have a slightly odd problem.

For the past few weeks I have written various scripts which access web sites (carrying monthly energy flow rates etc, since you ask), scrape the data into an array, and add them to a database.

Most of these are a simple application of WWW::Mechanize, sometimes with a little bit of JS which I need to reverse engineer.

I've just come to a tricky one. This one comes from, of all places, Chile. The fact that I have to translate from Spanish is not especially relevant.

This particular form does something tricksy. Not its fault, but seems tricky to navigate past, which is why I need some help. The salient points are:

  1. The action of the form (after relevant inputs are filled in) is a call to a JS routine.
  2. No problem, reverse engineer the javascript. Except, the javascript does this:
  3. It validates a couple of fields, fair enough, but then...
  4. It digs into the form's DOM, and sets the action to a different URL....
  5. ...and then does a form.submit

I've tried grabbing the values of the form and pasting them onto the end of the URL, but no, that produces a database error. I'm guessing there must be something in the ASP code which distinguishes between GET and POST parameters. I suppose. I've never heard of that before, but what the hey.
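A server can indeed tell the two apart: GET parameters arrive in the query string and POST parameters in the request body, and classic ASP exposes them as separate collections (Request.QueryString and Request.Form). Whether this particular site relies on that is only a guess. The sketch below uses HTML::Form to build both kinds of request from the same fields without sending anything; the URL and field names are hypothetical stand-ins, not the real site's.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical form: the real page's URL and field names are not known here.
my $html = <<'HTML';
<form method="POST" action="http://example.com/report.asp">
  <input type="text" name="year"  value="2011" />
  <input type="text" name="month" value="12" />
</form>
HTML

my $form = HTML::Form->parse( $html, 'http://example.com/' );

# As declared (POST): the fields travel in the request body.
my $post = $form->make_request;

# Same fields as GET: they move into the query string instead.
$form->method('GET');
my $get = $form->make_request;

print $post->method, ' ', $post->uri, "\n", $post->content, "\n";
print $get->method,  ' ', $get->uri,  "\n";
```

If the server-side code only reads the POST body, the second request will look to it as though no fields were supplied at all, which could plausibly surface as a database error.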

For (I hope) obvious reasons, I can't choose to go to a different website. And the Chilean company isn't going to change their site...

The only thing I can think of to do, is to update the form element's action attribute. In theory I have the form object in:

($self->{mech}->forms)[0]

..and I should be able to set this to a new value.. somehow. The code seems to imply it is a URI::_foreign. My first attempt hasn't worked (hey, I expect there's not even an updater method for the attribute at all). Anyway, this doesn't work:

$self->{mech}->form[0]->{action} = bless( do {(my $o = 'http://www.google.com;')}, 'URI::_foreign' );

Has anyone ever tried anything like this? Am I just insane? Advice, please.

Thanks in advance, Andy Holyer, Lewes, UK

Re: Very obscure WWW:Mechanize problem
by Corion (Patriarch) on Jan 08, 2012 at 20:12 UTC

    The WWW::Mechanize documentation says that all things returned by the ->forms methods are HTML::Form objects. And searching them for "action" returns the ->action accessor.

    Maybe that one helps you to emulate (re)setting the action to something different?
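A minimal sketch of that idea, using HTML::Form directly so that nothing is sent over the network. The HTML, the field name, the validation rule and the target URL are all invented for illustration. The steps mirror what the question says the page's JavaScript does: validate, change the action, submit.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical stand-in for the real page: the declared action calls a JS routine.
my $html = <<'HTML';
<form method="POST" action="javascript:enviar()">
  <input type="text" name="year" value="" />
</form>
HTML

my $form = HTML::Form->parse( $html, 'http://example.com/' );

# Step 1: fill in the field and repeat the JS validation (this rule is made up).
$form->value( year => '2011' );
die "year must be four digits\n" unless $form->value('year') =~ /^\d{4}$/;

# Step 2: set the action to the URL the JS would have chosen.
# A plain string is enough; there is no need to bless anything by hand.
$form->action('http://example.com/consulta.asp');

# Step 3: build the request that form.submit would have produced.
my $req = $form->make_request;
print $req->method, ' ', $req->uri, "\n", $req->content, "\n";
```

With WWW::Mechanize the same accessor should be available on the object returned by form_number, and a subsequent submit would then go to the new action.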

      That's what I've got up to myself. What doesn't seem to be working is getting the syntax gnarly enough to actually work.

      *Sigh* Thanks, I'll keep digging...

        It seems unapproachable if you keep calling it obscure/gnarly...

        #!/usr/bin/perl --
        use strict;
        use warnings;
        use WWW::Mechanize;

        my $ua = WWW::Mechanize->new();
        use URI::file;
        $ua->get( URI::file->new( __FILE__ )->abs(URI::file->cwd) );
        #~ $ua->get( 'file:'. __FILE__ );
        $ua->add_handler("request_send", sub { shift->dump; return });
        $ua->add_handler("response_done", sub { shift->dump; return });
        $ua->timeout( 1 );
        $ua->update_html( <<'HTML','.');## $ua->{content} = <<'HTML';
        <html>
        <head>
        <title> localhost form </title>
        </head>
        <body>
        <base href="http://localhost/">
        <form method="POST" action="http://localhost/">
        <input id="enterbutton" type="submit" name="user_choice" value="Enter" />
        <input type="submit" name="user_choice" value="Leave" />
        </form>
        </body>
        </html>
        HTML

        my $form = $ua->form_number( 0 );
        $form->action( $form->action . 'THE_NEW_ACTION/' );
        $ua->submit;
        __END__
        $ perl mechanize.inline.form.action.pl
        POST http://localhost/THE_NEW_ACTION/
        Accept-Encoding: gzip
        Referer: file:mechanize.inline.form.action.pl
        User-Agent: WWW-Mechanize/1.71
        Content-Length: 0
        Content-Type: application/x-www-form-urlencoded

        (no content)
        500 Can't connect to localhost:80 (timeout)
        Content-Type: text/plain
        Client-Date: Sun, 08 Jan 2012 21:03:56 GMT
        Client-Warning: Internal response

        Can't connect to localhost:80 (timeout)\n
        LWP::Protocol::http::Socket: connect: timeout at C:/perl/site/5.14.1/lib/LWP/Protocol/http.pm line 51.\n
        Error POSTing http://localhost/THE_NEW_ACTION/: Can't connect to localhost:80 (timeout) at mechanize.inline.form.action.pl line 39