Guys, I have a slightly odd problem.

Over the past few weeks I have written various scripts which access web sites (carrying monthly energy flow rates etc., since you ask), scrape the data into an array, and add it to a database.

Most of these are a simple application of WWW::Mechanize, sometimes with a little bit of JS which I need to reverse engineer.

I've just come to a tricky one. This one comes from, of all places, Chile. The fact that I have to translate from Spanish is not especially relevant.

This particular form does something tricksy. Not its fault, but it seems tricky to navigate past, which is why I need some help. The salient points are:

  1. The action of the form (after relevant inputs are filled in) is a call to a JS routine.
  2. No problem, reverse engineer the javascript. Except, the javascript does this:
  3. It validates a couple of fields, fair enough, but then...
  4. It digs into the form's DOM, and sets the action to a different URL....
  5. ...and then does a form.submit()

I've tried grabbing the values of the form and pasting them onto the end of the URL, but no, that produces a database error: I'm guessing there must be something in the ASP code which distinguishes between GET and POST parameters. I suppose. Never heard of that before but what the hey.
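To make the GET versus POST guess concrete: in classic ASP, Request.QueryString and Request.Form are separate collections, so a script that reads only Request.Form never sees values pasted onto the URL. Whether the Chilean site does this is an assumption. Here is a minimal sketch of how differently the two requests carry the same fields; the URL and the field names (mes, anio) are hypothetical placeholders, not taken from the real site.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(GET POST);
use URI;

# Hypothetical URL and field names, for illustration only.
my $url    = 'http://example.com/consulta.asp';
my @fields = ( mes => '01', anio => '2010' );

# GET: the fields travel in the query string of the URL.
my $get_uri = URI->new($url);
$get_uri->query_form(@fields);
my $get = GET $get_uri;

# POST: the URL stays bare and the fields travel in the request body.
my $post = POST $url, \@fields;

print $get->method,  ' ', $get->uri,  "\n";
print $post->method, ' ', $post->uri, "\n";
print 'POST body: ', $post->content, "\n";
```

Nothing here touches the network; it only builds the two request objects so they can be compared.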

For (I hope) obvious reasons, I can't choose to go to a different website. And the Chilean company isn't going to change their site...

The only thing I can think of to do is to update the form element's action attribute. In theory I have the form object in:

($self->{mech}->forms)[0]

...and I should be able to set this to a new value... somehow. The code seems to imply it is a URI::_foreign. My first attempt hasn't worked (hey, I expect there's not even an updater method for the attribute at all). Anyway, this doesn't work:

$self->{mech}->form[0]->{action} = bless( do {(my $o = 'http://www.google.com;')}, 'URI::_foreign' );
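For what it's worth, the form objects WWW::Mechanize hands back are HTML::Form objects, and HTML::Form has a read/write action() accessor, so no blessing by hand should be needed. A minimal offline sketch, assuming a made-up page: the HTML, the base URL, the target URL and the field names below are all hypothetical stand-ins for the real site.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical stand-in for the real page: the action is a JavaScript call,
# which is presumably why the parsed action shows up as a URI::_foreign.
my $html = <<'HTML';
<form name="consulta" method="post" action="javascript:enviar()">
  <input type="text" name="mes" value="">
  <input type="text" name="anio" value="">
</form>
HTML

# parse() returns every form on the page; take the first one.
my ($form) = HTML::Form->parse( $html, 'http://example.com/datos/' );

# Do by hand what the page's JavaScript would do: point the form
# at the real target (hypothetical URL) and fill in the fields.
$form->action('http://example.com/datos/resultado.asp');
$form->value( mes  => '01' );
$form->value( anio => '2010' );

# Build the request the form would submit, without sending it.
my $request = $form->make_request;
print $request->method, ' ', $request->uri, "\n";
print $request->content, "\n";
```

Inside a Mechanize script the same idea would presumably be to call action() on the object returned by $mech->current_form (or $mech->form_number(1)) and then $mech->submit, so the values go out as a real POST.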

Has anyone ever tried anything like this? Am I just insane? Advice, please.

Thanks in advance, Andy Holyer, Lewes, UK


In reply to Very obscure WWW:Mechanize problem by andyholyer
