Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to simulate a login to a web page through a perl script. I'm using LWP::UserAgent, HTTP::Cookies, and HTTP::Request. The problem is that the login for the web site occurs in two phases (two separate web pages). In the first phase you enter in your username and password info and the second phase you select where you want to be redirected. I can get the redirection page just fine (in other words, my perl script handles the login and password page just fine - also I can get to other restricted pages on the site so I know that the perl script is handling the login page correctly).

The problem with the second web page (where the user selects where he wants to be redirected) is that there are about 30 hidden form fields on the page that are submitted when you click the submit button. My perl script needs to be able to pass these hidden form fields and their values to the server. Is there any efficient way (using CGI.pm or some module like that) to grab all the form data and just pass it through?

Thanks.

Replies are listed 'Best First'.
(jeffa) Re: Parsing Web Page
by jeffa (Bishop) on Jan 20, 2003 at 18:39 UTC
    I would attack this puppy with WWW::Mechanize.

    UPDATE:
    Here is some sample code (one CGI script and one command-line script) to demonstrate how WWW::Mechanize will "fill in" the hidden values for you:

    (foo.cgi)
    #!/usr/bin/perl -T
    use strict;
    use warnings;
    use CGI qw(:standard);
    use Data::Dumper;

    print header, start_html;

    if (param('go')) {
        # Form was submitted: echo every parameter, hidden ones included.
        print ul(li([map { "$_ = " . param($_) } param()]));
    }
    else {
        # First visit: show a form with one visible and three hidden fields.
        print start_form,
            textfield('foo'),
            hidden(-name => 'hidden1', -value => 'one'),
            hidden(-name => 'hidden2', -value => 'two'),
            hidden(-name => 'hidden3', -value => 'three'),
            submit('go'),
            end_form;
    }
    ----------------------------------------------------
    (foo.pl)
    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $agent = WWW::Mechanize->new();
    $agent->get('http://localhost/path/to/foo.cgi');
    $agent->form_number(1);         # first form on the page (originally written as form(1))
    $agent->field('foo', 'bar');    # hidden fields keep the values the server sent
    $agent->click('go');
    print $agent->content;          # use the accessor rather than poking at $agent->{content}

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
Re: Parsing Web Page
by hardburn (Abbot) on Jan 20, 2003 at 18:27 UTC

    Check the HTML:: namespace under CPAN. There are a bunch of HTML parsers in there, though I can't really recommend one since I've never used them before.
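
    One candidate from that namespace is HTML::Form, which works alongside LWP. The following is only a rough sketch: the HTML snippet, the field names, and the example.com URLs are made up for illustration, and in a real script the page text would come from the response your LWP::UserAgent already fetched.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical stand-in for the second-phase page; in practice this would be
# the content of the response returned by LWP::UserAgent.
my $html = q{
<form action="/redirect" method="post">
  <input type="hidden" name="session" value="abc123">
  <input type="hidden" name="token" value="xyz">
  <select name="destination">
    <option value="home">Home</option>
    <option value="reports">Reports</option>
  </select>
  <input type="submit" name="go" value="Go">
</form>
};

# The base URL (made up here) is needed to resolve the relative form action.
my ($form) = HTML::Form->parse($html, 'http://www.example.com/login2');

# Hidden inputs keep the values the server sent; set only what the user would pick.
$form->value(destination => 'reports');

# click() builds an HTTP::Request carrying every field, hidden ones included.
my $request = $form->click('go');

print $request->method, ' ', $request->uri, "\n";
print $request->content, "\n";

# To actually submit it: my $response = $ua->request($request);
```

    Sending the request through the same user agent that holds the cookie jar should keep the login session intact.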

Re: Parsing Web Page
by Fletch (Bishop) on Jan 20, 2003 at 18:32 UTC

    See also Perl and LWP (ISBN 0596001789) for more than you ever could want to know about screenscraping HTML.

Re: Parsing Web Page
by Gilimanjaro (Hermit) on Jan 20, 2003 at 19:44 UTC
    Smells like a job for HTML::TreeBuilder! It builds on HTML::Parser and HTML::Element, and the latter provides you with the ridiculously powerful look_down method... Assuming $content contains the page you got back:

    use HTML::TreeBuilder;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse($content);
    $tree->eof;

    # Every <input type="hidden"> element in the page.
    my @hidden_field_elements = $tree->look_down(
        _tag => "input",
        type => "hidden",
    );

    # Map each hidden field's name to its value.
    my %hidden_fields =
        map { $_->attr('name') => $_->attr('value') } @hidden_field_elements;

    This might work as it is, or it may have some typos in there. You get the <IMG> tag though... Uh... picture.
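
    Once the hidden fields are in a hash, passing them along could look roughly like this. It is a sketch only: the hash contents, the "destination" field, and the example.com URL are placeholders, not values from the real site.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Assumed result of the look_down extraction above (values are made up).
my %hidden_fields = (session => 'abc123', token => 'xyz');

# Hypothetical form action URL and visible field; the hidden fields are
# passed through untouched alongside the one value the user would choose.
my $request = POST 'http://www.example.com/redirect',
    [ %hidden_fields, destination => 'reports' ];

print $request->as_string;

# To send it with the cookie-carrying agent: my $response = $ua->request($request);
```

    Building the request separately from sending it makes it easy to print and inspect what will go over the wire before trying it against the live server.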

    I must recommend that you not use this method though, because it makes solving this challenge so easy it can only be considered cheating. And you don't wanna be a cheat right?

    The man-pages for HTML::Element, ::Parser and ::TreeBuilder are also criminally complete and can easily be used to convince any PHP'r that he's barking up the wrong (pear)tree.

    Happy coding!

      Would the minus-voters on my response please tell me what is wrong with my post? Okay, it may be a bit jovial, but the solution I offer works, and even after rereading the original problem several times I think this solution fits the proposition 100%... In fact, I used this solution myself for almost exactly the same problem...

      Maybe I upset a few PHP'rs?

        Your post was OK until you mentioned PHP, at which point it became flamebait. No, I'm not a PHP fan. No, I didn't downvote you, because you actually have a good answer. But trolling is inappropriate for perlmonks, and many people will downvote you even if you have a good answer.

        --
        Ilya Martynov, ilya@iponweb.net
        CTO IPonWEB (UK) Ltd
        Quality Perl Programming and Unix Support UK managed @ offshore prices - http://www.iponweb.net
        Personal website - http://martynov.org