hhalpin has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks, I have an online database that I retrieve info from all the time, and now I desperately need to automate the task. However, I am not very familiar with the LWP module and cannot get this to work. Simply put, I have a web page on which I need to choose an option in a SELECT, write a number into a text box, write another string into a textarea, and finally click a button and retrieve the results. I know you would use something like this, but read my comments and you'll see I have lots of questions.
    use LWP::UserAgent;
    use HTTP::Request::Common;            # exports POST

    my $agent = LWP::UserAgent->new();    # initialize agent
    my $req = POST 'http://www.database.arg', [ ];
        # HELP! What do I put inside those brackets?
    my $content = $agent->request($req)->as_string;
    # Now I want to retrieve the web page the POST sends me to,
    # and download it and save it as a file to peruse at my leisure.
    # I don't know how to do this either!
To help you answer my question, here's the HTML source of the web page I want to automate, changed to protect the innocent. I put comment characters (#) in front of the lines for safety and paranoia.
    #<HTML>
    # <HEAD><TITLE>Database</TITLE></HEAD>
    #
    # <HR>
    # <FORM METHOD="POST" ACTION="http:/cgi-bin/database-result.html">
    # <P>
    #<TABLE NOBORDER>
    #<TR><TD ALIGN=LEFT>Select a subject:
    # <TD ALIGN=LEFT COLSPAN=2><SELECT NAME=subjectspace>
    #<OPTION> option 1
    #<OPTION> option 2
    #<OPTION> option 3
    #<OPTION SELECTED> option 4 default
    #<OPTION> option 5
    #</SELECT>
    # <BR><TR><TD>
    #<TR><TD ALIGN=LEFT>Number of documents:<BR>
    # <TD ALIGN=LEFT><INPUT TYPE=TEXT SIZE=5 NAME=SubjectDocs>
    #</TABLE>
    # <P>
    # Texts to search for (separate different sentences with a punctuation):<BR>
    # TEXTAREA NAME="txt1" WRAP="SOFT" ROWS=12 COLS=50>
    # /TEXTAREA>
    # <P>
    # INPUT TYPE="submit" VALUE="Submit Search">
    # INPUT TYPE="reset" VALUE=" Reset to Defaults ">
    #<BR>
    # </FORM>
    # <P>
    # </BODY>
    # </HTML>
Hope this helps. If you have any questions, feel free to e-mail me at hhalpin@email.unc.edu. I'm really confused and will probably go mad if I can't automate this, so your help will be vastly appreciated! I also had a helluva time posting this: the only way I could get it to work was to remove the '<' in front of TEXTAREA and the two INPUTs.

Thanks, Harry

Re: Web Automation and POST Confusing - need help
by chromatic (Archbishop) on Nov 30, 2000 at 03:42 UTC
    It appears that you've read 'perldoc lwpcook' already, which is good. There are two things you need to know to finish this task successfully.

    First, you pass CGI parameters as name=value pairs. Think of it like a hash, and you'll have the truth of it.
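    To see what "name=value pairs" look like on the wire, here is a minimal sketch that builds (but does not send) the request for the form in the question. The field values are made up for illustration and the URL is the thread's placeholder; HTTP::Request::Common's POST turns the array reference into an application/x-www-form-urlencoded body.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# An array reference keeps the pairs in order; a hash would also work,
# but Perl does not guarantee the order of hash keys.
my @pairs = (
    subjectspace => 'option 4 default',                  # the SELECT's NAME and chosen option
    SubjectDocs  => 10,                                  # the text INPUT
    txt1         => 'first sentence. second sentence.',  # the TEXTAREA (example text)
);

# Placeholder URL from the thread; a real script would use the form's ACTION.
my $req = POST 'http://www.database.arg', \@pairs;

# Nothing is sent here; we only inspect what would be sent.
print $req->method, "\n";
print $req->header('Content-Type'), "\n";
print $req->content, "\n";    # the URL-encoded name=value pairs joined with '&'
```

    Sending it is then just $agent->request($req), as in the example below.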

    Second, you only need to know the names of the fields and the types of information they contain. You can divine that by reading the page source. For any input, you will likely see a 'name' attribute. The value of that attribute is the key, and the value you want to assign to that field is the value.
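    If you would rather not read the names off the page by eye, the HTML::Form module (a separate CPAN distribution in current releases, so this assumes it is installed) can list them for you. A minimal sketch: the HTML is a trimmed copy of the form from the question with the missing '<' characters restored, the ACTION is simplified to a relative path, and the base URL is the thread's placeholder.

```perl
use strict;
use warnings;
use HTML::Form;

# Trimmed copy of the question's form ('<' restored, ACTION simplified).
my $html = <<'HTML';
<FORM METHOD="POST" ACTION="/cgi-bin/database-result.html">
<SELECT NAME=subjectspace>
<OPTION> option 1
<OPTION> option 2
<OPTION> option 3
<OPTION SELECTED> option 4 default
<OPTION> option 5
</SELECT>
<INPUT TYPE=TEXT SIZE=5 NAME=SubjectDocs>
<TEXTAREA NAME="txt1" WRAP="SOFT" ROWS=12 COLS=50></TEXTAREA>
<INPUT TYPE="submit" VALUE="Submit Search">
</FORM>
HTML

# In scalar context parse() returns the first form on the page.
my $form = HTML::Form->parse($html, 'http://www.database.arg/');

# Every named input is a key you can send; print its name and type.
for my $input ($form->inputs) {
    my $name = $input->name;
    next unless defined $name;    # the submit button has no NAME
    printf "%-12s %s\n", $name, $input->type;
}
```

    A SELECT shows up with type "option", and its possible_values method lists the values you may send for it.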

    For example:

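    # Post to the form's ACTION URL, not to the page that contains the form.
    my $req = POST 'http://www.database.arg', [
        subjectspace => 'option 4 default',   # the SELECT
        SubjectDocs  => 10,                   # the text INPUT
        txt1         => $text,                # the TEXTAREA; $text holds your search text
    ];
    my $content = $agent->request($req)->as_string;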
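    This will give you the resulting page in $content, suitable for saving or printing. Note that as_string puts the HTTP status line and headers in front of the page; call ->content instead if you want only the HTML.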
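    The question also asked how to keep the page for later. A minimal sketch of one way: check that the request succeeded, then write only the body to a file. The helper name save_page and the file name are hypothetical, and to stay runnable offline the example uses a hand-built HTTP::Response in place of a live $agent->request($req).

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: write a successful response's body to $file.
# Returns 1 on success, 0 if the server reported an error.
sub save_page {
    my ($response, $file) = @_;
    return 0 unless $response->is_success;
    open my $fh, '>', $file or die "Cannot write $file: $!";
    binmode $fh;                      # store the bytes exactly as received
    print {$fh} $response->content;
    close $fh or die "Cannot close $file: $!";
    return 1;
}

# Offline stand-in for: my $response = $agent->request($req);
my $response = HTTP::Response->new(
    200, 'OK', [ 'Content-Type' => 'text/html' ],
    "<html><body>results</body></html>\n",
);
save_page($response, 'results.html') or warn $response->status_line, "\n";
```

    LWP::UserAgent can also do this in one step: $agent->request($req, 'results.html') stores the response content in that file instead of in the response object.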
Re: Web Automation and POST Confusing - need help
by clemburg (Curate) on Nov 30, 2000 at 16:16 UTC

    These are some routines from a simple script to test a search engine. Since I am not able to post the whole script (sorry, sensitive data), I deleted the sensitive parts and retained what I thought of as the core routines in the context of a script skeleton.

    The original script proceeds as follows: It fetches a web page that contains several boxes with select options. It parses out the name and options for each box and submits a request for each box and each option, printing out a simple OK / NO RESULTS statement for it, using a call like print format_results($box, $option, $action).
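    The "parse out the name and options for each box" step can be sketched with HTML::Form. This is a swapped-in technique, not the original script's (unposted) code: it assumes HTML::Form is installed, uses a small inline page with hypothetical box names instead of a fetched one, and builds GET URLs with URI's query_form rather than the ###BOX### / ###OPTION### placeholders.

```perl
use strict;
use warnings;
use HTML::Form;
use URI;

# Stand-in for a fetched page with two select boxes (hypothetical names/URL).
my $html = <<'HTML';
<form action="/cgi-bin/search" method="GET">
<select name="lang"><option>en<option>de</select>
<select name="area"><option value="a1">Area 1<option value="a2">Area 2</select>
</form>
HTML

my $form = HTML::Form->parse($html, 'http://www.example.com/');

# Build one URL per (select box, option) pair.
my @urls;
for my $box (grep { $_->type eq 'option' } $form->inputs) {
    for my $option ($box->possible_values) {
        next unless defined $option;    # guard against "off" values
        my $url = URI->new($form->action);
        $url->query_form($box->name => $option);
        push @urls, $url;
    }
}
print "$_\n" for @urls;
```

    Each URL could then be handed to a routine like get_result below.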

    In terms of CGI parameters, a select box with one option selected is nothing special (it is just one name=value pair), so these routines should generalize easily to other field types.
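    To illustrate that generalization, here is a minimal sketch that fills any number of ###NAME### placeholders from a hash, escaping each value with uri_escape the way format_results does. The helper name fill_template, the template URL and the field values are hypothetical.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape);

# Hypothetical generalization of format_results' substitution step:
# every ###KEY### in $template becomes the URI-escaped $fields->{KEY};
# placeholders without a matching key are left as they are.
sub fill_template {
    my ($template, $fields) = @_;
    (my $url = $template) =~
        s{###(\w+)###}{ exists $fields->{$1} ? uri_escape($fields->{$1}) : "###$1###" }ge;
    return $url;
}

my $url = fill_template(
    'http://www.example.com/cgi-bin/search?subjectspace=###BOX###&SubjectDocs=###DOCS###',
    { BOX => 'option 4 default', DOCS => 10 },
);
print "$url\n";
```

    uri_escape encodes the spaces as %20, so the select value arrives intact.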

    I hope this helps.

    #!/usr/bin/perl -w
    use strict;

    # --------------------------------------------------
    # fragments from a web testing script
    # to give away some helper routines
    # and to show use of the LWP module
    # Author: Christian Lemburg, 2000-11-30
    # --------------------------------------------------

    use Getopt::Std;
    use LWP::UserAgent;
    use HTTP::Request;
    use URI::Escape;

    # --------------------------------------------------
    # setup and globals
    # --------------------------------------------------

    $| = 1;

    my %opts;
    getopts('u:p:d:v', \%opts);
    my $user        = $opts{'u'} || 'foo';
    my $password    = $opts{'p'} || 'bar';
    my $VERBOSE     = $opts{'v'};
    my $agent_delay = $opts{'d'} || 2;

    # [ ... snip ... ]
    # other argument processing - whatever you need

    # Set in the snipped part (regexes keyed by index language that mark
    # a results / no-results page); declared here so the fragment
    # compiles under strict.
    my (%no_results_indicator, %have_results_indicator, $index_language);

    my $ua = LWP::UserAgent->new();
    $ua->agent('YourAgent/1.0');
    $ua->env_proxy();

    # --------------------------------------------------
    # action
    # --------------------------------------------------

    # [ ... snip ... ]
    # in here, set:
    # 1) $action to the URL of a search script
    #    to call, with ###BOX### as a placeholder for the name
    #    of the select box and ###OPTION### as a placeholder for
    #    the value of the selected option,
    # 2) $box to the name parameter of the select box,
    # 3) $option to the value of the selected option,
    # then call format_results($box, $option, $action)
    # to output a statement on the result of a search

    # --------------------------------------------------
    # subs
    # --------------------------------------------------

    sub format_results {
        my ($box, $option, $action) = @_;
        my $box_param    = uri_escape($box);
        my $option_param = uri_escape($option);
        $action =~ s|###BOX###|$box_param|g;
        $action =~ s|###OPTION###|$option_param|g;
        my $result = get_result($action);
        return "Selection Box '" . $box . "'"
             . ", Option '$option': " . $result . "\n";
    }

    sub get_result {
        my ($action) = @_;
        sleep($agent_delay);
        print "Sub Agent: Processing $action ...\n" if $VERBOSE;
        my $sub_ua = LWP::UserAgent->new();
        $sub_ua->agent('YourAgent/1.0');
        $sub_ua->env_proxy();
        my $request = HTTP::Request->new('GET', $action);
        $request->authorization_basic($user, $password);
        my $response = $sub_ua->request($request);
        if ($response->is_success) {
            my $html = $response->content;
            return evaluate_search_result($html);
        } else {
            return 'ERROR';
        }
    }

    sub evaluate_search_result {
        my ($html) = @_;
        if ($html =~ m|$no_results_indicator{$index_language}|) {
            return 'NO RESULTS';
        } elsif ($html =~ m|$have_results_indicator{$index_language}|) {
            return 'OK';
        } else {
            return 'UNCERTAIN - CHECK RESULT INDICATORS';
        }
    }

    sub usage {
        return << "EOU";
    Usage: $0 [-u][-p][-d][-v] args
    Options:
      u: user     - name of user for htaccess authentication
      p: password - password for htaccess authentication
      d: delay    - delay between page fetches in seconds, default 2
      v: verbose output
    Note: If you are located behind a firewall, please set the
    'http_proxy' environment variable to something like
    'http://myproxy.mydomain.com:myport'.
    EOU
    }
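    The script above sends GET requests, while the form in the question uses POST. As a minimal sketch (field values and credentials are hypothetical, the URL is the thread's placeholder), the same authorization_basic call works on a request built with HTTP::Request::Common's POST, and you can inspect it before sending:

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Hypothetical credentials, matching the script's defaults.
my ($user, $password) = ('foo', 'bar');

# Same fields as in the question; the URL is the thread's placeholder.
my $request = POST 'http://www.database.arg',
    [ subjectspace => 'option 4 default', SubjectDocs => 10, txt1 => 'some text' ];

# The same call get_result makes on its GET request.
$request->authorization_basic($user, $password);

# Inspect before sending with e.g. $sub_ua->request($request).
print $request->as_string;
```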

    Christian Lemburg
    Brainbench MVP for Perl
    http://www.brainbench.com