ady has asked for the wisdom of the Perl Monks concerning the following question:
Here's a trace of my POST to the server; it doesn't work.
server: http://rswatch
page1: /RSData.aspx # form with field 'miljoe'
page2: /RSData.aspx?miljoe=UDV # form with field 'TextBoxProductID' and button 'Button1'
POST /RSData.aspx?miljoe=UDV HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE
Authorization: Basic S01EXHo2YW5kOno2YW5keXl5
Host: rswatch
User-Agent: libwww-perl/5.805
Content-Length: 151
Content-Type: application/x-www-form-urlencoded

DropDownListType=-TextBoxGUID&-=TextBoxUserName&-=TextBoxKommunenr&-=TextBoxProductID&KMD.NI.DPSagsbehandler=TextBoxShortText&-=Button1&Opdater+filter=

HTTP/1.1 200 OK
Date: Mon, 25 Dec 2006 14:25:36 GMT
Server: Microsoft-IIS/6.0
MicrosoftOfficeWebServer: 5.0_Pub
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Set-Cookie: ASP.NET_SessionId=4dbkrgn4idtdwbuptotubemu; path=/
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Length: 11883

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" >
<HTML>
  <HEAD>
    <span id="Label2"><title>RSWatch - UDV</title></span>
    <meta content="Microsoft Visual Studio .NET 7.1" name="GENERATOR">
    <meta content="C#" name="CODE_LANGUAGE">
    <meta content="JavaScript" name="vs_defaultClientScript">
    <meta content="http://schemas.microsoft.com/intellisense/ie5" name="vs_targetSchema">
    <LINK href="StyleSheet1.css" type="text/css" rel="stylesheet">
  </HEAD>
  <body>
    <center>
      <table class="BodyTable">
        <tr>
          <td class="TDheaderUnderline"><A href="default.aspx">RSWatch</A> - <span id="Label1">UDV</span><a name="top">&nbsp;</a></td>
        </tr>
        <tr>
          <td class="BodyTable">
            <form name="Form1" method="post" action="RSData.aspx?miljoe=UDV" id="Form1">
<input type="hidden" name="__VIEWSTATE" value="dDwxMTA1MDg5NDkzO3Q8O2w8aTwxPjtpPDM+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPFw8dGl0bGVcPlJTV2F0Y2ggLSBVRFZcPC90aXRsZVw+Oz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDxVRFY7Pj47Pjs7Pjs+Pjs+QtBaNAQOnC4Eqk2prlcPA4K8wqw=" />
              <table class="noborder">
                <tr>
                  <td class="noborder"><a id="HyperLink2" title="forrige" href="/RSData.aspx?Miljoe=UDV&StartFejllogId=3118741"><--</a> <a id="HyperLink1" title="n..ste" href="/RSData.aspx?Miljoe=UDV&StartFejllogId=3118781">--></a> <a id="HyperLink3" title="til top" href="/RSData.aspx?Miljoe=UDV&StartFejllogId=2147483647">-->></a></td>
                  <td class="noborder"> &nbsp; </td>
                  <td class="noborder"><select name="DropDownListType" id="DropDownListType">
                    <option selected="selected" value="-">-</option>
                    <option value="E">E</option>
                    <option value="S">S</option>
                    <option value="W">W</option>
                    <option value="R">R</option>
                    <option value="T">T</option>
                  </select></td>
                  <td class="noborder"><input name="TextBoxGUID" type="text" value="-" id="TextBoxGUID" /></td>
                  <td class="noborder"><input name="TextBoxUserName" type="text" value="-" id="TextBoxUserName" /></td>
                  <td class="noborder"><input name="TextBoxKommunenr" type="text" value="-" id="TextBoxKommunenr" /></td>
                  <td class="noborder"><input name="TextBoxProductID" type="text" value="-" id="TextBoxProductID" /></td>
                  <td class="noborder"><input name="TextBoxShortText" type="text" value="-" id="TextBoxShortText" /></td>
                  <td class="noborder"><input type="submit" name="Button1" value="Opdater filter" id="Button1" /></td>
                </tr>
              </table>
            </form>
            <!--table content cut out here -->
  </body>
</HTML>
POST /RSData.aspx?miljoe=UDV HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*
Referer: http://rswatch/RSData.aspx?miljoe=UDV
Accept-Language: da
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1)
Host: rswatch
Content-Length: 0
Connection: Keep-Alive
Cache-Control: no-cache
Cookie: ASP.NET_SessionId=0i4zi0q0uag51lypzvg4m0va
Authorization: Negotiate TlRMTVNTUAABAAAAB4IIogAAAAAAAAAAAAAAAAAAAAAFASgKAAAAD0==

HTTP/1.1 401 Unauthorized
Content-Length: 83
Content-Type: text/html
Server: Microsoft-IIS/6.0
WWW-Authenticate: Negotiate TlRMTVNTUAACAAAABgAGADgAAAAFgomixPDhPomZ5sYAAAAAAAAAAI4AjgA+AAAABQLODgAAAA9LAE0ARAACAAYASwBNAEQAAQAQAE8ARABTAFcARQBCADAAMQAEABoAaQBuAHQAZQByAG4ALgBrAG0AZAAuAGQAawADACwATwBEAFMAVwBFAEIAMAAxAC4AaQBuAHQAZQByAG4ALgBrAG0AZAAuAGQAawAFABoAaQBuAHQAZQByAG4ALgBrAG0AZAAuAGQAawAAAAAA
MicrosoftOfficeWebServer: 5.0_Pub
X-Powered-By: ASP.NET
Date: Mon, 25 Dec 2006 17:43:04 GMT

<html><head><title>Error</title></head><body>Error: Access is Denied.</body></html>

POST /RSData.aspx?miljoe=UDV HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*
Referer: http://rswatch/RSData.aspx?miljoe=UDV
Accept-Language: da
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1)
Host: rswatch
Content-Length: 368
Connection: Keep-Alive
Cache-Control: no-cache
Cookie: ASP.NET_SessionId=0i4zi0q0uag51lypzvg4m0va
Authorization: Negotiate TlRMTVNTUAADAAAAGAAYAGQAAAAYABgAfAAAAAYABgBIAAAACgAKAE4AAAAMAAwAWAAAAAAAAACUAAAABYKIogUBKAoAAAAPSwBNAEQAWgA2AEEATgBEAEgAMgA0ADkANgA0AG2gazXZgVp0AAAAAAAAAAAAAAAAAAAAAC4sefx6XWUzFigAY3IxHngpT+49JULFTA==

__VIEWSTATE=dDwxMTA1MDg5NDkzO3Q8O2w8aTwxPjtpPDM%2BOz47bDx0PHA8cDxsPFRleHQ7PjtsPFw8dGl0bGVcPlJTV2F0Y2ggLSBVRFZcPC90aXRsZVw%2BOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxVRFY7Pj47Pjs7Pjs%2BPjs%2BQtBaNAQOnC4Eqk2prlcPA4K8wqw%3D&DropDownListType=-&TextBoxGUID=-&TextBoxUserName=-&TextBoxKommunenr=-&TextBoxProductID=-&TextBoxShortText=KMD.NI.DPSagsbehandler&Button1=Opdater+filter

HTTP/1.1 200 OK
Date: Mon, 25 Dec 2006 17:43:18 GMT
Server: Microsoft-IIS/6.0
MicrosoftOfficeWebServer: 5.0_Pub
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Length: 11823

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" >
<HTML>
  <HEAD>
    <span id="Label2"><title>RSWatch - UDV</title></span>
    <meta content="Microsoft Visual Studio .NET 7.1" name="GENERATOR">
    <meta content="C#" name="CODE_LANGUAGE">
    <meta content="JavaScript" name="vs_defaultClientScript">
    <meta content="http://schemas.microsoft.com/intellisense/ie5" name="vs_targetSchema">
    <LINK href="StyleSheet1.css" type="text/css" rel="stylesheet">
  </HEAD>
  <body>
    <center>
      <table class="BodyTable">
        <tr>
          <td class="TDheaderUnderline"><A href="default.aspx">RSWatch</A> - <span id="Label1">UDV</span><a name="top">&nbsp;</a></td>
        </tr>
        <tr>
          <td class="BodyTable">
            <form name="Form1" method="post" action="RSData.aspx?miljoe=UDV" id="Form1">
<input type="hidden" name="__VIEWSTATE" value="dDwxMTA1MDg5NDkzO3Q8O2w8aTwxPjtpPDM+Oz47bDx0PHA8cDxsPFRleHQ7PjtsPFw8dGl0bGVcPlJTV2F0Y2ggLSBVRFZcPC90aXRsZVw+Oz4+Oz47Oz47dDxwPHA8bDxUZXh0Oz47bDxVRFY7Pj47Pjs7Pjs+Pjs+QtBaNAQOnC4Eqk2prlcPA4K8wqw=" />
              <table class="noborder">
                <tr>
                  <td class="noborder"><a id="HyperLink2" title="forrige" href="/RSData.aspx?Miljoe=UDV&StartFejllogId=2769544"><--</a> <a id="HyperLink3" title="til top" href="/RSData.aspx?Miljoe=UDV&StartFejllogId=2147483647">-->></a></td>
                  <td class="noborder"> &nbsp; </td>
                  <td class="noborder"><select name="DropDownListType" id="DropDownListType">
                    <option selected="selected" value="-">-</option>
                    <option value="E">E</option>
                    <option value="S">S</option>
                    <option value="W">W</option>
                    <option value="R">R</option>
                    <option value="T">T</option>
                  </select></td>
                  <td class="noborder"><input name="TextBoxGUID" type="text" value="-" id="TextBoxGUID" style="width:150px;" /></td>
                  <td class="noborder"><input name="TextBoxUserName" type="text" value="-" id="TextBoxUserName" style="width:80px;" /></td>
                  <td class="noborder"><input name="TextBoxKommunenr" type="text" value="-" id="TextBoxKommunenr" style="width:80px;" /></td>
                  <td class="noborder"><input name="TextBoxProductID" type="text" value="-" id="TextBoxProductID" style="width:100px;" /></td>
                  <td class="noborder"><input name="TextBoxShortText" type="text" value="KMD.NI.DPSagsbehandler" id="TextBoxShortText" /></td>
                  <td class="noborder"><input type="submit" name="Button1" value="Opdater filter" id="Button1" /></td>
                </tr>
              </table>
            </form>
            <!--table content cut out here -->
  </body>
</HTML>
### Arg parsing, Initialization, IO setup cut out here...

# Modules the code below relies on (presumably loaded in the cut-out setup)
use LWP::UserAgent;
use HTML::TreeBuilder;
use Encode qw(decode);
use Time::HiRes qw(usleep);
use Fcntl qw(:flock);

### ======================================================================
### do_POST -- Params:
###   the URL, (odsweb01.kmd.dk [172.31.88.103]: http://rswatch/RSData.aspx)
###   an arrayref or hashref for the key/value pairs,
###   optionally: any header lines: (key,value, key,value)
### ======================================================================
sub do_POST {
    if ( ! $ua ) {
        $ua = LWP::UserAgent->new(keep_alive => 1, parse_head => 0);
        $ua->credentials('rswatch:80', 'rswatch', "KMD\\z6and", 'xxxxxxx');
        $ua->default_header('Referer' => "http://rswatch/RSData.aspx?miljoe=$args{E}");
        $ua->default_header('Accept-Language' => 'da');
        push @{$ua->requests_redirectable}, 'POST';
        $ua->cookie_jar( {} );
        $ua->env_proxy();
    }
    my $resp = $ua->post(@_);
    return ($resp->content, $resp->status_line, $resp->is_success, $resp)
        if wantarray;
    return unless $resp->is_success;
    return $resp->content;
}

### ======================================================================
### do_RSbase : Parse RSwatch DB by traversing <-- ('forrige') link chain
### ======================================================================
### Termination: sub not_interesting : '$done' when ($S < $args{T}), cf.
### sub set_args : $tw = "20051103151100"; # 1.log date
sub do_RSbase {
    # Start in Browsing mode
    $browsing = 1;
    print "Browsing page:\n";

    # Parse 1.st and previous pages, until done
    my $previous = "http://rswatch/RSData.aspx?miljoe=UDV";
    for (my $p = 1; !$done; ) {
        print ">" . $p++ . "\n";
        usleep($args{S});                  # Pause and...
        $previous = do_page($previous);    # parse previous page.
    }
}

### ======================================================================
### do_page : Parse RSWatch page
### ======================================================================
sub do_page {
    # --- Fetch page (1.page & back-links)
    my $url   = shift;
    my $parms = [];    # form fields as an arrayref, as $ua->post expects
    # this doesn't work...
    # my $parms = [
    #     'TextBoxProductID' => 'KMD.NI.DPSagsbehandler',
    #     'Button1'          => 'Opdater filter',
    # ];
    my ($content, $message, $is_success) = do_POST($url, $parms);
    die "***ERROR: HTTP to $url:\r\n\t$message\n" unless $is_success;
    #print "$content\n\n";

    # --- Decode & Parse page
    my $root = HTML::TreeBuilder->new;
    $content = decode("utf8", $content);
    $root->parse($content);
    $root->eof;    # finish the parse before querying the tree

    # --- Extract page backlink
    my $node_prev = $root->find_by_attribute("id", "HyperLink2");
    my $link_prev = $node_prev->attr("href");

    # --- Process main log table
    my @tables     = $root->find_by_tag_name('table');
    my @table_rows = $tables[2]->find_by_tag_name('tr');
    do_summary(\@table_rows);

    # --- Free parse resources
    #$root->dump;
    $root->delete;

    # --- Return link to previous page (the href already starts with "/")
    return "http://rswatch" . $link_prev;    # or 0, if last page!
}

### ======================================================================
### do_summary : Parse RSWatch log summary table
### ======================================================================
### ----------------------------------------------------------------------
### Raise flags: !browsing if past -f(rom); $done if past -t(o).
sub not_interesting {
    my $r_table_cells = shift;
    my @table_cells   = @{$r_table_cells};
    my $S = $table_cells[4]->as_text;
    $S =~ s/[-: ]//g;
    if ($S > $args{F}) { $browsing ||= 1; return 1; }    # Before from.. skip
    if ($args{T} > $S) { $done = 1; return 1; }          # After to... quit
    if ($browsing)     { $browsing = 0; print "\n"; }    # 0: Interesting!
    return;
}

### ----------------------------------------------------------------------
### Parse each log $row to @log_record table on page
sub do_summary {
    my $r_table_rows = shift;                 # ref param
    my @table_rows   = @{$r_table_rows};      # cast to array
    shift(@table_rows);                       # discard header row

    # --- Process each <ProductID> $row to @log_record
    ROW:
    foreach my $row (@table_rows) {
        return if $done;
        my @log_record;
        my @table_cells = $row->find_by_tag_name('td');
        if ( exists($table_cells[5])
             && $table_cells[5]->as_text =~ /DPSagsbehandler/i )    # TODO: read from config
        {
            # --- If interesting: build @log_record from HTML
            next ROW if not_interesting(\@table_cells);    # Skip out-of-bounds
            foreach my $cell (@table_cells) {
                push @log_record, $cell->as_text;
            }

            # --- If E(rror): process row details and push on @log_record
            my $type = $table_cells[1]->as_text;    # [E(rror)|S|W|R|T]
            if ($type =~ /E/i) {
                my $detail_link = "http://rswatch/"
                    . $table_cells[0]->find_by_tag_name('a')->attr('href');
                my $details = do_details($detail_link);
                push @log_record, $details;
            }

            # --- Reformat and print @log_record to file (tee to STDOUT)
            print_record(\@log_record);
        }
    }
}

### ======================================================================
### do_details : Parse RSWatch details
### ======================================================================
sub do_details {
    # --- Fetch details page for $url
    my $url = shift;
    my ($content, $message, $is_success) = do_POST($url, []);
    die "***ERROR: POST to $url:\r\n\t$message\n" unless $is_success;

    # --- Decode & Parse page
    my $root = HTML::TreeBuilder->new;
    $content = decode("utf8", $content);
    $root->parse($content);
    $root->eof;    # finish the parse before querying the tree

    # --- Retrieve details text
    my @tables     = $root->find_by_tag_name('table');
    my @table_rows = $tables[3]->find_by_tag_name('tr');
    shift(@table_rows);    # discard table header
    my $details = $table_rows[0]->find_by_attribute("valign", "top")->as_text();

    # --- Free parse resources
    #$root->dump;
    #print "\tSUMMARY: $url\n";
    $root->delete;
    return $details;
}

### ======================================================================
### print_record : Print one log record
### ======================================================================
sub print_record {
    my $r_log_record = shift;                 # ref param
    my @log_record   = @{$r_log_record};      # cast to array

    # --- Reformat log record
    $log_record[4] =~ s/ /#/;                 # separate date,time in TimeStamp
    for my $i (1..2) { shift(@log_record); }  # discard FejllogId & Type
    my @print_record;
    push @print_record, split('#', $log_record[2]);    # TimeStamp date and time
    push @print_record, "<TYPE>";                      # DPxxx -- Fill in
    push @print_record, $log_record[1];                # Municipality No.
    push @print_record, $log_record[0];                # User ID

    # --- Parse ShortText ### TO-BE-DONE ###
    push @print_record, "<S[EX][OF]";                  # Service Exity´|eXit,Ok|False
    push @print_record, $log_record[5];                # ShortText

    # --- Print record
    @print_record = map { "$_," } @print_record;       # To CSV format...
    my $print_record = "@print_record";                # - flatten
    $print_record =~ s/\s+//g;                         # - zap whitespace
    print $t "$print_record\n";                        # - tee out!
}

### ======================================================================
### MAIN
### ======================================================================
### Init
set_args();
$t1 = time();
print scalar localtime, "\n";
initialize();

### Extract
do_RSbase();

### Cleanup
flock(OF, LOCK_UN);
close(OF);
$t2 = time();
print "\n", scalar localtime, "\n";
# gmtime, not localtime: a duration must not pick up the local timezone offset
my ($h, $m, $s) = (gmtime($t2 - $t1))[2, 1, 0];
printf "Elapsed: %d:%02d:%02d\n", $h, $m, $s;
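One difference stands out when the two traces are compared: the browser's POST body carries the hidden '__VIEWSTATE' field together with every input of Form1, while the libwww-perl body does not, and its keys and values do not line up with the form's field names. ASP.NET relies on that hidden field to recognise a postback, so its absence is a plausible (not confirmed) reason the filter is ignored. The following is a minimal sketch, not a tested fix, assuming HTML::Form is installed (it was part of libwww-perl 5.x and is a separate distribution since 6.0). The HTML is a trimmed copy of the form from the traces, and the '__VIEWSTATE' value is a placeholder; in real use the form would be parsed from the page just fetched.

```perl
use strict;
use warnings;
use HTML::Form;

# Trimmed stand-in for the page at /RSData.aspx?miljoe=UDV.
# VIEWSTATE-PLACEHOLDER is hypothetical; the real value comes from the server.
my $html = <<'HTML';
<form name="Form1" method="post" action="RSData.aspx?miljoe=UDV" id="Form1">
  <input type="hidden" name="__VIEWSTATE" value="VIEWSTATE-PLACEHOLDER" />
  <select name="DropDownListType" id="DropDownListType">
    <option selected="selected" value="-">-</option>
    <option value="E">E</option>
  </select>
  <input name="TextBoxProductID" type="text" value="-" id="TextBoxProductID" />
  <input name="TextBoxShortText" type="text" value="-" id="TextBoxShortText" />
  <input type="submit" name="Button1" value="Opdater filter" id="Button1" />
</form>
HTML

# In scalar context parse() returns the first form on the page.
# The second argument is the base URI used to resolve the form's action.
my $form = HTML::Form->parse($html, 'http://rswatch/RSData.aspx?miljoe=UDV');

# Set the filter field the browser trace used; all other fields,
# including the hidden __VIEWSTATE, keep the values found in the HTML.
$form->value(TextBoxShortText => 'KMD.NI.DPSagsbehandler');

# click() builds (but does not send) the HTTP::Request a browser would
# produce when the named submit button is pressed.
my $request = $form->click('Button1');

print $request->method, ' ', $request->uri, "\n";
print $request->content, "\n";

# To send it, hand the request to the existing user agent, e.g.
#   my $resp = $ua->request($request);
```

Building the request from the parsed form means the field list never has to be typed by hand, so hidden fields cannot be forgotten and keys cannot be paired with the wrong values.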
Replies are listed 'Best First'.

Re: LWP POST to a form on a 'secondary web page'
by andyford (Curate) on Dec 25, 2006 at 19:02 UTC
by ady (Deacon) on Dec 25, 2006 at 20:35 UTC

Re: LWP POST to a form on a 'secondary web page'
by ForgotPasswordAgain (Vicar) on Dec 25, 2006 at 18:54 UTC