in reply to Question concerning HTTP::Request and LWP::UserAgent
I've had a go at restructuring your code and simplifying some of the logic.
Whilst I have tested most of this in individual parts and it compiles clean with strict and warnings, I don't have a set of proxies available, nor could I find a convenient server doing push. So, this is effectively untested code.
I believe that by using a buffer of 5000 chars, you could well be receiving more than one page at a time. This will be dependent upon the size of the pages and the rate at which the server chooses to push them. Without sight of the content you are receiving, it is not possible to do much more, as both the size and the rate could vary between servers or even between pushes from the same server.
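One cheap way to find out whether a single read holds more than one page is to count the delimiters in the buffer. This is only a sketch: count_delims and the sample string are made up for illustration, and I'm assuming each page sits between two delimiters, as the code below does.

```perl
use strict;
use warnings;

# Count non-overlapping occurrences of a literal delimiter in a buffer.
# (count_delims is a hypothetical helper, not part of LWP.)
sub count_delims {
    my ($buffer, $delim) = @_;
    return 0 unless length $delim;          # an empty delimiter would never advance
    my ($count, $pos) = (0, 0);
    while ((my $hit = index($buffer, $delim, $pos)) >= 0) {
        $count++;
        $pos = $hit + length $delim;        # continue searching after this hit
    }
    return $count;
}

my $delim  = '--THIS_STRING_NEVER_HAPPENS';
my $sample = "${delim}\npage one\n${delim}\npage two\n${delim}\n";   # invented data
my $seen   = count_delims($sample, $delim);

print "delimiters seen: $seen\n";
print "more than one page in this buffer\n" if $seen > 2;
```

If the count regularly comes out above 2 for real responses, the buffer is probably larger than one page.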
With better information, probably easily obtainable by just watching the pages in a browser with a stopwatch, you could adjust the $UA->timeout and $UA->max_size parameters, in concert with the sleep controlling the main loop, to ensure that you don't miss changes in the pages. This is only worthwhile if not missing changes is an important objective.
As mentioned in the comments, checking the response for an X-Content-Range header would allow you to detect buffer overruns should they occur, but this possibility is probably better avoided than fixed.
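Should you want that check anyway, something along these lines might do. It is an untested sketch: truncation_reason is a name I made up, and $response is assumed to be an HTTP::Response (anything with a header method will work). X-Content-Range is the header named above. As far as I know, more recent LWP releases document a Client-Aborted header for the max_size case instead, so treat that second name as an assumption and check the docs for your installed version.

```perl
use strict;
use warnings;

# Return a short reason if the response looks truncated, otherwise undef.
# (truncation_reason is a hypothetical helper, not an LWP method.)
sub truncation_reason {
    my ($response) = @_;
    # Assumption: which header appears depends on the LWP version in use.
    for my $name ('X-Content-Range', 'Client-Aborted') {
        my $value = $response->header($name);
        return "$name: $value" if defined $value;
    }
    return;
}
```

Inside the request loop you could then warn with the returned string whenever it is defined, and consider raising max_size if it fires often.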
Hope this is of some benefit to you.
use strict;
use warnings;
use diagnostics;

use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use IO::Handle;

my $Proxy_Username = 'admin';
my $Proxy_Password = 'adminpw';

my @Proxies = (
    "http://proxy1:23651/proxy-proxy1-proxy/bin/sitemon?doit HTTP/1.0",
    "http://proxy2:23651/proxy-proxy2-proxy/bin/sitemon?doit HTTP/1.0",
    "http://proxy3:23651/proxy-proxy3-proxy/bin/sitemon?doit HTTP/1.0",
    "http://proxy4:23651/proxy-proxy4-proxy/bin/sitemon?doit HTTP/1.0",
);

my @Requests;
for my $URL (@Proxies) {                        # Build the requests once, outside the main loop.
    my $Request = HTTP::Request->new(GET => $URL);
    $Request->referer("http://wizard.yellowbrick.oz");
    $Request->authorization_basic($Proxy_Username, $Proxy_Password);
    push @Requests, $Request;
}

STDERR->autoflush(1);
STDOUT->autoflush(1);

my $UA = LWP::UserAgent->new();                 # No point re-creating $UA each time in the loop.
$UA->agent("Mozilla/4.7 [en] (WinNT; I)");      # The parameters don't change.
$UA->timeout(15);                               # Keep them outside the loop.
$UA->max_size(5000);

my $delim = '--THIS_STRING_NEVER_HAPPENS';

while (sleep 60) {                              # Why 1==1? (Just 1 would do, but this is better.)
    # No need to call time(), it's the default.
    # No need to name it if you're only going to print it.
    print scalar localtime(), " --> ";          # As I recently learnt, the scalar is important.

    foreach my $Request (@Requests) {           # Note: named loop variable; no my $Request = $_ inside.
        my $Response = $UA->request($Request);

        # $URL is not in scope here, so report the URI held by the request.
        print "something wrong happened contacting ", $Request->uri, "....\n" and next
            if $Response->is_error();           # Not everyone agrees with this syntax.

        # everything went ok
        my $Content = $Response->content();

        # Rather than break the data into lines and then loop over the lines, break out the piece you want.
        # Rather than use a regex for a static string, use index.
        my $first  = index($Content, $delim, 0) + length $delim;
        my $second = index($Content, $delim, $first);

        # I'd be tempted to print out the value of $second-$first, so that you may more closely tailor
        # the max_size parameter. You could also look for a header of X-Content-Range, which the user agent
        # adds if the size of the buffer requested was exceeded (see LWP::UserAgent docs).
        # substr takes a length, not an end offset.
        my $NewContent = substr($Content, $first, $second - $first);

        # Now we have a whole page in $NewContent, we can break just this bit up.
        my @NewContentArray = split(/\n/, $NewContent);

        # I'll have to take your word that this does what you need it to do.
        # The value 14 will possibly need adjusting.
        if (defined $NewContentArray[14] && $NewContentArray[14] =~ /(>\s)(\d*)(\s<)/) {
            print $2, " ";                      # get number of active processes
        }
    }
    print "\n";
}
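One thing the code above does not guard against is index returning -1 when a delimiter is missing, for instance when the buffer cuts a page short. A minimal sketch of a more defensive extraction follows. extract_page is a made-up helper name and the sample string is invented; the approach is otherwise the same index/substr one.

```perl
use strict;
use warnings;

# Return the text between the first two delimiters, or undef if the
# buffer does not contain a complete delimited page.
# (extract_page is a hypothetical helper, not part of LWP.)
sub extract_page {
    my ($buffer, $delim) = @_;
    my $start = index($buffer, $delim);
    return if $start < 0;                   # no opening delimiter at all
    $start += length $delim;                # skip past the delimiter itself
    my $end = index($buffer, $delim, $start);
    return if $end < 0;                     # page was cut off before its end
    return substr($buffer, $start, $end - $start);
}

my $delim = '--THIS_STRING_NEVER_HAPPENS';
my $page  = extract_page("junk${delim}\nline 0\nline 1\n${delim}tail", $delim);

print defined $page
    ? "page is " . length($page) . " chars\n"
    : "no complete page in this buffer\n";
```

Printing the page length this way also gives you the number you need for tailoring max_size.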