Bjoern has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow monks,

I use the following snippet to get the number of currently used netscape proxy server processes from a list of machines. For this purpose, one normally uses a browser and a special URL to display a kind of status table. Now the code works fine, but one question appeared while writing it.

The problem is that the response the machines (i.e. the Netscape proxy server admin process) send never 'ends'; or, to be more precise, the user agent never stops reading data. This seems to be caused by the content type, which the server sends as

multipart/x-mixed-replace; boundary=THIS_STRING_NEVER_HAPPENS.

The browser refreshes the page every few seconds, on and on, but it too never really finishes receiving data.
I worked around this by reading only 5000 bytes, snipping out the text between two boundaries, and then processing that.

I wondered if there was a more elegant way to do it.

Any input is very much appreciated.

Bjoern

_____
All good things come to those who wait.
--Sam


use strict;
use warnings;
use diagnostics;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use IO::Handle;

my $Proxy_Username='admin';
my $Proxy_Password='adminpw';
my @Proxies = (
    "http://proxy1:23651/proxy-proxy1-proxy/bin/sitemon?doit HTTP/1.0",
    "http://proxy2:23651/proxy-proxy2-proxy/bin/sitemon?doit HTTP/1.0",
    "http://proxy3:23651/proxy-proxy3-proxy/bin/sitemon?doit HTTP/1.0",
    "http://proxy4:23651/proxy-proxy4-proxy/bin/sitemon?doit HTTP/1.0");

STDERR->autoflush(1);
STDOUT->autoflush(1);

while (1==1) {
    my $ReadableTime=localtime(time());
    print $ReadableTime," --> ";
    foreach (@Proxies) {
        my $URL=$_;
        my $UA = LWP::UserAgent->new();                     # create new user agent
        $UA->agent("Mozilla/4.7 [en] (WinNT; I)");          # needs to be set like this to get an answer from the proxy server
        $UA->timeout(15);                                   # give the user agent some input
        $UA->max_size(5000);                                # more input
        my $Request = HTTP::Request->new(GET => $URL);      # create new request
        $Request->referer("http://wizard.yellowbrick.oz");  # perplex the log analysers (stolen code)
        $Request->authorization_basic($Proxy_Username,$Proxy_Password); # some more input for the request
        my $Response = $UA->request($Request);              # off goes the request
        if ($Response->is_error()) {                        # ups, some error here
            print "something wrong happened contacting $URL....\n";
        }
        else {                                              # everything went ok
            my $OffWeGo=0;
            my $Content = $Response->content();
            my @ContentArray;                               # original array
            my @NewContentArray;                            # text between two boundaries
            @ContentArray=split(/\n/,$Content);             # bring on the lines
            foreach (@ContentArray) {
                if (($_=~/--THIS_STRING_NEVER_HAPPENS/) && ($OffWeGo==0)) { $OffWeGo=1; next; }
                if (($_=~/--THIS_STRING_NEVER_HAPPENS/) && ($OffWeGo==1)) { $OffWeGo=0; last;}
                if ($OffWeGo==1) { push (@NewContentArray,$_); }
            }
            if ($NewContentArray[14]=~/(>\s)(\d*)(\s<)/) {print $2," "; } # get number of active processes
        }
    }
    print "\n";
    sleep (60);
}

Edit kudra, 2002-09-11 Added a READMORE before the code

Re: Question concerning HTTP::Request and LWP::UserAgent
by BrowserUk (Patriarch) on Sep 12, 2002 at 04:22 UTC

    I've had a go at restructuring your code and simplifying some of the logic.

    Whilst I have tested most of this in individual parts and it compiles clean with strict and warnings, I don't have a set of proxies available, nor could I find a convenient server doing push. So, this is effectively untested code.

    I believe that by using a buffer of 5000 chars, you could well be receiving more than one page at a time. This will depend upon the size of the pages and the rate at which the server chooses to push them. Without sight of the content you are receiving, it is not possible to do much more, as both the size and the rate could vary between servers or even between pushes from the same server.
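A rough way to see whether one buffer holds more than one page is to count the parts that are framed by two delimiters. The sketch below does that on a made-up buffer; complete_parts and the sample page text are hypothetical, and only the delimiter string comes from the original post.

```perl
use strict;
use warnings;

# Return every chunk of $buffer that sits between two consecutive
# occurrences of $delim. A trailing, unterminated part is ignored.
sub complete_parts {
    my ($buffer, $delim) = @_;
    my @parts;
    my $start = index($buffer, $delim);
    while ($start >= 0) {
        my $body_start = $start + length $delim;
        my $next = index($buffer, $delim, $body_start);
        last if $next < 0;    # no closing delimiter: the part is incomplete
        push @parts, substr($buffer, $body_start, $next - $body_start);
        $start = $next;
    }
    return @parts;
}

# Made-up buffer: two complete pages followed by a truncated third one.
my $delim  = '--THIS_STRING_NEVER_HAPPENS';
my $buffer = join '', $delim, "\npage one\n", $delim, "\npage two\n",
                      $delim, "\npage thr";
my @parts  = complete_parts($buffer, $delim);
printf "%d complete page(s), first is %d bytes\n",
    scalar @parts, length $parts[0];
```

Run against real responses, the page count and the part lengths give a feel for how far max_size could be lowered before a page gets cut short.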

    With better information, probably easily obtainable by just watching the pages in a browser with a stopwatch, you could adjust the $UA->timeout and $UA->max_size parameters, in concert with the sleep controlling the main loop, to ensure that you don't miss changes in the pages. That is only worthwhile if it is an important objective.

    As mentioned in the comments, checking the response for an X-Content-Range header would allow you to detect buffer overruns should they occur, but this possibility is probably better avoided than fixed.
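A minimal sketch of such a check, assuming only that the response object has a header() method, as HTTP::Response does. The X-Content-Range name is the one mentioned above; Client-Aborted is the name that, as far as I know, later LWP::UserAgent documentation gives for a max_size cut-off, so the sketch looks for either. truncation_note and FakeResponse are hypothetical names, and FakeResponse exists only so the example runs without LWP installed.

```perl
use strict;
use warnings;

# Return a short note if the response looks truncated, otherwise nothing.
# $response is any object with a header($name) method.
sub truncation_note {
    my ($response) = @_;
    for my $name ('Client-Aborted', 'X-Content-Range') {
        my $value = $response->header($name);
        return "$name: $value" if defined $value;
    }
    return;
}

# Hypothetical stand-in for HTTP::Response (assumption, not part of LWP).
{
    package FakeResponse;
    sub new    { my ($class, %headers) = @_; return bless {%headers}, $class }
    sub header { my ($self, $name) = @_; return $self->{$name} }
}

# The header value here is illustrative only.
my $cut  = FakeResponse->new('Client-Aborted' => 'max_size');
my $full = FakeResponse->new();

for my $response ($cut, $full) {
    my $note = truncation_note($response);
    print defined $note ? "truncated ($note)" : "complete", "\n";
}
```

With a real user agent, the same function would be called on the object returned by $UA->request($Request).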

    Hope this is of some benefit to you.

    use strict;
    use warnings;
    use diagnostics;
    use LWP::UserAgent;
    use HTTP::Request;
    use HTTP::Response;
    use IO::Handle;

    my $Proxy_Username = 'admin';
    my $Proxy_Password = 'adminpw';
    my @Proxies = (
        "http://proxy1:23651/proxy-proxy1-proxy/bin/sitemon?doit HTTP/1.0",
        "http://proxy2:23651/proxy-proxy2-proxy/bin/sitemon?doit HTTP/1.0",
        "http://proxy3:23651/proxy-proxy3-proxy/bin/sitemon?doit HTTP/1.0",
        "http://proxy4:23651/proxy-proxy4-proxy/bin/sitemon?doit HTTP/1.0",
    );

    # Better outside the loop.
    my @Requests;
    for my $URL (@Proxies) {
        my $Request = HTTP::Request->new(GET => $URL);
        $Request->referer("http://wizard.yellowbrick.oz");
        $Request->authorization_basic($Proxy_Username, $Proxy_Password);
        push @Requests, $Request;
    }

    STDERR->autoflush(1);
    STDOUT->autoflush(1);

    # No point re-creating $UA each time in the loop.
    # The parameters don't change. Keep them outside the loop.
    my $UA = LWP::UserAgent->new();
    $UA->agent("Mozilla/4.7 [en] (WinNT; I)");
    $UA->timeout(15);
    $UA->max_size(5000);

    my $delim = '--THIS_STRING_NEVER_HAPPENS';

    # Why 1==1 (just 1 would do) but this is better.
    while (sleep 60) {
        # No need to call time(), it's the default.
        # No need to name it if you're only going to print it.
        # As I recently learnt, the scalar is important.
        print scalar localtime(), " --> ";

        # Note: named loop variable; not my $Request = $_ inside.
        foreach my $Request (@Requests) {
            my $Response = $UA->request($Request);

            # Not everyone agrees with this syntax.
            # $URL is not in scope here, so report the URI held by the request.
            print "something wrong happened contacting ", $Request->uri, "....\n" and next
                if $Response->is_error();

            # everything went ok
            my $Content = $Response->content();

            # Rather than break the data into lines and then loop over the lines,
            # break out the piece you want.
            # Rather than use a regex for a static string, use index.
            my $first = index($Content, $delim);
            next if $first < 0;                 # no boundary in the buffer at all
            $first += length $delim;
            my $second = index($Content, $delim, $first);
            next if $second < 0;                # page not complete within max_size

            # I'd be tempted to print out the value of $second-$first, so that you
            # may more closely tailor the max_size parameter. You could also look
            # for a header of X-Content-Range, which the user agent adds if the
            # size of the buffer requested was exceeded (see LWP::UserAgent docs).
            # substr takes a length, not an end offset.
            my $NewContent = substr($Content, $first, $second - $first);

            # Now we have a whole page in $NewContent, we can break just this bit up.
            my @NewContentArray = split /\n/, $NewContent;

            # I'll have to take your word that this does what you need it to do.
            # The value 14 will possibly need adjusting.
            if (defined $NewContentArray[14] && $NewContentArray[14] =~ /(>\s)(\d*)(\s<)/) {
                print $2, " ";                  # get number of active processes
            }
        }
        print "\n";
    }
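An alternative to a fixed max_size, sketched here as an untested idea rather than a drop-in fix: LWP::UserAgent's request method accepts a content callback as its second argument, and a die inside that callback aborts the transfer. The callback below stops as soon as one complete page has arrived, even when a delimiter is split across chunks. make_first_part_collector and the sample chunks are hypothetical; the loop at the end only simulates data arriving so the sketch runs without a proxy.

```perl
use strict;
use warnings;

# Build a callback that accumulates chunks and dies (which would abort an
# LWP transfer) once one part framed by two delimiters has arrived.
# $result is a scalar reference that receives the extracted part.
sub make_first_part_collector {
    my ($delim, $result) = @_;
    my $buffer = '';
    return sub {
        my ($chunk) = @_;
        $buffer .= $chunk;
        my $first = index($buffer, $delim);
        return if $first < 0;            # opening delimiter not seen yet
        $first += length $delim;
        my $second = index($buffer, $delim, $first);
        return if $second < 0;           # closing delimiter not seen yet
        $$result = substr($buffer, $first, $second - $first);
        die "first part complete\n";
    };
}

# Simulated chunks; note the closing delimiter is split across two of them.
my $delim = '--THIS_STRING_NEVER_HAPPENS';
my $part;
my $callback = make_first_part_collector($delim, \$part);
my @chunks = ("$delim\nrow 1\nro", "w 2\n--THIS_STRING", "_NEVER_HAPPENS\nrow 1\n");

my $consumed = 0;
for my $chunk (@chunks) {
    $consumed++;
    last unless eval { $callback->($chunk); 1 };   # stop once the callback dies
}
print "got a complete part after $consumed chunk(s)\n";
```

With a real user agent the call would be $UA->request($Request, $callback). In that mode the body is handed to the callback instead of being stored in the response, so the page has to be read from $part rather than from $Response->content().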

    Well It's better than the Abottoire, but Yorkshire!