in reply to conditional testing for error 500 webpages before following them?
When you are the human, you just click again, and this one failure out of 2,000 requests doesn't even register in your brain. If I have to run 8,000 requests, then it matters...
Here is some code that you can adapt:
You should do a "retry" before deciding that this is a "dead link". I show one way below. This server barfs with error 500 or whatever on about 1 in 2,000 requests.
The "redo RETRY" skips the while condition (so no new line is read and $n_attempt is not reset), runs the loop body again, and continues on with a new GET for the same callsign. I don't bother to "skip around" the "clean-up" code before the GET because it runs really fast and, again, this only happens on about 1 in 2,000 requests.
Hope this idea helps you. This is real world stuff that does happen. I sleep a little bit to "be nice". This code works with a "paid subscription" and I am not as "nice" as I would be if this were a free interface. But even so, I am a little nice when the server "barfs".
The main point here is the use of RETRY: (which is my label) and redo, which is the Perl keyword.
    RETRY: while (my $n_attempt=0, my $callsign=<>)
    {
       $callsign = uc($callsign);   # uppercase
       $callsign =~ s/^\s*//;       # no leading spaces
       $callsign =~ s/\s*$//;       # no trailing spaces, does chomp() also..
       next if $callsign eq "";     # skip NULL (blank lines)!

       my $callsign = (split(/,/,$callsign))[0]; # allow histogram format
                                                 # w6oat,234 or just w6oat

       next if ($callsign =~ /^[a-zA-Z]\d{1}[a-zA-Z]$/); # like N7A
                         # NO PROCESSING OF 1X1 US CALLSIGNS!!!

       print STDERR "working on $callsign\n" if DEBUG;

       my $req = GET "http://www.qrz.com/xml?s=$key;callsign=$callsign";
       my $res = $ua->request($req);

       unless ($res->is_success)
       {
          $n_attempt++;
          print STDERR "$callsign ERROR: Try# $n_attempt of ".MAX_RETRY.
                       " err:". $res->status_line ."\n";
          sleep(1);
          redo RETRY if $n_attempt <= MAX_RETRY;

          print STDERR "$callsign ERROR: Try# $n_attempt of ".
                       MAX_RETRY." FAILED: ". $res->status_line . "\n";
          next; # skip this callsign and go to the next one.
                # This ain't gonna happen unless the QRZ server is
                # down. "if ($res->is_success)" means we got some kind
                # of response from the server. The QRZ server will
                # barf on about 1/2000 requests, hence the retries.
       }
       # ... (rest of the loop body is not shown in this excerpt) ...
    }
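If you want to play with the label-plus-redo idea without hitting a real server, here is a stripped-down sketch. Everything in it is made up for illustration: fake_request() stands in for the GET, %failures_left decides how often each callsign "barfs", the input lines are hard-coded, and MAX_RETRY is set to 3 only as an assumed value. The sleep(1) is left out so it runs instantly.

```perl
use strict;
use warnings;

use constant MAX_RETRY => 3;    # assumed value, for illustration only

# Hypothetical stand-in for the HTTP request: each callsign fails a
# fixed number of times before it succeeds.
my %failures_left = (N7ABC => 2, K1DOWN => 99);

sub fake_request {
    my ($callsign) = @_;
    if (($failures_left{$callsign} // 0) > 0) {
        $failures_left{$callsign}--;
        return 0;               # simulated "error 500"
    }
    return 1;                   # simulated success
}

# Hard-coded input instead of reading from <>.
my @lines = ("w6oat,234\n", "\n", "n7abc\n", "k1down\n");
my (@ok, @failed, %tries);

# The while condition runs once per input line, so $n_attempt starts
# at 0 for each new line. "redo RETRY" re-runs the body without
# evaluating the condition: same line, counter kept.
RETRY: while (my $n_attempt = 0, defined(my $line = shift @lines)) {
    (my $callsign = uc $line) =~ s/^\s+|\s+$//g;   # uppercase and trim
    next if $callsign eq "";                       # skip blank lines
    $callsign = (split /,/, $callsign)[0];         # "W6OAT,234" -> "W6OAT"

    $tries{$callsign}++;
    unless (fake_request($callsign)) {
        $n_attempt++;
        redo RETRY if $n_attempt <= MAX_RETRY;     # try the same callsign again
        push @failed, $callsign;                   # out of retries, give up
        next;
    }
    push @ok, $callsign;
}

print "ok:     @ok\n";
print "failed: @failed\n";
```

With these made-up failure counts, W6OAT goes through on the first try, N7ABC goes through on its third try, and K1DOWN is given up on after 1 + MAX_RETRY attempts. Because the counter lives in the while condition, a plain "next" resets it for the following line, while "redo" leaves it alone.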