mickey has asked for the wisdom of the Perl Monks concerning the following question:

I have a program design question for the monks today.

I'm working on a web-spidering application using Win32::IE::Mechanize to do the spidering and HTML::TreeBuilder to extract information. I'm running into difficulty dealing with intermittent network issues -- sometimes for a few seconds I can't reach the remote site, or the site is unresponsive, or something else impedes the flow of the program.

This is, as we all know, a fact of life on the internet. But at the minute, if my program doesn't find the piece of information it's looking for on the page it thinks it loaded, it dies.

Now part of the problem is that I'm not checking for errors robustly enough. I'm about to put just such error checking in.

But another part of the problem is that I'd like to be able to catch such a failure and retry a couple of times before finally giving up. I'm having trouble coming up with an elegant architecture for doing this, and I would love some advice from the rest of you wise monks on how to do this nicely. The ideal solution would also be abstract enough to use easily with multiple sequences of actions -- I have a couple of different programs that all face the same issue.

Thanks very much for your meditation on my difficulty.

Re: catching failures and retrying
by BrowserUk (Patriarch) on Mar 22, 2005 at 13:36 UTC

    Most spiders are based around an array used as a queue. You push the first (set of) URLs onto the queue and then start the spider; it pulls a URL off the queue, fetches it, extracts any links, pushes them onto the queue, and loops.

    To support retries, instead of pushing just the URL, push a count/URL pair, for example as an anonymous array:

    push @urls, [ $tries, $url ];

    Or you could concatenate them into a string if lots of 2-element arrays prove to be a memory problem.

    Preset $tries to 3 or 5 or whatever, and each time you fail, decrement the count and push it back on the queue if it hasn't reached 0 yet. When it reaches 0 give up and report the failure.
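The queue-with-countdown idea above could be sketched roughly as follows. This is only an illustration: `fetch_page` is a hypothetical stand-in that simulates failures instead of touching the network, and the example.* URLs and `$MAX_TRIES` are made up for the demo.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real fetch (not from the thread):
# "flaky" fails on its first two calls, "dead" always fails.
my %calls;
sub fetch_page {
    my ($url) = @_;
    $calls{$url}++;
    return 0 if $url eq 'http://dead.example/';
    return 0 if $url eq 'http://flaky.example/' && $calls{$url} < 3;
    return 1;
}

my $MAX_TRIES = 3;    # assumed value; "3 or 5 or whatever"

# Each queue entry is a [ $tries, $url ] pair.
my @queue = map { [ $MAX_TRIES, $_ ] }
    qw(http://ok.example/ http://flaky.example/ http://dead.example/);
my ( @done, @failed );

while ( my $item = shift @queue ) {
    my ( $tries, $url ) = @$item;
    if ( fetch_page($url) ) {
        push @done, $url;    # a real spider would extract links here
    }
    elsif ( --$tries > 0 ) {
        push @queue, [ $tries, $url ];    # requeue with one fewer try left
    }
    else {
        push @failed, $url;    # count reached 0: give up and report
    }
}

print "fetched: @done\n";
print "gave up: @failed\n";
```

Because a failed URL goes to the back of the queue, other pages are fetched in between, which gives a briefly unreachable site a little time to recover before the next attempt.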


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco.
    Rule 1 has a caveat! -- Who broke the cabal?
Re: catching failures and retrying
by friedo (Prior) on Mar 22, 2005 at 13:38 UTC
    There are numerous fancy exception modules on CPAN of course, but for a very simple solution, I have used this on occasion:

    while (1) {
        eval { do_something_that_may_die(); };
        last unless $@;
        print "Something blew up: $@\nRetrying...";
    }
Re: catching failures and retrying
by RazorbladeBidet (Friar) on Mar 22, 2005 at 13:17 UTC
    I find the Error module to be a good OOP way of throwing/catching/retrying exceptions.

    You can retry by calling a function from your catch block - there may be some redundant code, but there are workarounds for that.
    --------------
    It's sad that a family can be torn apart by such a such a simple thing as a pack of wild dogs

      Thanks very much. Error looks useful for error handling... I think that's a secondary concern for me, though.

      The big issue, I think (I'm still working on understanding what the crux of the problem is myself), is how to go back to a previous step.

      For instance, if my program proceeds like this:

      ## Step 1
      go_to_web_page_A();
      ## Step 2
      look_for_stuff_on_web_page_A();
      ## Step 3
      submit_form_on_A_to_go_to_B();

      and it dies on step 2, that's almost always because step 1 failed. That is, it can't find the stuff it's looking for because the web page failed to load correctly.

      So what I want to do is catch the failure of look_for_stuff_on_web_page_A() and, in case of a failure, retry go_to_web_page_A(); and then look_for_stuff_on_web_page_A(); a couple times before finally giving up.

      Basically what it's doing is "Do A; Do B; If B fails, go back and do A again and then retry B;", and the part I'm having trouble with is the "If B fails, go back to A" part.

      Any thoughts?

        Error provides you a generic and extensible framework to handle errors, but it won't do exactly what you're looking for.

        You basically need a method of retrying, which can be handled as BrowserUk states, or, based upon your code - can be something like:
        # CODE NOT TESTED
        # for display purposes only
        # void in Utah
        my $rc = undef;
        my $i  = 0;
        do {
            $rc = go_to_web_page();
        } while ( !$rc && $i++ < 3 );
        or, perhaps, using Error
        # CODE NOT TESTED
        # for display purposes only
        # void in Utah
        use Error qw(:try);    # provides try/catch

        my $i = 0;
        GET_PAGE: while ( $i < 3 ) {
            try {
                go_to_web_page();
                # the above has to throw some kind of error
                # for this to work
                last GET_PAGE;
            }
            catch Error with {
                $i++;
            };
        }
        I'm sure there's a cleaner way to do that, but it's just an example off the top of my head (proof of concept).
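One way to sketch the "do A; do B; if B fails, go back to A" pattern from this subthread is to wrap both steps in a single code reference and retry that as a unit. The `retry` helper below is hypothetical (plain `eval`, no CPAN modules), and the two step functions are stubs that simulate a page failing to load twice; none of this is the poster's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper: call $code up to $max times and return its result
# from the first attempt that does not die. Dies if every attempt fails.
sub retry {
    my ( $max, $code ) = @_;
    my $err;
    for my $attempt ( 1 .. $max ) {
        my $result;
        my $ok = eval { $result = $code->($attempt); 1 };
        return $result if $ok;
        $err = $@ || "unknown error\n";
        warn "attempt $attempt failed: $err";
    }
    die "giving up after $max attempts: $err";
}

# Stubs standing in for the real steps: the "page" only loads properly
# on the third attempt.
my $loads = 0;
my $page  = '';
sub go_to_web_page_A { $loads++; $page = $loads < 3 ? '' : 'stuff'; return }

sub look_for_stuff_on_web_page_A {
    die "stuff not found\n" unless $page =~ /stuff/;
    return 'stuff';
}

# Steps 1 and 2 are retried together, so a failure in step 2
# reloads the page before looking again.
my $found = retry( 3, sub {
    go_to_web_page_A();
    return look_for_stuff_on_web_page_A();
} );
print "found '$found' after $loads page loads\n";
```

Since `retry` takes any code reference, the same helper can wrap other sequences of actions, which may suit the "multiple programs with the same issue" requirement in the original question.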
        --------------
        It's sad that a family can be torn apart by such a such a simple thing as a pack of wild dogs
Re: catching failures and retrying
by radiantmatrix (Parson) on Mar 22, 2005 at 17:21 UTC

    Create a wrapper function (we'll call it wrapper) that returns true on success and false on failure. It might look like this, then:

    sub wrapper {
        eval {
            ## do some stuff that might die
        };
        if ($@) {
            ## handle some error things, die if unrecoverable
            return 0;    # false: recoverable failure, the caller should retry
        }
        return 1;        # true: success, no retry needed
    }

    To retry a few times before giving up, then, you may write:

    use constant RETRIES => 3;
    for (1..RETRIES) {
        last if wrapper();
    }

    In this way, you will try up to three times before the loop will exit. If you need more complexity, you might expand to something like:

    use constant RETRIES => 3;
    our $give_up = 0;
    for (1..RETRIES) {
        $give_up = 0;
        last if wrapper();
        $give_up = 1;
    }
    die ('Tried '.RETRIES.' times without success. Giving up.') if $give_up;

    This will set $give_up on each try in such a way that a successful run will result in $give_up == 0, while a failed run will result in $give_up == 1. We can then check that value to see whether we gave up or not.

    radiantmatrix
    require General::Disclaimer;
    s//2fde04abe76c036c9074586c1/; while(m/(.)/g){print substr(' ,JPacehklnorstu',hex($1),1)}

Re: catching failures and retrying
by cbrandtbuffalo (Deacon) on Mar 22, 2005 at 17:53 UTC
    Maybe this is obvious, but to implement some of the suggestions, you need to determine what 'failure' means. With regular Mechanize, you can check 'status' to see what HTTP code you got back. Anything outside the 2xx range is an error of some sort, and you can set up various handlers to react based on different error codes (402, 404, 500, etc.). If the request just times out, you can detect that as well.

    Trouble is, I'm not sure if the Win32 version sets status the same way the base Mech does. If not, you may need to do some more work to evaluate the results of a 'get'.
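As a rough illustration of deciding what 'failure' means, the sketch below sorts an HTTP status code into "ok", "retry", or "fail". The function name and the policy itself (treating 408, 429, and 5xx as temporary) are assumptions made for the example, not something the thread specifies; the code passed in would come from something like the status check mentioned above, if the Win32 version reports one.

```perl
use strict;
use warnings;

# Hypothetical policy: which codes are worth retrying is a judgment call.
sub classify_status {
    my ($status) = @_;
    return 'ok'    if $status >= 200 && $status < 300;    # success
    return 'retry' if $status == 408                      # request timeout
                   || $status == 429                      # too many requests
                   || $status >= 500;                     # server-side trouble
    return 'fail';    # e.g. 404: retrying is unlikely to help
}

for my $status ( 200, 404, 500, 503 ) {
    printf "%d => %s\n", $status, classify_status($status);
}
```

Separating "retry" from "fail" keeps the spider from wasting its retry budget on pages that are simply missing.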