PS: I find $name =~ s#^http://##i; a bit "hard on the eyes". I sometimes use the | character, s|^http://||i;, but some folks object to that as the | normally means "or" in a regex. I think s[^http://][]i; will work also.

  $mech->get($_);
  if (!$mech->success())
  {   # the get failed somehow... do something
      print "get of $_ failed!\n";
      next;   # skip this URL and go on to the next one
  }
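The delimiter variants mentioned in the PS can be checked quickly - all three substitutions below strip a leading http:// (case-insensitively) and behave identically; the URLs are made-up examples:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Three spellings of the same substitution, differing only in delimiter.
for my $u ('HTTP://example.com', 'http://example.com/page') {
    (my $a = $u) =~ s#^http://##i;    # '#' delimiters
    (my $b = $u) =~ s|^http://||i;    # '|' delimiters (some find this confusing)
    (my $c = $u) =~ s[^http://][]i;   # bracket pairs: pattern in one, replacement in the other
    print "$a | $b | $c\n";
}
```

With paired delimiters like [ ], the pattern and the replacement each get their own bracket pair, which some find easier to read than three repeated characters.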
I don't know if these "not found" errors are transient or not. You can make use of the redo function to restart the current iteration of the while() loop without re-evaluating the loop conditional (i.e., without getting the next url) - it is like "next;" except that the while conditional is not re-evaluated. Of course you will need to pair the redo; with some appropriate counter for max_retries so that you don't wind up in an infinite loop. But the first step would be to see if just skipping that URL, like above, will allow the code to complete. Then we can talk about "how to give it another chance".
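A minimal sketch of the redo-with-a-counter idea. Here fake_get() is a stand-in for $mech->get($_) (it fails twice, then succeeds) so the retry logic can be seen on its own; $max_retries and the URL are made-up names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for $mech->get(): fails on the first two attempts, then succeeds.
my $attempts = 0;
sub fake_get { $attempts++; return $attempts >= 3 ? 1 : 0 }

my @results;
my $max_retries = 5;
my $tries       = 0;
my @urls        = ('http://example.com/a');

while (my $url = shift @urls) {
    $tries++;
    if (!fake_get($url)) {
        if ($tries < $max_retries) {
            print "retry $tries for $url\n";
            redo;    # re-run this loop body; the while() conditional is
                     # NOT re-evaluated, so $url keeps its current value
        }
        print "giving up on $url\n";
        next;        # too many failures: move on to the next URL
    }
    push @results, $url;
    $tries = 0;      # reset the counter before the next URL
}
print "fetched: @results\n";
```

The key point is that redo re-enters the block with $url unchanged, while next would shift the next URL off the list; the $tries counter is what keeps a permanently dead URL from looping forever.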
BTW: It's been some months since we talked about this project. What led you to go down the road of using Mechanize::Firefox? This adds an additional layer of complication to the whole thing - I, for example, am having a version issue between Firefox and MozRepl - so there are some "landmines" along this path.
Update:
If you add $|=1; at the top of the code, this will un-buffer writes to STDOUT and make it easier to follow what the code is doing while it executes. Without it, there is a long lag between the program printing and that output appearing on the screen: the typical buffer is ~4KB, so many lines are "printed" by the program before they are "flushed" to the output. Flushing every print has a performance cost, but in this case it will make no measurable difference.
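For illustration, the one-liner and its more readable IO::Handle equivalent (both set autoflush on STDOUT):

```perl
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;    # terse form: un-buffer the currently selected handle (STDOUT)

# Equivalent, more self-documenting form:
use IO::Handle;
STDOUT->autoflush(1);

print "progress is visible immediately\n";   # flushed as soon as it is printed
```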
In reply to Re: WWW::Mechanize::Firefox runs well: some attempts to make the script a bit more robust
by Marshall
in thread WWW::Mechanize::Firefox runs well: some attempts to make the script a bit more robust
by Perlbeginner1