ChrisJ-UK has asked for the wisdom of the Perl Monks concerning the following question:

I am running a Perl script with a very long runtime under Apache 2.0.48 on SuSE.

The script crawls web pages and analyses them. To avoid overloading the server(s) these pages come from, it has been agreed that there will be a short pause between each page request, which makes the script run for even longer.

The first server my employers put me on stopped the script after a couple of hours as a precaution against 'runaway' scripts, for example those stuck in infinite loops.

This is, of course, a sensible precaution, but as I could not override it I was moved to a new, dedicated server where we could, supposedly, configure things to avoid this timing problem.

The new server is now ending the script after approximately five minutes!

Is there a way to avoid this problem, similar to how PHP lets you set runtime limits from within a script (e.g. set_time_limit()), or must I wait until the server admin has had a look?

Thanks for any light you can shed on this.

Chris.

Replies are listed 'Best First'.
Re: Script Timeout Settings?
by skx (Parson) on Jul 23, 2004 at 11:09 UTC

    Reading your question, I wonder why the script is a CGI script at all if it's just crawling and analysing web pages.

    Would rewriting it as a standalone script running continuously be an option?

    Failing that, you could have the script fork into the background after outputting some information to the calling window.
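
    Something along these lines shows the shape of it. This is only a sketch: the log path and the do_crawl() sub are stand-ins for your own code.

        #!/usr/bin/perl
        use strict;
        use warnings;
        use POSIX qw(setsid);

        # Tell the browser we have started, then detach from Apache.
        print "Content-type: text/plain\n\n";
        print "Crawl started in the background.\n";

        defined( my $pid = fork() ) or die "Cannot fork: $!";
        exit 0 if $pid;    # parent exits and Apache finishes the request

        # Child: start a new session so Apache's clean-up does not kill us,
        # and point the standard handles away from the web server.
        setsid() or die "Cannot start a new session: $!";
        open STDIN,  '<',  '/dev/null'      or die $!;
        open STDOUT, '>',  '/dev/null'      or die $!;
        open STDERR, '>>', '/tmp/crawl.log' or die $!;   # stand-in log path

        do_crawl();

        sub do_crawl {
            # ... the existing crawl-and-analyse code goes here ...
        }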

    Steve
    ---
    steve.org.uk

      Steve,

      I'm not entirely sure myself why I have a CGI script. It just kind of started that way.

      I suppose I need to look at setting up a standalone script, but running scripts remotely is quite new to me.

      From my limited understanding of this I gather I would need Telnet/SSH.

      I'll look into this and get back to you.

      Thanks for replying so quickly.

      Chris.

        In addition to the suggestion to use LWP, you should pay attention to robots.txt since you're writing a robot. LWP::RobotUA makes this easy to do (see the sketch below).

        It sounds like you want to use LWP to grab these web pages (assuming they are viewable via a browser) and run your analysis routines that way. Otherwise you would need to get a copy of each file you want to analyze, or read it off the server somehow, and analyze it that way. Net::FTP comes to mind, but I would say it's poor practice to have your web pages set up in an FTP folder (just a personal opinion there).
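
        As a rough sketch of what that might look like (the agent name, contact address, URL list, and analyze() sub are all made up for illustration):

            #!/usr/bin/perl
            use strict;
            use warnings;
            use LWP::RobotUA;

            # A polite robot: identifies itself and honours robots.txt.
            my $ua = LWP::RobotUA->new(
                agent => 'MyCrawler/0.1',        # hypothetical agent name
                from  => 'chris@example.com',    # hypothetical contact address
            );
            $ua->delay( 10 / 60 );    # delay() takes minutes: pause ~10s per host

            my @urls_to_crawl = ('http://www.example.com/');   # hypothetical list

            for my $url (@urls_to_crawl) {
                my $response = $ua->get($url);   # robots.txt is consulted for you
                if ( $response->is_success ) {
                    analyze( $response->decoded_content );
                }
                else {
                    warn "$url: ", $response->status_line, "\n";
                }
            }

            sub analyze {
                my ($html) = @_;
                # ... the existing analysis code goes here ...
            }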

        Anyhow, sounds like a super fun project and I hope you find your answers.