Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

My CGI program needs about an hour, sometimes more, to process the data received from the user.

Could someone please help me with the following:

There's a problem with the sendmail program I am using, so is there another way I can send the results back to the user?

For example, after the connection to the browser is broken, is there a way to reconnect and send the result to the webpage? Or are there any other ways? Some processing could take up to 10 hours, just because there's a lot of data to be processed.

Thanks for any ideas.
RB

  • Comment on alternatives to emailing the data back to the user

Replies are listed 'Best First'.
Re: alternatives to emailing the data back to the user
by tachyon (Chancellor) on Jun 20, 2004 at 11:04 UTC

    Email is a logical solution, probably sending a link to the results page. Net::SMTP, Mail::Mailer, and Mail::Sendmail are all alternatives to sendmail if you want a pure-Perl approach; qmail and exim are two of many options on the system side.
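    As a minimal sketch of the pure-Perl route, Mail::Sendmail speaks SMTP directly and needs no sendmail binary. The SMTP host, addresses, and results URL below are placeholders you would replace with your own:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Mail::Sendmail;    # pure Perl, no sendmail binary required

# All hosts and addresses here are illustrative placeholders.
my %mail = (
    To      => 'user@example.com',
    From    => 'results@example.com',
    Subject => 'Your results are ready',
    Message => "Your results are available at:\n"
             . "http://example.com/results/12345\n",
    Smtp    => 'mail.example.com',    # a reachable SMTP relay
);

sendmail(%mail) or warn "Mail error: $Mail::Sendmail::error";
```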

    The connection to the browser will be broken within minutes - and let's be realistic - even if you have a refreshing results page, no one is going to wait 10 hours! merlyn covered handling long-running CGI via refresh in his Web Techniques column, but 10 hours will try the patience of any user. See Watching long processes through HTTP for this technique.

    Once a connection is terminated, that's it (short of a forced refresh to some page). One option is to redirect to an HTML page that you create dynamically for the results. Initially it just holds a "bookmark me" link and a message telling the user to bookmark the page; the results will arrive whenever they arrive. Simply overwrite the page with the results when they are ready, and follow up with an email containing the link and a note that the results are ready, or similar. That way the user can check the bookmarked link periodically or click the email link to get to the results.
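    The overwrite-the-page idea might look like this. The paths and job-ID scheme are assumptions; the rename at the end keeps the swap atomic so a user reloading mid-write never sees a half-written page:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical paths; in practice $job_id would come from the CGI request.
my $job_id = time() . "-$$";
my $page   = "/var/www/html/results/$job_id.html";

# 1. Write the placeholder page the user is told to bookmark.
open my $fh, '>', $page or die "Cannot write $page: $!";
print $fh <<"HTML";
<html><body>
<p>Bookmark this page. Your results will appear here when ready.</p>
</body></html>
HTML
close $fh;

# ... the long-running processing happens here (possibly hours) ...

# 2. Overwrite the same file with the real results, atomically:
#    write to a temp file in the same directory, then rename over the old page.
my ($tmp_fh, $tmp_name) = tempfile(DIR => '/var/www/html/results');
print $tmp_fh "<html><body><p>Results: ...</p></body></html>\n";
close $tmp_fh;
rename $tmp_name, $page or die "Cannot replace $page: $!";
```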

    I am vaguely interested in what takes 10 hours to run. In 10 hours I can pull a million URLs, process 20+ million DB queries, or merge a few hundred million records..... There may well be better algorithms for whatever the processing task is. It might be worth a second question.

    cheers

    tachyon

Re: alternatives to emailing the data back to the user
by davido (Cardinal) on Jun 20, 2004 at 16:35 UTC

    MIME::Lite allows you to specify mail transports other than sendmail, and will even work on systems, such as stock Win32 boxes, that have no built-in mail transport at all.
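    A short sketch of that: MIME::Lite's class-level send() call switches the transport to SMTP, so nothing on the box needs a sendmail binary. The host, addresses, and attachment path are placeholders:

```perl
use strict;
use warnings;
use MIME::Lite;

# Route all mail through an SMTP server instead of the sendmail binary.
# 'mail.example.com' and the addresses below are illustrative.
MIME::Lite->send('smtp', 'mail.example.com', Timeout => 60);

my $msg = MIME::Lite->new(
    From    => 'results@example.com',
    To      => 'user@example.com',
    Subject => 'Processing complete',
    Type    => 'TEXT',
    Data    => "Your results are attached.\n",
);

# Attach the finished results file.
$msg->attach(
    Type     => 'text/plain',
    Path     => '/tmp/results.txt',
    Filename => 'results.txt',
);

$msg->send or die "Sending mail failed";
```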

    Another alternative is to FTP the data somewhere. If your site's user has FTP space of their own, you could use Net::FTP to send the data once it's ready. Or you could make it available for pickup within your own FTP area, although that seems less ideal than simply putting it in your webspace and giving the user a link to retrieve it once it's done.
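    The Net::FTP upload is only a few lines. Host, credentials, and paths here are placeholders for whatever the user's FTP space actually is:

```perl
use strict;
use warnings;
use Net::FTP;

# Upload the finished results to the user's FTP space.
# Host, login, and paths are illustrative placeholders.
my $ftp = Net::FTP->new('ftp.example.com', Timeout => 60)
    or die "Cannot connect: $@";
$ftp->login('username', 'password')
    or die "Login failed: ", $ftp->message;
$ftp->binary;                     # don't mangle non-text results
$ftp->cwd('/incoming')         or die "cwd failed: ", $ftp->message;
$ftp->put('/tmp/results.txt')  or die "put failed: ", $ftp->message;
$ftp->quit;
```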

    I wonder if it would be possible for you to preprocess at least some of what the user will be after, so that on-demand processing time doesn't reach ten hours. If some portions of what the user will need are predictable, you could run a cron job to keep those portions preprocessed, and then pull the pieces together on demand much more quickly. I don't know if this is feasible, but maybe...

    Another possibility is that if your processing time is constrained by the fact that you're pulling data from other websites (a time-consuming process because connections are not instantaneous), maybe you could fork multiple connections simultaneously to speed up the data-retrieval process. Take care not to bog yourself down though; if you have dozens or more CGI processes each forking dozens or more times, you'll get into trouble quickly.
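    One common way to fork a bounded number of simultaneous fetches is Parallel::ForkManager, which caps the number of children so the process count can't explode. The URLs and output directory below are made up for illustration:

```perl
use strict;
use warnings;
use Parallel::ForkManager;
use LWP::Simple qw(getstore);

# Fetch several pages at once, but cap concurrency so that dozens of
# CGI processes can't each spawn dozens more children.
my @urls = qw(
    http://example.com/a
    http://example.com/b
    http://example.com/c
);
my $pm = Parallel::ForkManager->new(5);    # at most 5 concurrent fetches

for my $url (@urls) {
    $pm->start and next;                   # parent: move on to next URL
    (my $file = $url) =~ s{[^\w.-]+}{_}g;  # crude per-URL filename
    getstore($url, "/tmp/$file");          # child does the slow fetch
    $pm->finish;                           # child exits
}
$pm->wait_all_children;
```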

    It's wrong for us to assume that anything can be done to speed up processing time, but it doesn't hurt for us to offer suggestions just in case something can actually be done. ;)


    Dave

Re: alternatives to emailing the data back to the user
by traveler (Parson) on Jun 20, 2004 at 16:24 UTC
    You could present them with a link to the results. For instance: "Your results will be available at http://www.mysite.com/results/434456fdtrevz867oukrytiysir68hhv. Please check that page after xx minutes." (i.e., some random-looking name that is tied to their request in a table or database of some sort). Then populate that page with a "yy% complete" message until the final results are available. The results could, of course, be computed and stored in a database, and/or the results page could be a Perl script that says some form of "not yet" until the appropriate process is complete.
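    A sketch of the two pieces of that scheme: generating a random-looking token for the results URL, and a helper the results script could use to report progress. The URL path and the idea that a worker records done/total counts somewhere are assumptions:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# An unguessable, random-looking page name for this request; you would
# record it in a table/database keyed to the request.
my $token = md5_hex( time(), $$, rand() );
my $url   = "http://www.mysite.com/results/$token";

# The results page can be a small CGI that reads the worker's progress
# (from a file or database) and prints one of these until the job ends:
sub progress_message {
    my ( $done, $total ) = @_;
    return $done >= $total
        ? "Your results are ready."
        : sprintf( "%d%% complete - not yet, please check back.",
                   100 * $done / $total );
}
```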

    HTH, --traveler

Re: alternatives to emailing the data back to the user
by erniep (Sexton) on Jun 20, 2004 at 12:38 UTC
    If you could give some details on what you are attempting to process that takes 10 hours, that would be helpful. You can process billions of instructions in that time frame, so you may have an unending loop or some other type of bug. How much data are you processing? What file structure are you using (text, db, ?)? It's hard to diagnose your problem without more information. Thanks