pmcilfatrick has asked for the wisdom of the Perl Monks concerning the following question:

We have a webpage for analysing WebLogic log files which writes its progress to the browser. This interactive processing can take several hours, depending on the number and size of the log files, and requires the PC to remain switched on.

I have now modified our webpages so that there is an option to process the log files as a background task.

Here is the code:

#!/usr/bin/perl -w
use CGI qw/:standard/;
my $q = new CGI;
use strict;

umask 000;  # This will set the permission of new files to the default for the user running this script.
$| = 1;     # Autoflush after every print statement.

print $q->header();  # i.e. this is the same as: print "Content-type: text/html\n\n";

my $default_dir       = "/www/develop/jms/bridge_errors";
my $default_files_dir = "/www/develop/jms/bridge_errors/files";

my $identifier = $q->param("identifier");
#print "\$identifier = $identifier\n";
#exit;

print <<HTMLcode;
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Background Processing</title>
<link rel="stylesheet" href="/jms/css/jumble.css">
</head>
<body>
<h1>JMS Bridge Error Summariser Background Task</h1>
<div class=panel>
<p>Background processing of the '$identifier' log files has started ...</p>
</div>
<br>
<hr>
<br>
<address>
<a href="mailto:aaa.bbb\@xxx.yyy">Contact</a>
</address>
</body>
</html>
HTMLcode

# Launch the WLS Log Analyser script as a background task.
exec "nohup ./error_summariser.cgi $identifier > $default_files_dir/$identifier/processing-log.txt &"
    or die "Can't exec: $!\n";

My question is: how do I stop this webpage from taking ages in Firefox, with the message "Transferring data from <server name>" shown in the bottom-left corner of the browser and a progress indicator in the bottom right? The webpage can take up to 5 minutes to display 'Done'!

BTW, we are running Apache 2.2.16 on OEL5.3.


Is there any way I can get this webpage to complete more quickly?


Thanks


Paul McIlfatrick

Replies are listed 'Best First'.
Re: exec command taking ages
by Anonymous Monk on Nov 24, 2010 at 12:37 UTC
    You probably want to close STDOUT and STDERR, because the web server (and in turn the browser) waits for the CGI script to close them.

      You are correct!

      I added a close STDOUT and a close STDERR before the exec command and now the webpage displays 'Done' immediately.


      Thanks for your quick reply.


      Paul McIlfatrick

        I was too quick with my reply, basing it on how quickly the browser displayed 'Done', and I had not checked the processed files - no files had actually been processed.

        Adding a close STDOUT and a close STDERR before the exec command prevented the exec command from running and so no files were processed.

        Putting a close STDOUT and a close STDERR after the exec command results in the same long delay before the browser displays 'Done'.
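
        (A note on why that last attempt cannot work: on success, exec replaces the current process and never returns, so statements placed after it are never reached. Below is a minimal sketch, not the original poster's code, of one way to get both behaviours at once: fork, point the child's standard handles at the processing log instead of the Apache connection, and exec there, so the parent CGI finishes and the browser shows 'Done' immediately. The setsid call and exact handle plumbing are illustrative assumptions.)

        use POSIX qw(setsid);

        my $log = "$default_files_dir/$identifier/processing-log.txt";

        defined( my $pid = fork ) or die "Can't fork: $!\n";
        if ( $pid == 0 ) {                          # child: becomes the background task
            setsid();                               # detach from the web server's session
            open STDIN,  '<',  '/dev/null' or die "Can't reopen STDIN: $!\n";
            open STDOUT, '>',  $log        or die "Can't reopen STDOUT: $!\n";
            open STDERR, '>&', \*STDOUT    or die "Can't dup STDERR: $!\n";
            exec './error_summariser.cgi', $identifier or die "Can't exec: $!\n";
        }
        # parent: falls through and exits normally, so Apache can close the connection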

        Paul McIlfatrick

Re: exec command taking ages
by Anonymous Monk on Nov 24, 2010 at 12:49 UTC
    You will also want to untaint $identifier

      I am not sure that I need to use taint checking, as this is an internal company webpage.

      Also, the $identifier variable holds the directory that the user selected from a list on an initial webpage; the variable is passed to another .cgi script before finally being passed to this .cgi script, so I am not sure how taint would help.

      Paul McIlfatrick

        Also, the $identifier variable holds the directory that the user selected from a list on an initial webpage; the variable is passed to another .cgi script before finally being passed to this .cgi script, so I am not sure how taint would help.

        That's what you hope $identifier holds. The program does absolutely no checking of what is in $identifier, and then merrily passes it to the shell for execution - a classic security hole.
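
        A minimal sketch of the kind of check being suggested, assuming the identifier should only ever be a simple directory name under $default_files_dir (the allowed character class and error handling are assumptions, not the poster's code):

        # Accept only a plain directory-name-like token; under -T, the
        # regex capture is also what untaints the value.
        my $raw = $q->param("identifier");
        my ($identifier) = ( defined $raw ? $raw : '' ) =~ /\A([\w.-]+)\z/
            or die "Invalid identifier\n";

        # Better still, insist it names one of the expected directories.
        die "Unknown identifier '$identifier'\n"
            unless -d "$default_files_dir/$identifier";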

Re: exec command taking ages
by sundialsvc4 (Abbot) on Nov 24, 2010 at 15:54 UTC

    To my way of thinking, a web-page should never “wait for” any long process to complete. It can be used to submit a request, it can be used to query whether the request has completed, and it can be used (if it has completed) to retrieve the results. But the web-page should never be the actor in this play.

    The user submits a request. This request is carefully validated and then handed off to a background job-processing system, which gives the user some kind of token to follow up on it with. Maybe, when the job is done, an e-mail is sent. The user returns, presents the token, and gets the result.

    The request-processing should be completely independent of the web-page mechanism, and thus immune to the number or frequency of web-page hits. It, too, should validate every request that it receives, it should refuse to accept too much work, and it should refuse to attempt too many work requests simultaneously. There are plenty of good transaction-processing frameworks out there, on CPAN and elsewhere. (For that matter, “soup-to-nuts request processing frameworks” are also available off the shelf. The cost of building such a mechanism from scratch is not insignificant, and should be avoided if possible.)
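
    A minimal sketch of the submit-and-poll shape being described, assuming a spool directory that a separate worker process watches; the paths, file naming, and ".done" marker are purely illustrative assumptions:

    use File::Temp qw(tempfile);

    my $spool = '/www/develop/jms/bridge_errors/spool';   # assumed location watched by a worker

    # Submission: validate the request, drop it in the spool, hand back a token.
    my ( $fh, $request_file ) = tempfile( 'job-XXXXXX', DIR => $spool );
    print $fh "$identifier\n";
    close $fh;

    ( my $token = $request_file ) =~ s{.*/}{};             # token handed to the user
    print $q->p("Your request has been queued. Job token: $token");

    # A later "check status" page only needs to look for the worker's marker file:
    #   serve the results once -e "$spool/$token.done" is true.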

      All very good points. What are some of these off-the-shelf and CPAN solutions called?