rsharpe has asked for the wisdom of the Perl Monks concerning the following question:

I'm having problems with an application that I wrote, I guess about a year ago. It has been working fine and has really come in handy from time to time. Now I'm having difficulties with it. First, a little background is in order.

It is a web form that posts the information to itself. It then passes several parameters to a few Perl modules that I wrote. These modules dynamically generate reports from NetFlow data (TCP/IP header statistics exported from Cisco or Juniper routers). My modules interact with open source software called FlowTools.

After all the data from the web form has been collected and checked, the application uses the first Perl module to create a filter file, which FlowTools requires. It then uses the second Perl module to create a report format file, also required by FlowTools. Then I execute a FlowTools command with the system() function. This is where my problem comes in. The command takes a long time; in some cases it must process over one gigabyte of data at a time. While it is running, the web browser sits waiting for something to be returned. The web browser times out and the application does not finish.

What is supposed to happen next is that the application creates an HTML file and emails it to the recipient. It then cleans up all the other files that it created in order to generate the HTML file.

So my question is: how can I restructure my application so that the web browser doesn't time out and kill the child processes? Once the submit button is pressed on the web page, nothing more needs to be done; the form's job is over.

I really do appreciate your advice; this one has been stumping me for a while.

Re: Program Structure
by merlyn (Sage) on Feb 02, 2006 at 21:06 UTC
      Thanks, that article has helped!
Re: Program Structure
by graff (Chancellor) on Feb 03, 2006 at 02:40 UTC
    It sounds like you're really close to a solution already...
    ... I execute a FlowTools command with system() function...

    What is supposed to happen next is the application is to create an HTML file and email to the recipient...

    ... Once the submit button is pressed on the web page nothing more needs to be done, the forms job is over.

    Instead of using system() to run the FlowTools command, write a simple wrapper script (separate from your CGI script), which will run FlowTools, build the HTML file of results and email that to the recipient. Then have your CGI script call that wrapper like this:

    ...
    my $pid = fork;
    if ( !defined $pid ) {
        # maybe report an error to the browser...
    }
    elsif ( $pid == 0 ) {
        exec( 'wrapper_script', 'recipient@email', @other_args );
    }
    else {
        # print a simple HTML page to the client, saying that the job is
        # running and results will be emailed as soon as the job finishes
    }
    __END__

    The point is that once the child starts, it'll handle everything that depends on the long-running process; meanwhile the parent CGI simply tells the client browser "okay, the job is running, you'll get email" and goes away.
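For illustration, here is a minimal sketch of what such a wrapper script might look like. It is not graff's code. The sub names, the argument layout (recipient first, then the command to run), the mail subject, and the sendmail path are all assumptions, and the actual FlowTools command line is left to the caller rather than guessed at here.

```perl
#!/usr/bin/perl
# Hypothetical wrapper_script. Assumed usage:
#   wrapper_script recipient@example.com command [args...]
use strict;
use warnings;

# Run the long job and capture its output.
# List-form pipe open avoids passing the command through a shell.
sub run_job {
    my @cmd = @_;
    open( my $fh, '-|', @cmd ) or die "cannot start $cmd[0]: $!";
    my @lines = <$fh>;
    close($fh) or die "$cmd[0] failed (status $?)";
    return @lines;
}

# Wrap the raw report lines in a minimal HTML page, escaping markup.
sub build_html {
    my $body = join '', @_;
    for ($body) { s/&/&amp;/g; s/</&lt;/g; s/>/&gt;/g; }
    return "<html><body><pre>\n$body</pre></body></html>\n";
}

# Hand the page to a local sendmail binary (the path is an assumption).
sub send_report {
    my ( $to, $html ) = @_;
    die "bad recipient\n" if $to =~ /[\r\n]/;    # refuse header injection
    open( my $mail, '|-', '/usr/sbin/sendmail', '-t' )
        or die "cannot run sendmail: $!";
    print {$mail} "To: $to\nSubject: NetFlow report\n",
        "Content-Type: text/html\n\n", $html;
    close($mail) or die "sendmail failed (status $?)";
}

# With no arguments the file only defines the subs above.
if (@ARGV) {
    my ( $recipient, @cmd ) = @ARGV;
    die "usage: $0 recipient command [args...]\n" unless @cmd;
    send_report( $recipient, build_html( run_job(@cmd) ) );
}
```

Removing the temporary filter and report format files would also belong in this script, after the mail has been handed off.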

      This is what I have decided to do, but I'll have to do it higher up in the application, because the system() call that I use is buried in a Perl module. I also still need to do a little bit of manipulation after that command has finished, so I must wait until that command returns.

      By moving the process forking higher up into the application, I should be able to let the parent tell the browser it can go away while the child parses my data.
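The approach described above, forking inside the application instead of exec'ing a separate script, could be sketched roughly as follows. The sub name run_in_background is hypothetical. The sketch also assumes a Unix host and a web server that waits until every process holding the CGI's output handles has closed them, which is why the child reopens its standard handles before starting the long job.

```perl
use strict;
use warnings;
use POSIX qw(setsid);

# Run $job (a code ref) in a detached child process.
# Returns the child's PID to the parent; the child never returns.
sub run_in_background {
    my ($job) = @_;

    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    return $pid if $pid;    # parent: go answer the browser

    # Child: let go of the handles inherited from the web server so the
    # server does not wait for us, then leave its session.
    open STDIN,  '<', '/dev/null' or die "STDIN: $!";
    open STDOUT, '>', '/dev/null' or die "STDOUT: $!";
    open STDERR, '>', '/dev/null' or die "STDERR: $!";
    setsid();

    # Do the long-running work; exit 0 on success, 1 if the job died.
    my $ok = eval { $job->(); 1 };
    POSIX::_exit( $ok ? 0 : 1 );    # skip END blocks inherited from the parent
}

# Possible use in the CGI script (the commented names are placeholders
# for the existing module calls, mailing, and cleanup):
#
#   $| = 1;    # keep STDOUT unbuffered so nothing is duplicated in the child
#   run_in_background( sub {
#       # build filter and report files, run FlowTools via system(),
#       # post-process the output, mail the HTML, remove temp files
#   } );
#   print "Content-type: text/html\n\n",
#         "<p>The job is running; the report will be emailed.</p>\n";
```

Because the child still waits for system() to return, the post-processing that has to follow the FlowTools command can stay where it is; only the browser stops waiting.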

      Thanks for your help

Re: Program Structure
by dragonchild (Archbishop) on Feb 02, 2006 at 21:16 UTC
    While merlyn addressed your immediate issue, I would suggest restructuring your design so that you're using something like a relational database rather than regenerating this data every time.

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

      The problem with NetFlow data is that the reports have to be regenerated. Generally the report is done on a day's worth of statistics. So if I want to create a report for Tuesday, I can't use any data that has to do with Monday.

      If I understand the way the data is stored, it is already in a database type of file; they just break it down into five-minute files so there is more granularity.

Re: Program Structure
by diotalevi (Canon) on Feb 02, 2006 at 21:12 UTC

    Your question is missing some paragraph markers.

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊