jewang has asked for the wisdom of the Perl Monks concerning the following question:

Hi: I am new to Perl and CGI. What I am trying to do is use CGI to create a form that collects several report parameters (date, region, store, and department) and passes them to a Perl script, which runs a SQL query to produce a report; the output is saved as a tab-delimited file.
I am using a database link to query two databases, and doing a lot of calculation in the query to get the sales information.
My SQL processes a large amount of data and takes more than 10 minutes to run; because of null values and different numbers of digits for the same field in the different tables, I have to use CASE, SUBSTR, and NVL in the query, and all of these take a lot of time.
To save time, I chose not to display the results on the web form, but to write them directly to a tab-delimited file.
I use a CGI script to create the HTML form and to collect and pass the parameters. Another Perl script runs the SQL, writes the output to a file, and sends an email with a link to where the results are stored. By having the CGI script call the second Perl script, I hoped the process would run in the background and get around the timeout issue.
I tested it at the command prompt and on the web; it can handle a couple of thousand records, but most of the time it just times out and sends no email.
I am wondering if anyone has encountered this timeout problem before and how to deal with it?
Can anything be changed on the server configuration side? Or is a Perl CGI simply not the right solution for processing a large amount of data?
I am a new college graduate, and this is my first experience with Perl. Any advice will be appreciated.
Thanks in advance.
jewang

Replies are listed 'Best First'.
Re: Perl CGI timeout
by StrebenMönch (Beadle) on Dec 02, 2004 at 21:07 UTC
    I am not sure if you are set on the process you have right now (running the program in the background and then emailing it to the user) or not. But I have done some work with online reports hitting a MySQL database (not a ton, I am still a newb, so all of you more experienced folks please correct me if you feel I am wrong), and have found that a couple of thousand records should not be a major problem to run on the fly and return the results to the screen (or an Excel, PDF, ... file). I have found that the longest part is usually rendering the information on the user's screen.

    A couple of suggestions that may or may not help.
    1. Make the SQL statement as fast as possible (you are probably thinking no S#$@). What I mean is, Perl can do a lot of the things you may be trying to do in SQL (like substr, decode in some cases, uppercase, etc.); see the sketch after this list.
        a) Also, it may actually be faster to break your statement into multiple statements. So, instead of joining 10 tables, do multiple selects...
    2. Take a look at Excel::Template (which does not appear to be on CPAN at the moment) (not sure if this is an OK alternative to your tab-delimited files or not...)
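
    To make point 1 concrete, here is a minimal, hypothetical sketch of doing the NVL/SUBSTR/CASE-style cleanup in Perl on the fetched rows instead of inside the query. The DSN, login, and the cut-down column list are assumptions for illustration only; swap in your own schema.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Placeholder connection details -- replace with your own DSN/user/password.
    my $dbh = DBI->connect( 'dbi:Oracle:stella', 'user', 'password',
                            { RaiseError => 1 } );

    my $sth = $dbh->prepare(q{
        select upc, case_size, case_cost, mvmnt_units
          from stella.mvmnt_wkly
    });
    $sth->execute;

    while ( my $row = $sth->fetchrow_hashref ) {
        my $upc  = substr( $row->{upc}, -12 );                  # like SUBSTR(x, -12, 12)
        my $cost = ( $row->{case_size} && $row->{case_cost} )   # like the CASE null/zero check
                 ? sprintf( '%.3f', $row->{case_cost} / $row->{case_size} * $row->{mvmnt_units} )
                 : '';
        # ... write $upc, $cost, etc. to the tab-delimited file here
    }

    The point is only that string and null handling is cheap in Perl once the rows are fetched, so the database can spend its time on the joins and aggregation.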

    We are currently set up with CGI::Application, HTML::Template, PDF::Template, and Excel::Template for our reports (with some other stuff mixed in).

    OK, that is all for now. I am not sure if this is even close to an option for you, but it may at least spark some ideas about an alternate direction to go.
    S t r e b e n M ö n c h
      StrebenMonch: Thanks for the ideas.
      Optimizing the SQL is what I have been thinking about.
      I have attached the SQL here. The only thing I can see to eliminate is the calculation of the cost, where I use CASE to check for null values and then SUM again. Please correct me if you think any other parts could be done in Perl.

      After bind_columns, should I then use a loop and a hash/array to do the summation? (See the sketch after the query below.)
      select mvmnt_data.region
            ,store_name
            ,nvl(i_a.nat_upc, i_b.nat_upc)
            ,nvl(i_a.long_description, i_b.long_description)
            ,nvl(i_a.item_size, i_b.item_size)
            ,nvl(i_a.item_uom, i_b.item_uom)
            ,vsi.vend_item_num
            ,salesUnits
            ,salesDollars
            ,case_size
            ,cost
            ,ps.ps_prod_subteam
        from (select wm.upc
                    ,sum(wm.mvmnt_units)   salesUnits
                    ,sum(wm.mvmnt_dollars) salesDollars
                    ,wm.store
                    ,wm.pos_dept
                    ,wm.case_size
                    ,sum(round(case when wm.case_size is null or wm.case_cost = 0
                                    then null
                                    else (wm.case_cost / wm.case_size) * wm.mvmnt_units
                               end, 3)) cost
                    ,s.store_name
                    ,s.region
                from stella.mvmnt_wkly wm
                    ,vim.store@vim s
               where wm.mvmnt_date between to_date('10/01/2004', 'mm/dd/yyyy')
                                       and to_date('10/03/2004', 'mm/dd/yyyy')
                 and wm.store in ('10036')
                 and wm.store = s.ps_bu
                 --and wm.upc = '000000015232'
                 and wm.pos_dept = '0110'
               group by wm.case_size
                       ,s.region
                       ,wm.pos_dept
                       ,wm.upc
                       ,wm.store
                       ,s.store_name) mvmnt_data
            ,vim.item_region@vim       ir
            ,vim.item_master@vim       i_a
            ,vim.item_master@vim       i_b
            ,vim.reg_pos_dept_ref@vim  rpdr
            ,vim.ps_team_ref@vim       ps
            ,vim.vendor_store_item@vim vsi
       where mvmnt_data.region = ir.region(+)
         and mvmnt_data.upc = substr(ir.upc(+), -12, 12)
         and substr(ir.nat_upc, 12, 12) = substr(i_a.nat_upc(+), -12, 12)
         and mvmnt_data.upc = substr(i_b.nat_upc(+), -12, 12)
         and mvmnt_data.region = rpdr.region(+)
         and mvmnt_data.pos_dept = trim(leading 0 from rpdr.pos_dept(+))
         and rpdr.pos_dept = ps.pos_dept
         and rpdr.region = ps.region
         and mvmnt_data.store = ps.ps_bu
         and mvmnt_data.upc = substr(vsi.upc, -12, 12)
         and vsi.region = ps.region
         and vsi.ps_bu = mvmnt_data.store
         and vsi.v_auth_status = 'P'
      Jewang
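
      To illustrate the bind_columns idea, here is a minimal, hypothetical sketch of pulling the detail rows and doing the SUM/GROUP BY part in Perl with a hash keyed on the grouping columns. The connection details are placeholders, and the query is cut down to a few columns just to show the shape of the loop.

      #!/usr/bin/perl
      use strict;
      use warnings;
      use DBI;

      my $dbh = DBI->connect( 'dbi:Oracle:stella', 'user', 'password',
                              { RaiseError => 1 } );

      my $sth = $dbh->prepare(q{
          select wm.upc, wm.store, wm.pos_dept, wm.case_size,
                 wm.case_cost, wm.mvmnt_units, wm.mvmnt_dollars
            from stella.mvmnt_wkly wm
           where wm.mvmnt_date between to_date('10/01/2004', 'mm/dd/yyyy')
                                   and to_date('10/03/2004', 'mm/dd/yyyy')
             and wm.store = '10036'
             and wm.pos_dept = '0110'
      });
      $sth->execute;

      my ( $upc, $store, $dept, $case_size, $case_cost, $units, $dollars );
      $sth->bind_columns( \( $upc, $store, $dept, $case_size, $case_cost, $units, $dollars ) );

      my %totals;    # keyed on the old GROUP BY columns
      while ( $sth->fetch ) {
          my $key = join "\t", $upc, $store, $dept, ( defined $case_size ? $case_size : '' );
          $totals{$key}{units}   += $units;
          $totals{$key}{dollars} += $dollars;
          $totals{$key}{cost}    += $case_cost / $case_size * $units
              if $case_size && $case_cost;    # same null/zero check as the CASE expression
      }
      # %totals now holds the per-UPC/store/dept sums, ready to write out tab-delimited.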
        This is where I might try to break this into two selects (more experienced SQL people, feel free to correct me).

        Query 1
        ===========================================================
        SELECT ir.region 
              ,ir.upc
              ,i_b.nat_upc
              ,rpdr.region
              ,rpdr.pos_dept
              ,ps.pos_dept
              ,ps.ps_bu
              ,vsi.upc
              ,vsi.ps_bu
        
              ,NVL(i_a.nat_upc,i_b.nat_upc)
              ,NVL(i_a.long_description,i_b.long_description)
              ,NVL(i_a.item_size,i_b.item_size)
              ,NVL(i_a.item_uom,i_b.item_uom)
              ,vsi.vend_item_num
              ,ps.ps_prod_subteam
          FROM vim.item_region@vim       ir
              ,vim.item_master@vim       i_a
              ,vim.item_master@vim       i_b
              ,vim.reg_pos_dept_ref@vim  rpdr
              ,vim.ps_team_ref@vim       ps
              ,vim.vendor_store_item@vim vsi
         WHERE substr(ir.nat_upc, 12, 12)=substr(i_a.nat_upc(+),-12,12)
           AND rpdr.pos_dept     = ps.pos_dept
           AND rpdr.region       = ps.region
           AND vsi.region        = ps.region
           AND vsi.v_auth_status = 'P'
        
        ===========================================================
        The first 9 columns in the select are new and will be used in the second select.

        Query 2
        ===========================================================
        SELECT s.region
              ,wm.upc
              ,sum(wm.mvmnt_units)    AS salesUnits
              ,sum(wm.mvmnt_dollars) AS salesDollars
              ,wm.store
              ,wm.pos_dept
              ,wm.case_size
              ,SUM(ROUND (CASE WHEN wm.case_size IS NULL OR wm.case_cost=0 THEN NULL 
                               ELSE (wm.case_cost/wm.case_size) * wm.mvmnt_units end, +3)) AS cost
              ,s.store_name
          FROM stella.mvmnt_wkly wm
              ,vim.store@vim s
         WHERE s.region          = ?
           AND wm.upc            = ?
           AND wm.pos_dept       = ?
           AND wm.store          = ?
           AND wm.mvmnt_date between to_date('10/01/2004', 'mm/dd/yyyy') and to_date('10/03/2004', 'mm/dd/yyyy')
           AND wm.store           in ('10036')
           AND wm.store            = s.ps_bu
           AND wm.pos_dept         = '0110'
         GROUP BY wm.case_size
                 ,s.region
                 ,wm.pos_dept
                 ,wm.upc
                 ,wm.store 
                 ,s.store_name
        
        ===========================================================
        So what I would do is call the first query, and while looping through it do the proper substr on the data in Perl, then call query 2 with the resulting values as bind parameters.
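
        A rough sketch of that two-query loop in DBI might look like the following. The connection details and the cut-down statements are placeholders; the real statements would be Query 1 and Query 2 above.

        #!/usr/bin/perl
        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect( 'dbi:Oracle:vim', 'user', 'password',
                                { RaiseError => 1 } );

        # Query 1: the item/vendor reference data (trimmed here for the example).
        my $items = $dbh->prepare(q{
            select ir.region, ir.upc, ps.pos_dept, ps.ps_bu, ps.ps_prod_subteam
              from vim.item_region@vim ir, vim.ps_team_ref@vim ps
             where ir.region = ps.region
        });

        # Query 2: the movement sums, with bind placeholders instead of joins.
        my $sales = $dbh->prepare(q{
            select sum(wm.mvmnt_units), sum(wm.mvmnt_dollars)
              from stella.mvmnt_wkly wm
             where wm.upc      = ?
               and wm.store    = ?
               and wm.pos_dept = ?
        });

        $items->execute;
        while ( my ( $region, $upc, $pos_dept, $store, $subteam ) = $items->fetchrow_array ) {
            my $short_upc = substr( $upc, -12 );    # do the SUBSTR in Perl
            $sales->execute( $short_upc, $store, $pos_dept );
            my ( $units, $dollars ) = $sales->fetchrow_array;
            # ... combine the two result sets and write a tab-delimited line here
        }

        One execute of query 2 per row of query 1 can itself get slow for a large number of rows, so it is worth benchmarking this against the single big query.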

        I would try to optimize query 2 as much as possible by creating indexes if need be and looking at an execution plan. For example, this is what I use with Oracle's EXPLAIN PLAN:

        deleteplan.sql
        ===========================================================
        delete from plan_table
         where statement_id = 'TEST'
        
        /
        commit
        /
        
        ===========================================================

        test.sql
        ===========================================================
        @deleteplan
        
        EXPLAIN PLAN set statement_id = 'TEST' for
        SQL STATEMENT GOES HERE
        
        /
        
        @getplan
        
        ===========================================================

        getplan.sql (there is some other information you can get from the plan_table, but this is all I need most of the time)
        ===========================================================
        select operation, options, object_name, cost, cardinality, TO_CHAR(timestamp, 'HH24:MI:SS')
          from plan_table
         where statement_id = 'TEST'
        start with id = 0
        connect by prior id=parent_id and prior statement_id = statement_id
        
        /
        
        ===========================================================
        So I just put my SQL in test.sql and run it to get my execution plan. From there you can see where your problems are.

        S t r e b e n M ö n c h
Re: Perl CGI timeout
by fglock (Vicar) on Dec 02, 2004 at 20:16 UTC

    Make the on-line script just log the request and return immediately, with a message like "your query is being processed - you will receive the result in your mailbox."

    Have another process do the database/mail processing, possibly using a queue.
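
    For example, here is a minimal sketch of the CGI side, assuming a simple one-request-per-line queue file (the file path, field layout, and parameter names are just placeholders):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI;
    use Fcntl qw(:flock);

    my $query  = CGI->new;
    my @params = map { my $v = $query->param($_); defined $v ? $v : '' }
                 qw(beginDate endDate region dept storelist);

    # Append the request to a queue file; a separate process picks it up later.
    open my $fh, '>>', '/inetpub/dev/wangj/Vim_Stella/report_queue.txt'
        or die "cannot open queue: $!";
    flock $fh, LOCK_EX;
    print {$fh} join( "\t", @params ), "\n";
    close $fh;

    print $query->header,
          "Your query is being processed - you will receive the result in your mailbox.\n";

    The browser gets its answer in well under a second, so the web server timeout never comes into play.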

      fglock: Thanks for the response. But what do you mean "log the request" and "using a queue"?
      This is how I get the request in the CGI script:
      my $beginDate    = $query->param('beginDate') ? $query->param('beginDate') : "";
      my $endDate      = $query->param('endDate')   ? $query->param('endDate')   : "";
      my $region       = $query->param('region')    ? $query->param('region')    : "";
      my $frmdept      = $query->param('dept')      ? $query->param('dept')      : "";
      my $frmstorelist = $query->param('storelist') ? $query->param('storelist') : "";
      Then I pass them to "cost_movement_query.pl":
      my $command = "/inetpub/dev/wangj/Vim_Stella/cost_movement_query.pl $region $storeList $frmdept $beginDate $endDate";
      exec("$command");
      In "Cost_movement_query.pl", I run the SQL and send the email.
      How can I use the queue?
      Thanks.
      Jewang

        "log the request" - write the query to a file.

        "using a queue" - read the queries back and process them, one at a time. You can even use "cron" to coordinate the calls to "Cost_movement_query.pl".

Re: Perl CGI timeout
by zentara (Cardinal) on Dec 03, 2004 at 11:40 UTC
    There are a lot of issues you have to deal with in your scenario, but here is one tip which you may find useful in dealing with long-running CGI processes.
    #!/usr/bin/perl
    use warnings;
    use strict;

    # Benjamin Goldberg
    # The Apache process does NOT wait for the CGI process to exit, it waits
    # for the pipes to be closed. (Note that if the CGI is a NPH script, then
    # it does waitpid() for the script to exit.)
    #
    # Thus, you do not really need a child process -- you can have your CGI
    # process transmute itself into whatever you would have the child process
    # do, by doing something like:

    $| = 1;    # need either this or to explicitly flush stdout, etc.
    print "Content-type: text/plain\n\n";
    print "Going to start the fork now\n";

    open( STDIN,  "</dev/null" );
    open( STDOUT, ">>/dev/null" );
    open( STDERR, ">>/path/to/logfile" );
    fork and exit;
    exec "program that would be the child process";

    # Apache waits for all children to close the pipes to apache. It
    # does not wait for them to actually exit -- think ... how in the world
    # could it *possibly* know that those processes were really spawned from
    # the CGI process? Answer: It can't. It can only know that *something*
    # still has an open filedescriptor which points to one of the pipes that
    # apache created to capture the CGI's stdout and stderr. As long as one
    # of these pipes is open, then apache waits.

    I'm not really a human, but I play one on earth. flash japh
      Hi,
      I just checked with the folks at my company; we are running IIS 6.0 (Microsoft Internet Information Services), and Windows boxes don't seem to have a "/dev/null". :-)
      Thanks. You guys are all great.
      Jewang
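
      For what it is worth, a portable way to write that part of zentara's snippet is to let File::Spec pick the null device (it returns /dev/null on Unix and nul on Win32). This is only a sketch of that one piece, not of the whole fork-and-exec approach:

      use strict;
      use warnings;
      use File::Spec;

      my $devnull = File::Spec->devnull();    # '/dev/null' on Unix, 'nul' on Win32
      open( STDIN,  '<',  $devnull ) or die "cannot redirect STDIN: $!";
      open( STDOUT, '>>', $devnull ) or die "cannot redirect STDOUT: $!";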