needperlhelp has asked for the wisdom of the Perl Monks concerning the following question:

I use a Perl program to generate a report (it runs queries and some transformations). The program typically runs for hours, and its run window is shared with other business-critical jobs. Of late, due to the growing dataset, the critical jobs are missing their SLA.

To mitigate this problem, I plan to put the Perl report program to sleep (SIGSTOP) and wake it later with a SIGCONT when the other critical jobs complete. I have the following question(s):

The AIX machine has the following network parameters (no -a): tcp_keepidle = 150, tcp_keepinit = 150, tcp_keepintvl = 150. Going by these settings, if I stop the Perl process, the TCP heartbeat packets will not be sent every 150/2 seconds (tcp_keepintvl), so after 150/2 seconds (tcp_keepidle) the open DB connection will be invalidated by the OS.

I tried this on the development machine, which has the same network settings, but the connection was never broken. I’m sure I am missing something.
The DB is Sybase and runs on the same machine as the Perl program. I use dbi::simple to access the DB.

Can anyone help me understand this behavior? Thanks.

Replies are listed 'Best First'.
Re: SIGSTOP & TCP Heartbeat
by pc88mxer (Vicar) on Nov 27, 2007 at 08:10 UTC
    I am not entirely sure about this, but I don't think you generally have to worry about a suspended process having its network connections dropped due to a TCP keepalive timeout. I think the kernel will continue to service the connection (including generating and responding to keepalive packets) as long as the socket connection exists, regardless of the state of the process (running, suspended, sleeping, blocked on I/O, etc.).

    In any case, this is very easy to test: just run tcpdump while your program is suspended and see if there are keepalive packets generated.
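
A minimal sketch of that experiment, offered as an illustration rather than anything posted in the thread. A child process stands in for the report program: it opens a TCP connection over loopback and stops itself with SIGSTOP, and the parent later resumes it with SIGCONT. The two-second pause is an assumption; to exercise the keepalive timers it would have to exceed them, and running tcpdump alongside would show whether any keepalive packets appear.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use POSIX qw(WUNTRACED WIFSTOPPED);

# Listen on an ephemeral loopback port (port 0 lets the OS pick one).
my $server = IO::Socket::INET->new(
    Listen    => 1,
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
) or die "listen failed: $!";
my $port = $server->sockport;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: stands in for the report program holding a DB connection.
    close $server;
    my $sock = IO::Socket::INET->new(
        PeerAddr => '127.0.0.1',
        PeerPort => $port,
        Proto    => 'tcp',
    ) or die "connect failed: $!";
    kill 'STOP', $$;                 # suspend ourselves; the parent resumes us
    print {$sock} "still connected\n";
    close $sock;
    exit 0;
}

my $conn = $server->accept or die "accept failed: $!";

# Block until the child has actually been stopped.
waitpid($pid, WUNTRACED);
my $was_stopped = WIFSTOPPED(${^CHILD_ERROR_NATIVE}) ? 1 : 0;

sleep 2;                             # assumption: lengthen to outlast the keepalive timers
kill 'CONT', $pid;

# If the connection survived the suspension, the child's line arrives here.
my $line = <$conn>;
waitpid($pid, 0);

print "child was stopped: $was_stopped\n";
print "received after resume: ", (defined $line ? $line : "nothing\n");
```

If the line written after the resume arrives, the connection outlived the suspension of the process that owns it.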

Re: SIGSTOP & TCP Heartbeat
by aquarium (Curate) on Nov 27, 2007 at 05:16 UTC
    Have you profiled your Perl program and DB resources during a typical run? The Perl script is possibly just waiting for a large update/reindex on the DB to finish (just guessing), and suspending it or running it at lower priority may have zero effect. Profile it.
    the hardest line to type correctly is: stty erase ^H
      Every time I stop the report program, the critical job completes soon after.
      As you mentioned, the contention for data happens at the DB level.
      Since the report program makes a large number of discrete query calls,
      stopping the Perl program helps.
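
The stop-and-resume approach could be wrapped in a small helper along these lines. This is only a sketch: `pause_until`, its arguments, and the polling interval are hypothetical names chosen for illustration, and the condition that decides when the critical jobs are finished is left to the caller.

```perl
use strict;
use warnings;

# Suspend $pid, poll $done->() until it returns true, then resume $pid.
# The process is resumed even if the $done callback dies.
sub pause_until {
    my ($pid, $done, $poll_seconds) = @_;

    kill('STOP', $pid) == 1 or die "cannot stop $pid: $!";

    my $ok = eval {
        sleep $poll_seconds until $done->();
        1;
    };
    my $err = $@;

    kill('CONT', $pid) == 1 or die "cannot resume $pid: $!";
    die $err unless $ok;
    return;
}
```

One possible call, assuming the caller already knows both process IDs, is `pause_until($report_pid, sub { !kill(0, $critical_pid) }, 30)`. Note that `kill 0` also returns false when the caller lacks permission to signal the target, so a different completion check may be needed in practice.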
Re: SIGSTOP & TCP Heartbeat (misnamed)
by tye (Sage) on Nov 27, 2007 at 22:01 UTC

    TCP keep-alives are one of the most misunderstood features of TCP and also a rather poorly named one. A TCP connection will remain open forever if no packets ever show up to tear it down. I often see plans for turning on TCP keep-alives and they are almost always based on flawed motivations.

    I can see turning on TCP keep-alives for an often-silent connection that goes through a firewall where the firewall has a configured time-out beyond which long-silent TCP connections are dropped from its table of open connections.
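
For reference, turning keep-alives on for a single socket from Perl looks roughly like the sketch below. It uses a throwaway loopback connection for illustration. In the setup discussed here the socket belongs to the database client library, so the script would normally not have direct access to it; treat this as a demonstration of the option only.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Socket qw(SOL_SOCKET SO_KEEPALIVE);

# Throwaway loopback listener so there is something to connect to.
my $listener = IO::Socket::INET->new(
    Listen    => 1,
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
) or die "listen failed: $!";

my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $listener->sockport,
    Proto    => 'tcp',
) or die "connect failed: $!";

# Ask the kernel to send keep-alive probes on this connection when it is idle.
$client->setsockopt(SOL_SOCKET, SO_KEEPALIVE, 1)
    or die "setsockopt failed: $!";

# Read the option back; a non-zero value means keep-alives are enabled.
my $enabled = $client->getsockopt(SOL_SOCKET, SO_KEEPALIVE);
print "SO_KEEPALIVE is ", ($enabled ? "on" : "off"), "\n";
```

The option only switches the probes on for this socket; when they start and how often they repeat is governed by the operating system's keepalive timers.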

    TCP keep-alives are certainly not particularly useful in keeping a TCP connection alive (except for the case of unusual situations with firewalls, as above). To repeat, a TCP connection can remain valid forever with no packets being exchanged at all. It requires a packet to tell a TCP connection to tear down.

    Keep-alives are also usually not of much help in noticing that a TCP connection is no longer alive. For that to work you'd need to configure one side to send keep-alives and the other side to detect that keep-alives have ceased. I know the first part of that can be configured but I've not seen an option for the latter, thus keep-alives aren't useful for detecting an otherwise-silent break in a connection.

    So, it is unclear whether suspending a process would suspend the sending of keep-alives for its connections (it might, or the sending might be handled by part of the network stack separate from the process and so suspending the process might have no impact). But a stop in the flow of keep-alives is unlikely to be noticed by the other side of the connection. So I'm not surprised that you didn't see connections being torn down.

    - tye

      Thanks very much for explaining what's happening behind the scenes.
      I will implement this solution in production now :-)
Re: SIGSTOP & TCP Heartbeat
by aquarium (Curate) on Nov 27, 2007 at 22:13 UTC
    There might also be a "plan B" option: have the Perl program maintain a file of results which, if it exists, is read in so that the script picks up running from that point. This means that you can simply quit the program when needed and start it again at any later stage, as long as it still makes sense to combine the earlier and current results from the SQL queries.
    the hardest line to type correctly is: stty erase ^H
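
A rough sketch of what "plan B" could look like, assuming the report can be split into an ordered list of independent steps. The sub names and the checkpoint file format are hypothetical; the sketch records only how far the run got, and saving each step's results is left to the steps themselves.

```perl
use strict;
use warnings;

# Return the index of the next step to run (0 if there is no checkpoint).
sub load_checkpoint {
    my ($file) = @_;
    return 0 unless -e $file;
    open my $fh, '<', $file or die "cannot read $file: $!";
    my $line = <$fh>;
    close $fh;
    return 0 unless defined $line && $line =~ /^(\d+)/;
    return $1;
}

# Record the next step index; write to a temp file and rename it into place
# so a kill in mid-write cannot leave a truncated checkpoint behind.
sub save_checkpoint {
    my ($file, $next) = @_;
    my $tmp = "$file.tmp";
    open my $fh, '>', $tmp or die "cannot write $tmp: $!";
    print {$fh} "$next\n";
    close $fh or die "cannot close $tmp: $!";
    rename $tmp, $file or die "cannot rename $tmp to $file: $!";
    return;
}

# Run the steps (code refs) in order, skipping those already completed.
sub run_steps {
    my ($file, @steps) = @_;
    my $start = load_checkpoint($file);
    for my $i ($start .. $#steps) {
        $steps[$i]->();
        save_checkpoint($file, $i + 1);
    }
    unlink $file;    # finished: the next run starts from the beginning
    return;
}
```

If the program is killed partway through, the next call to `run_steps` with the same file and the same list of steps resumes at the first step that had not completed.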
Re: SIGSTOP & TCP Heartbeat
by chrism01 (Friar) on Nov 28, 2007 at 01:42 UTC
    I'd agree with aquarium and tye. The timeout, if it occurs, would usually be initiated by the DB. I've had that problem myself.
    Being able to know where you got to and continue from there could be very useful, enabling you to break the work into several runs if needed.
    It also deals with the program dying or being killed for any reason.

    Cheers
    Chris