aaronwroblewski has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have only been with Perl for a little over a year and find it to be an extremely useful and powerful language.

I have recently been developing on a network product (client)that requires a server connection to place data into a database.

I figured perl would be a great solution to my requirement and thought I'd give IO::Socket a go.

My server would have to handle multiple simultaneous connections and keep the socket open for what could be an infinite amount of time to receive strings from the client and only close the connection when it is lost or closed by the client.

I managed to get it all running and it seems to run extremely well, UNTIL:

After running for several hours, half a day or so, and receiving and processing data. the program becomes unresponsive. The sever still accepts connections on the listening port however does not do anything, no response.

If I stop and start the script however it continues to function as usual until it hangs again.

Please help, what am i doing wrong.

Here is my script, sorry for the lack of commenting.

#!/usr/bin/perl use IO::Socket; use IO::Handle; use DBI; $sqlhost ="localhost"; $sqluser ="sqluser"; $sqlpass ="sqlpass"; $sqldb ="sqldb"; $sqltbl ="sqltable"; use Sys::Hostname; use POSIX qw(:sys_wait_h); sub REAP { 1 until (-1 == waitpid(-1, WNOHANG)); $SIG{CHLD} = \&REAP; } $SIG{CHLD} = \&REAP; while (1){ my $sock = new IO::Socket::INET ( LocalHost => 'localhost', LocalPort => '1818', Proto => 'tcp', Listen => 255, Reuse => 1, Timeout => '15',); $sock or die "no socket :$!"; STDOUT->autoflush(1); my($new_sock, $buf, $kid); while ($new_sock = $sock->accept()) { $new_sock->autoflush(1); # execute a fork, if this is # the parent, its work is done, # go straight to continue next if $kid = fork; die "fork: $!" unless defined $kid; # child now... # close the server - not needed close $sock ; while (defined($buf = <$new_sock>)) { chop $buf; foreach ($buf) { $socketport=$new_sock->sockport(); $socketaddress=$new_sock->sockaddr(); $peerip=$new_sock->peerhost(); $peerport=$new_sock->peerport(); print "\n\n\nAccepted New Client Connection From : $peerip:$pe +erport\n "; print "Socket Created On : $sockeaddress:$socketport\n\n"; print "Client Said \" $_ \"\n"; # Pull String apart and do stuff! $scount=substr($_,16,1); $msgtype=substr($_,1,2); $msgseq=substr($_,3,1); $msgreply=substr($_,1,3); $alarm=substr($_,49,1); $data="!$msgreply\OK"; $alarmreset="!DA".$msgseq."5555"; # Reply Client print $new_sock "$data\r\n"; print "I Said \"$data\"\r\n"; if($scount=~/A/){ $from=substr($_,4,6); $hours=substr($_,10,2); $mins=substr($_,12,2); $secs=substr($_,14,2); $latd=substr($_,17,2); $latm= substr($_,19,6)/10000; $late=substr($_,25,1); $longd=substr($_,26,3); $longm=substr($_,29,6)/10000; $longe=substr($_,35,1); $d=$latd; $m=$latm/60; $latituded=$d+$m; $dd=$longd; $mm=$longm/60; $longituded=$dd+$mm; if($late=~/S/){ $latituded="-$latituded"; }; if($longe=~/W/){ $longituded="-$longituded"; }; $knots=substr($_,36,3); $kph=($knots * 1.852); $mph=($knots * 1.150779); $azmth=substr($_,39,3); $day=substr($_,42,2); $month=substr($_,44,2); $year=substr($_,46,2); $epanic=($_,49,1); $eoverspeed=($_,52,1); $egeoin=($_,52,1); $egeoout=($_,52,1); $egeooutspeed=($_,52,1); $egeoinspeed=($_,52,1); $event=substr($_,48,5); if ($epanic =~ /D/){ $panic = "-1"; } else { $panic="0"; } if ($eoverspeed =~ /D/){ $overspeed = "-1"; } else { $overspeed="0"; } if ($egeoin =~ /B/){ $geoin = "-1"; } else { $geoin="0"; } if ($egeoout =~ /A/){ $geoout = "-1"; } else { $geoout="0"; } if ($egeooutspeed =~ /E/){ $geooutspeed = "-1"; } else { $geooutspeed="0"; } if ($egeoinspeed =~ /F/){ $geoinspeed = "-1"; } else { $geoinspeed="0"; } # Send Proccessed String to Database! $dbh = DBI->connect("dbi:mysql:$sqldb;host=$sqlhost", $sqluser , $sqlp +ass); $sql = "INSERT INTO `$sqldb`.`$sqltbl` ( `peerip` , `peerport` , `selfrom` , `longdeg` , `longmin` , `longcomp` , `latdeg` , `latmin` , `latcomp` , `longdec` , `latdec`, `hours` , `mins` , `secs` , `knots` , `kph` , `mph` , `azimuth` , `day` , `month` , `year` , `event`, `panic`, `overspeed`, `geoin`, `geoout`, `geooutspeed`, `geoinspeed` ) VALUES ( '$peerip' , '$peerport' , '$from' , '$longd', '$longm', '$longe', '$latd', '$latm', '$late', '$longituded', '$latituded', '$hours', '$mins', '$secs', '$knots', '$kph', '$mph', '$azmth', '$day', '$month', '$year', '$event', '$panic', '$overspeed', '$geoin', '$geoout', '$geooutspeed', '$geoinspeed' ); "; $sth = $dbh->prepare($sql); $sth->execute; $dbh->disconnect; }; $new_sock->flush; } } exit; } continue { # parent closes the client since # it is not needed close $new_sock; } close($new_sock); }

Replies are listed 'Best First'.
Re: IO::Socket Not responding after period of time and traffic!
by rowdog (Curate) on Aug 15, 2010 at 13:01 UTC

    I believe Corion is on the right track. You have

    1 until (-1 == waitpid(-1, WNOHANG));
    While perlipc uses
    while (($child = waitpid(-1,WNOHANG)) > 0) { $Kid_Status{$child} = $?; }

    The key difference being that your reaper continuously loops when waitpid returns 0, which means the a child is still running (depending on the OS). Why would that matter? If you have 2 kids running and one exits, your signal handler will waitpid for the finished child and then loop to waitpid again, since the second child is running, the return is zero so you loop again, and again, and again.

    I know that the details of signal handling vary a lot across different OSes so there may be a perfectly valid reason to be using -1, but I can't see why from here.

    Update: It's a child, not the child.

      Actually, wait returns the PID of the just reaped child, and under the Windows fork emulation, emulated forked children get negative PIDs, so a check for equality to -1 sounds saner than checking for >0.

        Well, okay, that's a good point but I'm not crazy enough to want to deal with fork and Windows (use threads;)

        My point is that most UNIX like OSes treat a return of zero from waitpid with the WNOHANG option as a special case that says "there are no stopped, continued or exited children". If you idle loop waiting for your children to exit, you won't be getting out of the signal handler often enough to handle new connections. Just testing for -1 isn't enough.

Re: IO::Socket Not responding after period of time and traffic!
by Perlbotics (Archbishop) on Aug 15, 2010 at 13:51 UTC

    Hi aaronwroblewski,

    I would approach the problem using the following steps (maybe not in exactly the same order):

    • Style: A little indentation helps to comprehend the code easier and also helps to get answers faster from the Monks trying to help you.
    • Style: add usestrict; usewarnings; / correct errors / reflect warnings
    • Error checking. Add some error checks (e.g. where DBI is used).
    • Debugging: Where does the blocking occur? Add some logging, e.g. at least where new process is forked and where the connection to the DB is established. The DB-connection might need a timeout plus error-handling.
    • Program logic: Why create/delete a server socket in the outer loop? Creating it once should be sufficient. Each incoming connection (accept()) gets a fresh process to handle the new connection. So, getting rid of while(1) and close $sock; makes code cleaner and (I am not sure here) reduces the chance of race conditions.
    • Protocol design: I would re-think a design that assumes that a connection is established/in-use for an infinite amount of time. Close connections after some idle-time. That saves resources and can help to (later) make fail-over scenarios easier (e.g. by means of a load balancer). If a permanent connection is a mandatory requirement, at least use some keep-alive messages during idle times to verify proper connection/application state.
    • OS: Seems, this script is operating in a UNIXish environment (POSIX, Sys::hostname, !#...). Does netstat or lsof reveal something unusual? Number of filehandles available to the process could be a limiting factor too (see ulimit).
    • Networking: Blocking can happen outside the script between client and server. I have seen situations where a firewall silently dropped packets of a connection that has been idle for too long. Using keep-alive messages can help if you have influence on the client side (seems this is a proprietary protocol?). Some TCP stacks can be configured for keep-alive messages too (not recommended!).
    • Re-use: This is common server programming. Chances are high, you find a suitable module on CPAN..., e.g. Net::Server.
    HTH

    Update (in response to Re^2: IO::Socket Not responding after period of time and traffic!):
    Sorry, when I meant get rid of while(1), I was possibly not precise enough. You still need an infinite loop to accept() new connections, but you do not need to create/kill the server socket itself. See the following annotated modifications to your script... I used #-- for removed lines, #++ for lines added, and #!! to indicate your shame ;-)

    HTH2

      Thanks, Ill look into some of your suggestions, I will have to read up on alot of them.

      Thanks again.

      I have looked at quite a few of the suggestions made throughout this thread and made some changes without much improvement.

      I run my script using the suggested Devel:Trace and found that the while(1) loop that you pointed out was indeed causing some issues with resource (Race Condition).

      So I did what you suggested and removed the Parent Loop and the close $sock;.

      now my problem is that after a connection is made, passed off to the child and child exits, the Parent/script exits.

      How can I make the Parent sit idle waiting for the next connection without re creating the socket?.

      Thanks,

      It is running alot more stable for the one connection now.

      However there is a small problem and I think it is because of the location of you sub REAP line as you suggest in your code, that seems to not allow any further connections while the child previous is still open.

      If i do a lsof after the first connection is made and still active, it shows two Listens and one Established.

      The code then gets stuck at REAP and accepts no further connections.

      Any how, it is late and I am getting square eyes, I will look at it further tomorrow after some more meditation..

      Thanks Again.

Re: IO::Socket Not responding after period of time and traffic!
by Corion (Patriarch) on Aug 15, 2010 at 12:22 UTC

    I won't bother to wade through your code, but you might want to investigate whether you are properly reaping your children. Your process seems to fork(), and you might run out of process or system resources. As you didn't tell us your OS, it's hard to guess, but if you're trying to use fork on Windows, I had problems with forking off processes and the whole thing would stall after some time.

      Sorry I just assume we all use Linux, but how ignorant of me.

      CentOS to be exact.

Re: IO::Socket Not responding after period of time and traffic!
by Khen1950fx (Canon) on Aug 15, 2010 at 11:55 UTC
    You want to limit the number of lines to the shortest workable number. Since you problem is with the server hanging, I reduced your script to the minimum necessary.
    #!/usr/bin/perl use strict; use warnings; use IO::Socket::INET; my $sock = IO::Socket::INET->new( Listen => 5, LocalHost => 'localhost', LocalPort => 12400, Proto => 'tcp', Timeout => 15, Blocking => 0) or die "Couldn't create socket: $!\n";
    Blocking is default unless Blocking is set to 0, which is nonblocking.
Re: IO::Socket Not responding after period of time and traffic!
by BrowserUk (Patriarch) on Aug 15, 2010 at 17:08 UTC

    Probably not the cause of your cited problem, but I'm utterly astounded that your code works at all. Given that the 6 lines in the second block below are missing the keyword substr?

    $day=substr($_,42,2); $month=substr($_,44,2); $year=substr($_,46,2); $epanic=($_,49,1); $eoverspeed=($_,52,1); $egeoin=($_,52,1); $egeoout=($_,52,1); $egeooutspeed=($_,52,1); $egeoinspeed=($_,52,1); $event=substr($_,48,5);

    I guess either errors are rare, or not handling them means ignorance is bliss. I have to award you the prize for some of the nastiest code I've seen this year.

    Had you used strict and warnings, you might have noticed that. Along with the typo in:

    print "Socket Created On : $sockeaddress:$socketport\n\n";
    and the bad escape in $data = "!$msgreply\OK";.

    With those, judicious use of unpack, and a little formatting and you could end up with 120 lines that looks like this rather than the 260 lines of yuck you posted. And the monks might be more tempted to look closely.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I broke out in laughter when I read this, I have a PostIT note here to look at why some data was not ending up in the DB (haven't looked at it yet) and you solved it for me by pointing out a simple oversight. Thanks.

      Thanks for the other hints of encoragement, it nice to know I have the award for something prestiges lol.

      How did you tidy up my code so easy and well, I am sure you did not just rewrite it?!.

      Thanks again!
        How did you tidy up my code so easy and well, I am sure you did not just rewrite it?!.

        Sorry to disappoint, but I spent about 15 minutes with my editor. A couple of macros for in/undenting, but mostly just by hand. It's a simple, almost mechanical process that I apply to most every piece of badly structured code I look at in detail. It's second nature, and helps me understand the code.

        The most complex task was converting your substrs to unpack. I marked all the lines containing "substr", copied them to another buffer, sorted them into order and did a little block c&p to create the variable list and template. That's when I discovered the missing keywords.

        For solving your described problem. The first thing I would seek to eliminate, is any bad interaction with the DB.

        I'd set a version going that instead of performing the actual DB calls, have it just write the formatted SQL to a file. That will allow you to avoid losing the data because it will be a simple process to use your DBs command line client to apply the SQL manually.

        The particular possibility I'm thinking of here is that client is running your DB server out of handles. A few years ago I encountered a similar problem with MySql where the server was configured with a relatively low number of concurrent handles, and a relatively long timeout (~900 seconds). This meant that after a period of particulary high activity, inbound connections to the DB would block awaiting a handle to timeout and become available.

        The solution--beyond upping the number available and trimming back the timeout--was to ensure that all statement handles were correctly finished; and all DB handles were correctly disconnected.

        You are doing the latter, but not the former, and it might be the cause here. The nice thing is that by writing the SQL to a file, you can quickly find out if it is without data loss.

        Anyway, if not, the tidied up code should be easier to debug. You might (for example) consider using Devel::Trace to determine where in the code it is hanging.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: IO::Socket Not responding after period of time and traffic!
by FunkyMonk (Bishop) on Aug 16, 2010 at 00:06 UTC
    sorry for the lack of commenting
    Apologise not for the lack of comments, but for

    • not using strict and warnings1
    • very poor formatting of your code
    • using chop (which may be correct and proper for you, but is a real bad code smell to me)

    My intuition tells me "your" code has been generated by another program.

    --
    1 I would expect you to know this after "a little over a year" with Perl