LiTinOveWeedle has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,
I wrote this script to validated if our DEC net servers and routers are alive. Its writed to be capable to ping several hundred instances at time about 1 minute. So it have to use forking. I am using Linux patched kernel with Decnet support and binary dnping to ping decnet node. Unfortunately this command hasn't timeout so if node is unreachable procces won't never end. Due to this problem I created hash %proc and store in it for each proccess information about time when it was started. When this time will exceed timeout, procces is killed. Also if procces end by normal way, sub REAPER will wait() and also remove record for this proccess from %proc. The rest of script is very easy and simple:

Script read text file (one decnet address per line) initialize and clear File::Cache which is used to store parsed dnping output. Then script enter neverending loop. In each loop script have to fork for as many child proccesses as possible. Child proccesses are pinging and storing output values in File::Cache. Parent take care about timeout function. There is also small sub which guess number of proccesses which are needed to finish all pings at time. And this sub can increase or decrease number of procceses to fork.

Problem are that sleep() at line 107 don't work... Script don't sleep here. And I don't know why :-( and when this script is started it do few loops and then is terminated with Segmantation Fault error. Please can anybody help me to tell me where is this ugly bug?

Many thanks Litin

#!/usr/bin/perl ################################################# # # # DnPing Pro 1.00 # # Copyright 2001 by Dominik Strnad # # litinoveweedle@cmail.cz # # http://www.cgi-wizards.com # # # # Last modified on February. 6, 2002 # # # ################################################# # # Copyright Notice: # Copyright 2001 Dominik Strnad, Czech Republic. All Rights Reserved. # This product is offered under GNU license # Check GNU.txt or http://www.gnu.org/licenses/licenses.html#GPL for d +etails # # # # #use strict; use File::Cache; use vars qw( $input_path $log_path $logging $file_lock $timeout $comma +nd $parameter $max_proc $min_proc $def_proc $safety $act_proc @names @address $run_time $start_time $ +avg_time $i %proc $pid $sleep $finish_time ); # -------------------------------------------------------------------- +----- # Config part here $input_path = "/home/bilbo/address.txt"; #path to addr +ess file, one address per line $log_path = "/home/bilbo/log.txt"; #path to log +file $logging = 1; #logging enab +led (1) or disabled (0) $file_lock = 0; #locking log +and input files - not supported on Win9X $timeout = 5; #timeout for +each child proccess $command = "dnping"; $parameter = "-t"; $max_proc = 20; # Maximum num +ber of procceses which can't be exceeded $min_proc = 5; # Minimum num +ber of procceses $def_proc = 10; # Start up nu +mber of procceses $safety = 0.25; # Safety coef +icient for calculating number of procceses $run_time = 60; # Time in sec + to finish all pings @names = ( "dec", "ava", "min", "avg", "max" ); # -------------------------------------------------------------------- +----- # Program body under this line. Do NOTHING change here if you don't un +derstand what you are doing $|++; sub REAPER { my ( $proc, @temp ); $proc = wait; @temp = @{$proc{$proc}}; #if proccess is death r +emove null its procces table value if ( defined $temp[0] ) { $proc{$proc} = [ 0, 0 ]; &count_proc( time - $temp[0] ); } $SIG{CHLD} = \&REAPER; # loathe sys +V: it makes us not only reinstate the handler, but place it after the + wait } $SIG{CHLD} = \&REAPER; # now do som +ething that forks... # -------------------------------------------------------------------- +----- # Main loop body @address = &read_file( $input_path, $file_lock ); &logging( "Address file read" ); &initialize_cache; &logging( "Cache file structure initialized and cleared" ); while ( 1 ) { #do forever undef %proc; $i = 0; #set addres +s index to null $start_time = time; #write star +t time &logging( "Start Time is $start_time") ; while($i < @address) { #is there a +ddress to ping? if ( values(%proc) < $def_proc ) { #is maximum + procceses running now? $pid = fork(); #if not for +k if ( not defined $pid ) { die "cannot fork"; } #if fork wa +sn't succesfull warn and die; elsif ( $pid == 0 ) { #I am child &write_output( &do_measure( $address[$i], $i ) ); #Do pin +g and write output exit; #exit child + procces } else { #I am paren +t $proc{$pid} = [ time, $i ]; #create rec +ord for procces $i++; #next addre +ss &kill_timeout; #terminate +any procces running longer then timeout } } else { do { #do &kill_timeout; #terminate +any procces running longer then timeout } while ( values(%proc) >= $def_proc ); #while ther +e is less processes running then maximum } } #all addres +s have to be pinged now while ( values(%proc) ) { #is there s +til procceses running? &kill_timeout; #if so kill + them } $finish_time = time - $start_time; &logging("Finished in: $finish_time sec, Average time of 1 procces: +$avg_time, Last number of procceses: $def_proc"); $sleep = $run_time - $finish_time; if ( $sleep > 0 ) { &logging(" Sleeping for $sleep, proc " . values(%proc)); + #Did I finihed in time? sleep $sleep; #Sleep for +rest of the time } } # -------------------------------------------------------------------- +----- # Check if there is proccess running longer then timeout and if so, ki +ll them sub kill_timeout { my ( $proc, @temp ); foreach $proc ( keys %proc ) { @temp = @{$proc{$proc}}; if ( $temp[0] == 0 ) { delete( $proc{$proc} ); } elsif ( time - $temp[0] > $timeout ) { &logging( "Killed $temp[1]: ping to $address[$temp[1]] exceeded +timeout" ); &count_proc( time - $temp[0] ); &write_output( $temp[1], $address[$temp[1]], 0, 0, 0, 0 ); kill 15, $proc; delete( $proc{$proc} ); } } } # -------------------------------------------------------------------- +----- # Do command and parse its output sub do_measure { my ( $address, $index ) = @_; my ( @list ); my ( @temp ) = ( $index, $address, 0, 0, 0, 0 ); $command .= " $address $parameter"; @list = qx/$command/; if ( $list[0] =~ /^Sent (\d+) packets, Received (\d+) packets$/ ) { $temp[2] = int(100 * ($2/$1)); if ( $list[1] =~ /\= ([^\/]+)\/([^\/]+)\/([^\s]+) ms$/ ) { @temp[3..5] = ( $1, $2, $3 ); } else { @temp[3..5] = ( 0, 0, 0 ); } } else { @temp[2..5] = ( 0, 0, 0, 0 ); } return @temp; } # -------------------------------------------------------------------- +----- # Initialize File::Cache and delete old values sub initialize_cache { my ( $name ); foreach $name ( @names ) { $$name = new File::Cache( { namespace => $name, expires_in => 3600, filemode => 0600 } ); $$name->clear(); } } # -------------------------------------------------------------------- +----- # write values into File::Cache sub write_output { my ( @temp ) = @_; my ( $i, $name ); $i = 1; foreach $name ( @names ) { $$name->set($temp[0], $temp[$i]); $i++; } } # -------------------------------------------------------------------- +----- # read data from file to array sub read_file { my ( $path, $flock ) = @_; my ( $line, @file ); if ( not open(FILE, $path) ) { &logging( "Can't open CSV database file. Please check that you hav +e provided the correct path to your CSV database file in config file. +" ); die; } if ( $flock ) { flock(FILE, 1); } @file = (); while ( $line = <FILE> ) { chomp($line); push(@file, $line); } close(FILE); return @file; } # -------------------------------------------------------------------- +----- # logging sub logging { my ( $time, $i ); if ( $logging ) { if ( not open(LOG, ">>".$log_path) ) { die "Can't open log file"; } $time = localtime(time); if ( $file_lock ) { flock(LOG, 2); } print LOG "$time \t @_ \r\n"; close(LOG); } } # -------------------------------------------------------------------- +----- # Count how many procces have to run to finish all pings at time sub count_proc { my ( $temp ); my ( $time ) = @_; $avg_time = sprintf( "%0.3f", ( ($avg_time * $i) + $time ) / ( $i + +1 ) ); $temp = ( $run_time - ( time - $start_time ) ); if ( $temp > 0 ) { $temp = int( ( ( scalar( @address ) - $i ) * $avg_time * $safety ) + / $temp ); if ( $temp >= $max_proc ) { $def_proc = $max_proc; } elsif ( $temp <= $min_proc ) { $def_proc = $min_proc; } else { $def_proc = $temp; } } }
Li Tin O've Weedle
mad Tsort's philosopher

Replies are listed 'Best First'.
Re: Segmentation Fault
by virtualsue (Vicar) on Mar 07, 2002 at 09:40 UTC
    I don't know why you are getting segfaults when you run your code (and right now I lack the necessary tuits to download it and find out). Your description of the problem is good though and I can make a suggestion or two based on that.

    The fact that the ping program you need to use doesn't implement timeout is making your program very complicated, because you have to clean up hung child processes. Signals can help you here (though they may be causing you other grief, see next para), try setting the alarm clock to wake them up again. Then kill them. It's morbid, but what can you do?

    I don't know why your sleep call doesn't seem to be working, but you can't use it with alarm.

      THX to virtualsue for this sugestion. I used alarm to simplified my code a lot. But problem is, that just simple using this:

      eval { local $SIG{ALRM} = sub { die "alarm clock restart" }; alarm 10; qx/$string/; alarm 0; }; if ($@ and $@ !~ /alarm clock restart/) { die }

      will genearate zombies as described in docs:

      If the operation being timed out is system() or qx(), this technique is liable to generate zombies. If this matters to you, you'll need to do your own fork() and exec(), and kill the errant child process.

      Because I need to capture output from command $string I can't use exec() command instead of backticks qx//. So this seems to be litle bit problematic.

      Anybody know? THX

      Li Tin O've Weedle
      mad Tsort's philosopher

        Consider writing a bourne shell script or a small C program that you run from backticks/qx/system in your main Perl program. Inside this subprogram, set the alarm and run your ping. I think that will put the signal handling where it belongs.