Restarting script without losing handles

athomason has asked for the wisdom of the Perl Monks concerning the following question:

Greetings folks,

I'm working on a TCP proxy server that watches text traffic incoming on two sockets and occassionally injects its own data, based on regex matches on the stream data. The interesting part is that the rules and actions aren't predefined: as the program is monitoring the sockets it needs to take input from somewhere (mabye STDIN) concerning 1) the regexen that determine action and 2) the actions themselves (probably as anonymous subs). The connections can't be dropped (so the script can't just be killed and reloaded), though a lag of a few seconds after adding a rule is acceptable (see further down why that might be useful).

Input from STDIN is implicitly trusted, so I don't feel wholly unjustified eval'ing what I read there (though it would require care, for sure), but I also need the stuff that's added to be persistent (i.e., last bewteen proxy sessions). The simplest thing that comes to mind is Data::Dumper'ing the ruleset to a file (or DATA, even) and eval'ing that when the script starts up. New rules added during runtime would be added to the ruleset and written out to disk for later recovery. There are a number of hacks required to go that route, though. For one, I anticipate needing global state variables that are set and used in the rules; while I could store a mini-symbol table as %state or some such, it sure seems that would be adding unnecessary clutter.

Ideally, I could put all the matching logic in the script itself instead, and somehow recompile and restart it without dropping the connection. Though there would be a delay as it recompiled, I could live with it. This strategy would also avoid the thoroughly unpleasant business of eval'ing in a network server. I had an idea that exec would be useful for this purpose, since I seem to recall filehandles are propogated to the transferee (like fork does). However, I wouldn't swear to it, and I don't see a mention of that in the perlfunc. Besides, if I do exec myself, how do I recover the handles in the fresh script? I wouldn't think they would be automagically named the same, though I confess I haven't tried it yet.

If exec doesn't work that way, can anyone think of another way to recompile without losing the handles? I wouldn't mind having the instruction pointer reset, since I can get back to the processing loop easily enough if the handles are already established.

Read more below for the bit I have now which doesn't do any of the persistence stuff. Hopefully it can clear up any confusion about how I'm doing matching and injection. So far the code is basically just the non-forking TCP server example from perlipc modified to proxy between the incoming client and a predefined remote host, with a few rules added in. It does have an issue (less interesting, for now) where it doesn't flush the handles correctly, in case anybody runs across it. Hitting <ENTER> a few times seems to work around it, sort of.

Thanks,

--athomason

#!/usr/local/bin/perl -Tw

use strict;
use warnings;
use Socket;
use Carp;

my %rules = (
    'client' => {
        'talkback_rule' => {
            pattern => qr/tmtowtdi$/,
            action  => sub {
                           print CLIENT 'Yes, there is!';
                       },
        },
    },
    'server' => {
        'sneeze_rule' => {
            pattern => qr/achoo$/i,
            action  => sub {
                           print CLIENT 'Geshunteit!';
                       },
        },
    },
);

my $maxbuflength = 16 * 1024;

my $EOL = "\015\012";

sub logmsg { print "$0 $$: @_ at ", scalar localtime, "\n" }

# port we listen on
my $listen_port = shift @ARGV || 1234;

# address we connect to
my $server_host = shift @ARGV || 'localhost';
my $server_port = shift @ARGV || 4321;

my $proto = getprotobyname( 'tcp' );

# listen for an incoming connection; see perlipc
socket( PROXY, PF_INET, SOCK_STREAM, $proto )          || die "socket:
+ $!";
setsockopt( PROXY, SOL_SOCKET, SO_REUSEADDR,
    pack( "l", 1 ) )                                   || die "setsock
+opt: $!";
bind( PROXY, sockaddr_in( $listen_port, INADDR_ANY ) ) || die "bind: $
+!";
listen( PROXY, SOMAXCONN )                             || die "listen:
+ $!";

logmsg "server started on port $listen_port";

my $paddr;

for ( ; $paddr = accept( CLIENT, PROXY ); close CLIENT ) {
    my( $port, $iaddr ) = sockaddr_in( $paddr );
    my $name = gethostbyaddr( $iaddr, AF_INET );
 
    logmsg "connection from $name [",
        inet_ntoa($iaddr), "]
        at port $port";
 
    # here the meat
    my $riaddr = inet_aton( $server_host );
    my $rpaddr = sockaddr_in( $server_port, $riaddr );
    
    socket( SERVER, PF_INET, SOCK_STREAM, $proto ) or die "socket: $!"
+;
    connect( SERVER, $rpaddr ) or die "connect: $!";

    logmsg "connected to $server_host:$server_port";
    
    # autoflush everything
    select SERVER; $|++;
    select CLIENT; $|++;
    select STDOUT; $|++;
    
    # stdin is a control connection
    my $rstdin = '';
    vec( $rstdin, fileno( STDIN ), 1 ) = 1;
    
    # server is the remote host we connected to
    my $rserver = '';
    vec( $rserver, fileno( SERVER ), 1 ) = 1;
    
    # client is the host that connected to us
    my $rclient = '';
    vec( $rclient, fileno( CLIENT ), 1 ) = 1;
    
    my $commandbuf = ''; # stuff read from stdin
    my $serverbuf = ''; # stuff read from server
    my $clientbuf = ''; # stuff read from client
    my $iobuf; # intermediate buffer
    
    while ( 1 ) {
        my $rout = '';
        my $rin = $rserver | $rclient | $rstdin;
    
        select( $rout = $rin, undef, undef, 0.01 );
    
        my $gotstdin = vec( $rout, fileno( STDIN ), 1 );
        my $gotserver = vec( $rout, fileno( SERVER ), 1 );
        my $gotclient = vec( $rout, fileno( CLIENT ), 1 );
    
        #printf "%vxd\n", $rout;
        
        if ( $gotserver ) {
            exit unless defined read( SERVER, $iobuf, 1 );
            print CLIENT $iobuf; # proxy server->client
    
            if ( length $serverbuf > $maxbuflength ) {
                $serverbuf = substr( $serverbuf, 1 ) . $iobuf;
            }
            else {
                $serverbuf .= $iobuf;
            }
            handleData( $serverbuf, 'server' );
        }

        if ( $gotclient ) {
            exit unless defined read( CLIENT, $iobuf, 1 );
            print SERVER $iobuf; # proxy client->server

            if ( length $clientbuf > $maxbuflength ) {
                $clientbuf = substr( $clientbuf, 1 ) . $iobuf;
            }
            else {
                $clientbuf .= $iobuf;
            }
            handleData( $clientbuf, 'client' );
        }
        
        if ( $gotstdin ) {
            read( STDIN, $iobuf, 1 );
            $commandbuf .= $iobuf;

            print "stdin: $iobuf\n";

            if ( $iobuf eq "\n" ) {
                # process complete command
                if ( lc $commandbuf eq "quit" ) {
                    exit;
                }
                else {
                    print STDERR "unknown command $commandbuf\n";
                }
                $commandbuf = "";
            }
        }
    }
}

sub handleData {
    my $data = shift;
    my $ruleset = shift;

    if ( open TRACE, "> trace.$ruleset" ) {
        print TRACE $data;
        close TRACE;
    }

    die "unknown ruleset $ruleset" unless exists $rules{ $ruleset };

    for my $rulename ( keys %{ $rules{ $ruleset } } ) {
        if ( $data =~ $rules{ $ruleset }{ $rulename }{ pattern } ) {
            $rules{ $ruleset }{ $rulename }{ action }->( );
            print STDERR "matched rule $ruleset->$rulename\n";
        }
    }
}
[download]

Comment on Restarting script without losing handles Select or Download Code

Replies are listed 'Best First'.
(tye)Re: Restarting script without losing handles by tye (Sage) on Aug 21, 2002 at 19:33 UTC
Yes, by default, exec closes all of your file handles except for STDIN, STDOUT, and STDERR. See $^F in perlvar. But that will probably only be part of the solution. You may also have to turn off the close-on-exec flag on some file handles directly. See F_GETFD and F_SETFD in Fcntl. The file descriptors are what are not closed so you have to do the equivalent of fdopen() to get Perl file handles reassociated with them: `open( FILE, ">&=$fd" ) or die ...` where $fd is 0 for STDIN, 1 for STDOUT, 2 for STDERR (and Perl already reopened those for you) and you have to pass the new instance of the script the values for the file descriptors you want to reopen, for example: `exec( $^X, $0, fileno(SOCK), fileno(LOG) );` and `open( SOCK, "<&=$ARGV[0]" ) or die ... open( LOG, ">>&=$ARGV[1]" ) or die ...` [download] Update: Restarting a long-running process from time to time can be very useful (reduces memory footprint, clears likely subtle internal corruptions due to low-profile bugs, etc.). And if you go with catching signals and not restarting the process, then be sure to use Perl v5.8 or later. - tye (but my friends call me "Tye")	[reply] [d/l] [select]
Re: Restarting script without losing handles by fokat (Deacon) on Aug 21, 2002 at 19:25 UTC
I feel you can solve this problem in an easier way, like this: Place your regexes and actions in a config file. This would be Perl code placing the regular expressions in a data structure (hash, array, whatever). When the script starts, it sets up the sockets and `eval()`s said config file, thus learning the last set of rules. Install a signal handler for `SIGHUP`. Upon receiving this signal, you can wipe the data structure and re-`eval()` the config file, thus learning the new rules and actions. The user only needs to modify the config file and `kill -HUP` the process id of your script. You can also get fancy and detect changes on the script, though I do not advise this. One of the advangtages of this method, is that the config file can be reasonably checked by using `perl -c config-file`. You can also do this by `eval()`ing the config file in a separate namespace first (checking for errors in `$@`) and if all goes well, proceeding to the real namespace. In fact, you could accomplish the last phase by placing your script in a separate namespace, say: `package __my_script; # Your script goes here` [download] The config file could be explicitly placed in the `main` namespace. You can then delete all non-built-in symbols in `main` deleted. This would get rid of even the actions defined as `sub`s in the config file. I have a similar thing in production which works fine. It handles around 20 RADIUS authentications per second :) Good luck.	[reply] [d/l]