athomason has asked for the wisdom of the Perl Monks concerning the following question:
Greetings folks,
I'm working on a TCP proxy server that watches text traffic incoming on two sockets and
occassionally injects its own data, based on regex matches on the stream data. The interesting part is that the
rules and actions aren't predefined: as the program is monitoring the sockets it needs to take
input from somewhere (mabye STDIN) concerning 1) the regexen that determine action and 2) the
actions themselves (probably as anonymous subs). The connections can't be dropped
(so the script can't just be killed and reloaded), though a lag of a few seconds after adding
a rule is acceptable (see further down why that might be useful).
Input from STDIN is implicitly trusted, so I don't feel wholly unjustified
eval'ing what I read there (though it would require care, for sure), but I also
need the stuff that's added to be persistent (i.e., last bewteen proxy sessions). The
simplest thing that comes to mind is Data::Dumper'ing the ruleset to a file (or DATA, even)
and eval'ing that when the script starts up. New rules added during runtime
would be added to the ruleset and written out to disk for later recovery. There are a number
of hacks required to go that route, though. For one, I anticipate needing global state
variables that are set and used in the rules; while I could store a mini-symbol table as
%state or some such, it sure seems that would be adding unnecessary clutter.
Ideally, I could put all the matching logic in the script itself instead, and somehow
recompile and restart it without dropping the connection. Though there would be a delay as it recompiled, I could live with it. This strategy would also avoid the
thoroughly unpleasant business of eval'ing in a network server. I had an idea
that exec would be useful for this purpose, since I seem to recall filehandles
are propogated to the transferee (like fork does). However, I wouldn't swear to
it, and I don't see a mention of that in the perlfunc. Besides, if I do
exec myself, how do I recover the handles in the fresh script? I wouldn't think
they would be automagically named the same, though I confess I haven't tried it yet.
If exec doesn't work that way, can anyone think of another way to recompile
without losing the handles? I wouldn't mind having the instruction pointer reset, since I can
get back to the processing loop easily enough if the handles are already established.
Read more below for the bit I have now which doesn't do any of the persistence stuff. Hopefully it can clear up any confusion about how I'm doing matching and injection.
So far the code is basically just the non-forking TCP server example from perlipc modified to proxy between
the incoming client and a predefined remote host, with a few rules added in. It does have an issue (less interesting, for now) where it doesn't flush the handles correctly, in case anybody runs across it. Hitting <ENTER> a few times seems to work around it, sort of.
Thanks,
--athomason
#!/usr/local/bin/perl -Tw
use strict;
use warnings;
use Socket;
use Carp;
my %rules = (
'client' => {
'talkback_rule' => {
pattern => qr/tmtowtdi$/,
action => sub {
print CLIENT 'Yes, there is!';
},
},
},
'server' => {
'sneeze_rule' => {
pattern => qr/achoo$/i,
action => sub {
print CLIENT 'Geshunteit!';
},
},
},
);
my $maxbuflength = 16 * 1024;
my $EOL = "\015\012";
sub logmsg { print "$0 $$: @_ at ", scalar localtime, "\n" }
# port we listen on
my $listen_port = shift @ARGV || 1234;
# address we connect to
my $server_host = shift @ARGV || 'localhost';
my $server_port = shift @ARGV || 4321;
my $proto = getprotobyname( 'tcp' );
# listen for an incoming connection; see perlipc
socket( PROXY, PF_INET, SOCK_STREAM, $proto ) || die "socket:
+ $!";
setsockopt( PROXY, SOL_SOCKET, SO_REUSEADDR,
pack( "l", 1 ) ) || die "setsock
+opt: $!";
bind( PROXY, sockaddr_in( $listen_port, INADDR_ANY ) ) || die "bind: $
+!";
listen( PROXY, SOMAXCONN ) || die "listen:
+ $!";
logmsg "server started on port $listen_port";
my $paddr;
for ( ; $paddr = accept( CLIENT, PROXY ); close CLIENT ) {
my( $port, $iaddr ) = sockaddr_in( $paddr );
my $name = gethostbyaddr( $iaddr, AF_INET );
logmsg "connection from $name [",
inet_ntoa($iaddr), "]
at port $port";
# here the meat
my $riaddr = inet_aton( $server_host );
my $rpaddr = sockaddr_in( $server_port, $riaddr );
socket( SERVER, PF_INET, SOCK_STREAM, $proto ) or die "socket: $!"
+;
connect( SERVER, $rpaddr ) or die "connect: $!";
logmsg "connected to $server_host:$server_port";
# autoflush everything
select SERVER; $|++;
select CLIENT; $|++;
select STDOUT; $|++;
# stdin is a control connection
my $rstdin = '';
vec( $rstdin, fileno( STDIN ), 1 ) = 1;
# server is the remote host we connected to
my $rserver = '';
vec( $rserver, fileno( SERVER ), 1 ) = 1;
# client is the host that connected to us
my $rclient = '';
vec( $rclient, fileno( CLIENT ), 1 ) = 1;
my $commandbuf = ''; # stuff read from stdin
my $serverbuf = ''; # stuff read from server
my $clientbuf = ''; # stuff read from client
my $iobuf; # intermediate buffer
while ( 1 ) {
my $rout = '';
my $rin = $rserver | $rclient | $rstdin;
select( $rout = $rin, undef, undef, 0.01 );
my $gotstdin = vec( $rout, fileno( STDIN ), 1 );
my $gotserver = vec( $rout, fileno( SERVER ), 1 );
my $gotclient = vec( $rout, fileno( CLIENT ), 1 );
#printf "%vxd\n", $rout;
if ( $gotserver ) {
exit unless defined read( SERVER, $iobuf, 1 );
print CLIENT $iobuf; # proxy server->client
if ( length $serverbuf > $maxbuflength ) {
$serverbuf = substr( $serverbuf, 1 ) . $iobuf;
}
else {
$serverbuf .= $iobuf;
}
handleData( $serverbuf, 'server' );
}
if ( $gotclient ) {
exit unless defined read( CLIENT, $iobuf, 1 );
print SERVER $iobuf; # proxy client->server
if ( length $clientbuf > $maxbuflength ) {
$clientbuf = substr( $clientbuf, 1 ) . $iobuf;
}
else {
$clientbuf .= $iobuf;
}
handleData( $clientbuf, 'client' );
}
if ( $gotstdin ) {
read( STDIN, $iobuf, 1 );
$commandbuf .= $iobuf;
print "stdin: $iobuf\n";
if ( $iobuf eq "\n" ) {
# process complete command
if ( lc $commandbuf eq "quit" ) {
exit;
}
else {
print STDERR "unknown command $commandbuf\n";
}
$commandbuf = "";
}
}
}
}
sub handleData {
my $data = shift;
my $ruleset = shift;
if ( open TRACE, "> trace.$ruleset" ) {
print TRACE $data;
close TRACE;
}
die "unknown ruleset $ruleset" unless exists $rules{ $ruleset };
for my $rulename ( keys %{ $rules{ $ruleset } } ) {
if ( $data =~ $rules{ $ruleset }{ $rulename }{ pattern } ) {
$rules{ $ruleset }{ $rulename }{ action }->( );
print STDERR "matched rule $ruleset->$rulename\n";
}
}
}
(tye)Re: Restarting script without losing handles
by tye (Sage) on Aug 21, 2002 at 19:33 UTC
|
Yes, by default, exec closes all of your file handles except for STDIN, STDOUT, and STDERR. See $^F in perlvar. But that will probably only be part of the solution. You may also have to turn off the close-on-exec flag on some file handles directly. See F_GETFD and F_SETFD in Fcntl.
The file descriptors are what are not closed so you have to do the equivalent of fdopen() to get Perl file handles reassociated with them:
open( FILE, ">&=$fd" ) or die ...
where $fd is 0 for STDIN, 1 for STDOUT, 2 for STDERR (and Perl already reopened those for you) and you have to pass the new instance of the script the values for the file descriptors you want to reopen, for example:
exec( $^X, $0, fileno(SOCK), fileno(LOG) );
and
open( SOCK, "<&=$ARGV[0]" ) or die ...
open( LOG, ">>&=$ARGV[1]" ) or die ...
Update: Restarting a long-running process from time to time can be very useful (reduces memory footprint, clears likely subtle internal corruptions due to low-profile bugs, etc.). And if you go with catching signals and not restarting the process, then be sure to use Perl v5.8 or later.
- tye (but my friends call me "Tye") | [reply] [d/l] [select] |
Re: Restarting script without losing handles
by fokat (Deacon) on Aug 21, 2002 at 19:25 UTC
|
I feel you can solve this problem in an easier way, like this:
- Place your regexes and actions in a config file. This would be Perl code placing the regular expressions in a data structure (hash, array, whatever).
- When the script starts, it sets up the sockets and eval()s said config file, thus learning the last set of rules.
- Install a signal handler for SIGHUP. Upon receiving this signal, you can wipe the data structure and re-eval() the config file, thus learning the new rules and actions.
The user only needs to modify the config file and kill -HUP the process id of your script. You can also get fancy and detect changes on the script, though I do not advise this. One of the advangtages of this method, is that the config file can be reasonably checked by using perl -c config-file. You can also do this by eval()ing the config file in a separate namespace first (checking for errors in $@) and if all goes well, proceeding to the real namespace.
In fact, you could accomplish the last phase by placing your script in a separate namespace, say:
package __my_script;
# Your script goes here
The config file could be explicitly placed in the main namespace. You can then delete all non-built-in symbols in main deleted. This would get rid of even the actions defined as subs in the config file.
I have a similar thing in production which works fine. It handles around 20 RADIUS authentications per second :)
Good luck. | [reply] [d/l] |
|
|