Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things

A SpamAssassin-Enabled POP3 Proxy

by McD (Chaplain)
on Aug 27, 2002 at 01:56 UTC ( [id://193065] : CUFP . print w/replies, xml ) Need Help??

I dig SpamAssassin - a Perl based spam filtering suite.

It's somewhat biased towards a Unix flavored mail server soloution though, and I read my mail at home on Win32 with a POP3 mail client. Itch, meet scratch.

This is a simple POP3 protocol proxy which will intercept mail messages en route from a POP3 server and wash them thru SpamAssassin before returning them to the mail client (Mozilla, Outlook, Eudora, etc). It's feature rich (for what it is - it's pretty simple at the end of the day) but it does require some know-how to get set up. There's also a slew of cool SpamAssassin features that it doesn't support, but might someday. It was written and debugged with Win32 use in mind, but I suspect the code is pretty portable to any modern Perl install - let me know if you have success on a non-Win32 platform with it!

There's a manpage with detailed instructions on installing, configuring, and using this proxy here, and here's a nice zip with the script and related goodies:

The code is posted below - it's the same as the zip, but I've removed the POD directives for brevity's sake. See the manpage if you plan on setting about using this.


#!perl -w # Pop3proxy - a SpamAssassin enabled POP3 proxy designed for Win32 # users. use strict; # Set this to zero to turn off all debugging statements. Set to 1 for # basic debugging, which is pretty verbose, set it to 2 to add a dump # of key data structs on connect, set it to 3 to add a dump of every # read/write we do. (Oy) use constant DEBUGGING => 1; # Seems that SpamAssassin wants to remove the dependency on # Time::HiRes. I only need it for measuring performance, so I'll only # include it if it's available. Have to eval the "use constant" # statements to avoid redefinition warnings. # # I use constants for debugging switches because I believe they get # optimized out by the compiler if they're false. I could be wrong. BEGIN { eval "use Time::HiRes"; if ($@) { eval "use constant TIMERS => 0"; } else { eval "use constant TIMERS => 1"; } } # A set of enumerated reasons why we're snarfing a multiline response # for a socket. use constant RETR => 1; use constant TOP => 2; use constant CAPA => 3; use IO::Socket; use IO::Select; use FindBin; use Mail::SpamAssassin; use Getopt::Long; ######################### # A BUNCH OF EVIL GLOBALS ######################### # Set this to be a file that will contain the debug log. Set to an # empty string to debug to STDOUT. --logfile command line arg sets. my $logfile = 'pop3proxy.log'; # Hostmap - keys are ports to listen to on localhost, values are # hostname:port to proxy connections on the key port to. Set up by # the command line --host arg or by the hostmap.txt config file. # # A simplest case - you get your mail from a server server named #, on the standard POP3 port (110): # # my %hostmap = ( 110 => '' ); # # ...And you change your mail client to get mail from localhost. # # Fancier case - you pop mail off of two hosts, and # # # my %hostmap = ( # 817 => '', # 818 => '', # ) # # In that case, the proxy listens to TWO sockets on localhost - 817, # and 818, proxying off to two separate remote hosts as indicated. # # Note that for this to work, you need to be able to tell your mail # client to connect to two different ports on localhost to find the # proxy - namely, 817 for and 818 for # Some mail clients, like Netscape 4.5's, won't # let you specify the port to use for a pop3 connection. Oops. # Others, like Mozilla 1.0, will let you set the port, but won't allow # two servers to be on the same host (localhost in this case). You # can work around THAT by creating another alias for localhost in your # C:\Windows\Hosts file: # # localhost MyHostName # # ...and then configuring one account for localhost:817 and the other # for MyHostName:818 my %hostmap = (); # Respect_byte_count - If TRUE, then we do not alter the byte count of # the message when marking it as spam - instead, we overwrite portions # of the headers, such as changing the first five characters of the # Subject: line to "*SPAM*" (a shortened form of SpamAssassin's famous # subject prefix). Set by the command line --nopad arguement. # # This, because under certain conditions the POP3 protocol indicates # message and mailbox sizes, and the safe thing is not to enlarge # those sizes while marking a message as spam. # # If there is no Subject: line in the mail headers (there doesn't have # to be, after all) or if it's less than 5 bytes, then we use the # first Received: line we find instead. # # Setting this value to FALSE (0) seems to work with most mail # clients, and it causes us to proxy back the mail as it's been # modified by SpamAssassin, which gives you a wonderful great lot of # info about WHY it's labeled as spam, and also labels it clearly and # beyond doubt, and defangs the MIME contents, etc, etc - but it # *could* break the mail client. Harumph. my $respect_byte_count = 0; # If true, we let the POP3 "TOP" command go thru to the server, # otherwise, we don't proxy the TOP command and return an error back # to the client. Set by the command line --allowtop arguement. # # TOP is specified as an optional command, it shows you the headers of # a mail message and a configurable number of lines of the body. The # idea is that you can sort of "screen" what you choose to download or # not before you do. All well and good, but our spam filtering can # cause this to break when we scan the actual message during retrieval # and potentially modify or add to the headers, such as changing the # subject line to start with *****SPAM***** or something. # # This breaks the protocol a little and could have unusual or possibly # even destructive consequences. Since it's an optional part of the # protocol, most mail clients should be coded to work without it, # hence, by default, we avoid the problem by turning it off. my $allow_top = 0; # Here's the problem with using SpamAssassin in this way - given a # large enough message, he will take a LONG time to scan it, where # long is like sixteen minutes on a P-II 350 running Linux for a 3MB # text message. Maybe that was a degenerate case of some sort, but # there it is. If SpamAssassin takes long enough to scan a message, # the mail client (who's not getting any data in response to his RETR # command during all this) will eventually time out. Sockets close, # data is lost, etc, etc. Very bad, very difficult to fix and get on # with your life if you have a large mail message on the server that # keeps causing this. # # Hence, this config parameter. If a message exceeds this size while # we're snarfing it, we'll abandon the snarf, start passing the data # back to the client, and no scan of the message by SpamAssassin will # be performed. # # Setting this to zero turns this behavior off - all messages will be # scanned, regardless of size. # # I chose a 250K default for this value after analyzing a few months # worth of spam - 1500 messages. The average size was about 9K, the # largest was 110K. I figured double the largest would allow most of # the spam we see today to get scanned, without trouble. # # This has the added side effect of keeping our memory usage down - # that scan of a 3MB message took 86MB worth of memory. That's not # such a hot idea for a daemon. my $max_scan_size = 250000; # If we're invoked with a logfile for output using ActiveState's # wperl.exe, we can effectively hum along in the background. Nice. I # don't want to send the user to Task Manager to shut us down, and # under Win98 at least you get the nasty "application not responding" # dialog box because I'm busy waiting for to select a socket, so # instead we have this - a port that we listen on for the purposes of # exiting. Any connection to it from localhost, and I'll get out of # town. # # The default is 9625 (which is otherwise unused). Set this to zero # to disable this behavior. my $exit_port = 9625; # Note CRLF == \015\012 my $no_top = "-ERR Not supported by proxy\015\012"; # %peer - mapping of client socket => server socket, and vice versa. # # Keys are stringified references to IO::Socket objects, values are # actual references to the same. It's a little ugly to contemplate, # but it works just dandy. # # The Peer mapping is removed when the peer is closed. Thus, if # you're reading data on $socket: # # The destination of this data is $reading_buf{$peer{$socket}}, and, # If there is no destination any more, there's no point in reading the # data, so shut down, and, # If you read some data, add the $peer{$socket} to the Writeable set, # because now you want to write something to him. # # And, if you're writing to $socket, # # The data is in $writing_buf{$socket}, and, # Once all the data is written, you should close $socket if # $peer{$socket} is missing. my %peer; # %is_client - stringified IO::Socket references for keys, true or # false values based on whether that socket is connected to a client # or the server. my %is_client; ################# # Buffers galore. ################# # The general flow of data is: # # data from $socket -> $peer = $peer{$socket} -> read data into # $reading_buf{$peer} -> hook protocol, snarfing to $message{$peer} if # needed -> move data into $writing_buf{$peer} -> write data to $peer # %reading_buf - keys are sockets, value is buffer of data read from t +hat # socket's peer, waiting to be proxy'd to the socket. my %reading_buf; # %writing_buf - keys are still sockets, value is data from the # %reading_buf buffer which is now ready for writing to the socket. my %writing_buf; # Hash of socket => buffer, buffer is filled up with the message being # snarfed. Then the buffer is scanned and modified, then copied into # $writing_buf{$socket} and flushed back to the client. my %message; # Hash of socket => enums, set to the reason we're snarfing a # multiline response into %message_for array for this socket. Set to # zero (false) if we're NOT snarfing this data. my %snarfing; # Hash of Client socket => queue of commands the client has requested. my %client_commands; # Hash of listening sockets - keys are stringified socket object refs, # values are the host:port we should proxy connections on that socket # to. my %proxyto; # Flags - toggled on and off to indicate if we're reading a multiline # response or not. Keys are sockets. my %reading_multiline_response; # Hash - keys are sockets, values are HiRes timer floats. Used to # time downloads. my %snarf_start; ######## # "Main" ######## # Get in your directory chdir "$FindBin::RealBin"; read_config() if -s "./hostmap.txt"; my $cl_proxyto; my $helpflag = 0; usage() unless GetOptions("logfile:s" => \$logfile, "nopad" => \$respect_byte_count, "allowtop" => \$allow_top, "maxscan=i" => \$max_scan_size, "exitport=i" => \$exit_port, "host=s" => \$cl_proxyto, "help" => \$helpflag, ); usage() if $helpflag; if ($cl_proxyto) { warn "WARNING: $cl_proxyto overrides hostmap.txt entry: $hostmap{110 +}\n" if exists $hostmap{110}; # We're nice to command line users. If you tag a :port onto your # hostname, that's cool, otherwise, you get :110 for free. $cl_proxyto .= ':110' unless $cl_proxyto =~ /:\d+$/; $hostmap{110} = $cl_proxyto; } die "No proxy host! Use --host or hostmap.txt\n" unless keys %hostmap +; # Prevent concurrent proxies - kill any previous instance if (IO::Socket::INET->new(PeerAddr => 'localhost', PeerPort => $exit_port, Proto => "tcp", Type => SOCK_STREAM)) { warn "WARNING: Existing proxy killed\n"; } if ($logfile) { # Redirect stdout and stderr to logfile if specified. # Windows strangeness - you can't reopen STDOUT/STDERR successfully # under wperl.exe unless you've already closed it. Go figure. close STDOUT; close STDERR; open(STDOUT, "> $logfile") or die "Can't redirect stdout: $!"; open(STDERR, ">&STDOUT") or die "Can't dup stdout: $!"; } $| = 1; # The SpamAssassin scanner. # # We tell it to use ./user_prefs, and not to try to copy in a default # if it's not there (because he goes looking for a template file in # all the usual places to copy over - all the usual *UNIX* places). # We tell it to run only local tests, because otherwise, you'll get # complaints and timeouts when it can't find a dcc app to run or it # can't find a DNS server or the network is the wrong color or # whatever. In theory, you could get away with DNS RBL checks, but # I've had a hard time making Net::DNS work on my Win32 setup. # Hackers welcome, best of luck. See notes at end of file. my $spamtest = Mail::SpamAssassin->new({ userprefs_filename => './user_prefs', dont_copy_prefs => 1, local_tests_only => 1, }); my $readable = IO::Select->new; my $writeable = IO::Select->new; # Create sockets to listen on. foreach my $port (keys %hostmap) { my $listener = IO::Socket::INET->new(LocalPort => $port, Listen => 5 +, Reuse => 1); die "Can't create socket for listening: $!" unless $listener; print "Listening for connections on port $port (proxy $hostmap{$port +})\n" if DEBUGGING; $readable->add($listener); $proxyto{$listener} = $hostmap{$port}; } # Create the "exit socket" - any connection on this socket from # localhost will cause us to exit. my $exit_socket; if ($exit_port) { $exit_socket = IO::Socket::INET->new(LocalPort => $exit_port, Listen + => 1, Reuse => 1); $readable->add($exit_socket); } while(1) { my ($toread, $towrite) = IO::Select->select($readable, $writeable); foreach my $socket (@$toread) { if ($socket == $exit_socket) { all_done($socket); next; # Just in case it wasn't from localhost } # Is it a new connection? if (exists $proxyto{$socket}) { dump_data_structs() if (DEBUGGING > 1); # Open connection to remote, add to readable set, map it # to this new client connection. my $remote = IO::Socket::INET->new(PeerAddr=>$proxyto{$socket}); $readable->add($remote) if $remote; if (not $remote) { # Break the incoming new client off, create a new # listener to try again. print "Connect to remote: $proxyto{$socket} FAILED: $@\n" if D +EBUGGING; my $port = $socket->sockport; $socket->close; $readable->remove($socket); my $listener = IO::Socket::INET->new(LocalPort => $port, Listen => 5, Reuse => 1); die "Can't create socket for listening: $!" unless $listener; $readable->add($listener); $proxyto{$listener} = $hostmap{$port}; next; } # Accept the connection and add it to our readable list. my $new_sock = $socket->accept; $readable->add($new_sock) if $new_sock; die "Can't create new socket for incoming connection: $!" unless $new_sock; # Create proxy/peer mapping, set client/server indicators, # create buffers, etc. $peer{$new_sock} = $remote; $peer{$remote} = $new_sock; $is_client{$new_sock} = 1; $is_client{$remote} = 0; $message{$new_sock} = ''; $snarfing{$new_sock} = 0; # The first thing we'll see is a response to no command at # all - "+OK Welcome to" - so we seed the # command queue with a dummy command to eleminate warnings # later on. $client_commands{$new_sock} = [('none')]; foreach ($new_sock, $remote) { $reading_buf{$_} = ''; $writing_buf{$_} = ''; } if (DEBUGGING) { print "\nNew connection:\n"; print "From: ", $new_sock->peerhost, ':', $new_sock->peerport,"\n"; print "To: ", $remote->peerhost, ':', $remote->peerport, "\n"; } } else { # It's an established connection my $key; if (DEBUGGING) { if ($socket->connected) { $key = $socket->peerhost . ':' . $socket->peerport; } else { $key = "$socket"; } } my $proxy; # Which socket we're going to proxy this data to if (exists $peer{$socket}) { $proxy = $peer{$socket}; } else { # No peer. print "\n$key - peer gone on read" if DEBUGGING; # No need to keep hearing about how it's ready to be # read - we've got no use for subsequent data. $readable->remove($socket); # Tear down connection, unless there's data waiting to # be written to it - in that case, we'll catch it in # writeables and close it when we're done. if (! data_waiting($socket)) { print ", nothing to write, closing socket" if DEBUGGING; clean_up($socket); } print "\n" if DEBUGGING; next; } # Why 4096 bytes? I dunno. You got a better buffer size? unless (my $n = sysread($socket, $reading_buf{$proxy}, 4096, length($reading_buf{$proxy}))) { warn "sysread: $!\n" if not defined $n; # Shut down the socket print "\n$key - socket close on read" if DEBUGGING; clean_up($socket); # Remove the proxy map delete $peer{$socket}; delete $peer{$proxy}; if (! data_waiting($proxy)) { # No pending data - tear down the peer as well. print ", closing peer too" if DEBUGGING; clean_up($proxy); } print "\n" if DEBUGGING; next; } if (DEBUGGING > 2) { $is_client{$socket} ? print "C< " : print "S< "; print "\n"; } # Got data from a socket. Go do something clever with it. run_hooks($proxy); } } # End of readables # Next, do something with each socket ready to write. Like, write # to it. foreach my $socket (@$towrite) { my $key; if (DEBUGGING) { if ($socket->connected) { $key = $socket->peerhost . ':' . $socket->peerport; } else { $key = "$socket"; } } my $wrote = syswrite($socket, $writing_buf{$socket}) or do { warn "syswrite: $!\n"; print "\n$key - socket close on write" if DEBUGGING; clean_up($socket); # Remove the proxy map if (exists $peer{$socket}) { my $proxy = $peer{$socket}; delete $peer{$proxy}; delete $peer{$socket}; if (! data_waiting($proxy)) { print ", closing peer too" if DEBUGGING; clean_up($proxy); } } print "\n" if DEBUGGING; next; }; if (DEBUGGING > 2) { $is_client{$socket} ? print "C> " : print "S> "; print "\n"; } # Scrub the just-written data from the buffer substr($writing_buf{$socket}, 0, $wrote, ""); # All done writing? if (! length($writing_buf{$socket})) { $writeable->remove($socket); if (! exists $peer{$socket}) { # No peer? Tear down connection. print "\n$key - peer gone after write, closing\n" if DEBUGGING +; clean_up($socket); next; } } } # end of writeables } # data_waiting($socket) # # Returns true if there's any data waiting to be proxy'd to this socke +t. # # Reason this works - we only check data_waiting() on a socket *after* # we've closed it's peer. Closing the peer in clean_up(), below, will # have the effect of flushing any pending %message buffers (and # %reading_buf, for that matter) to %writing_buf, and hence, all the # data which is "waiting" is, in fact, guaranteed to now be waiting. sub data_waiting { my $socket = shift; return (length($reading_buf{$socket}) or length($writing_buf{$socket +})); } # clean_up($socket) # # Given a socket, close it, stop selecting it for anything, clean up # all our structs that refer to it, set the peer if any to flush # buffers. sub clean_up { my $socket = shift; # This socket is history. If there's a peer, then that peer # currently has all the data it's ever gonna get. Flush that data # into the writing_buf and add it to the writeable set. # # Ok, technically, this *could* burn you if what you were caching # away in %message was a multiline TOP response that you were # going to discard anyway, and now I'm going to flush it to the # client, instead. Look, the client is going to get an error # condition *anyway* because the darn socket is GONE, man, just # like that, in the middle of a multiline response! I will # venture to say that no harm will come of this - but if it does, # we can always make this behave a lot more like a "last ditch" # run_hooks() session. if (exists $peer{$socket}) { my $proxy = $peer{$socket}; $writing_buf{$proxy} .= $message{$proxy} if exists ($message{$prox +y}); $writing_buf{$proxy} .= $reading_buf{$proxy}; $reading_buf{$proxy} = ''; $message{$proxy} = ''; $snarfing{$proxy} = 0; if (length ($writing_buf{$proxy})) { $writeable->add($proxy); print "\nFlushing peer on close\n" if DEBUGGING; } } # Note that you can apparently remove a socket more than once from # an IO::Select set. Also you can delete a key/value pair from a # hash that doesn't exist. Love Perl. DWIM. $readable->remove($socket); $writeable->remove($socket); $socket->close; delete $reading_buf{$socket}; delete $writing_buf{$socket}; delete $is_client{$socket}; delete $snarfing{$socket}; delete $message{$socket}; delete $client_commands{$socket}; delete $reading_multiline_response{$socket}; delete $snarf_start{$socket}; } # run_hooks($socket) # # This is where we hook the POP3 protocol. Called whenever a socket # gets new data in it's buffer, we can do whatever you want here. The # default is to wait until there's a \n in the %reading_buf buffer, th +en (in # a loop) move all those bytes into the %writing_buf buffer (giving us + the # window to look at a full line of I/O), then add the socket to the # writeable set, thereby causing the contents of %writing_buf to get # flushed to the socket. # # Under certain conditions, though, we'll want to intercept the # protocol, at which point we snarf the data off into %message until # it's done, then we look at it or replace it or something, and THEN # we ship it off to %writing_buf for flushing to the client. # # Client commands are pushed onto a queue of commands, server # responses shift commands off that queue. This way we can support # pipelining client/servers, per rfc 2449 # # Note - logically, the %peer mapping must be intact when you get # here. The main loop enforces this. You may assume that # $peer{$socket} will exist and be valid in this routine. my $pos; sub run_hooks { my $socket = shift; # This loop looks for the first occurance of a \n in a string, # then MOVES all of the string up to and including the \n into the # output buffer and adds the socket to the set of sockets we'd # like to write to. Then it loops looking for another \n. # # Just before the move, you can examine the beginning of # $reading_buf{$socket} to see what kinds of interesting thingies mi +ght # be in there, in the confidence that it's a real full line of # data from the protocol. You can say things like: # # $reading_buf{$socket} =~ /^(.*)$/m # /m lets $ match next to embe +dded \n $pos = -1; while (($pos = index($reading_buf{$socket}, "\012", 0)) > -1) { # Right here you can examine $reading_buf{$socket} if ($is_client{$socket}) { # Hooks here for data from the server to the client # Responses from the server are interesting. They can be # single line, in which case they MUST start with "+OK" or # "-ERR", or else they're part of a multiline response, # such as a LIST or RETR command, in which case they MUST # end with a CRLF.CRLF. if ($reading_buf{$socket} =~ /^(\+OK|-ERR)/i and not $reading_multiline_response{$socket}) { # Response to a command my $command = shift @{$client_commands{$socket}}; print $peer{$socket}->peerhost . ':' . $peer{$socket}->peerport . " (Server) said $1 to $command\n" if DEBUGGING; # Always include the greeting line in the log. if (DEBUGGING and $command eq 'none') { print $reading_buf{$socket}; } die "Assertion failed: snarfing outside multiline response" if ($snarfing{$socket}); # Only interested in snarfing successful response - # none of the error responses are multiline. if (substr ($1, 0, 1) eq '+') { if ($command =~ /^TOP$/i and not $allow_top) { print "Snarfing TOP response\n" if DEBUGGING; $snarfing{$socket} = TOP; } if ($command =~ /RETR/i) { print "Snarfing RETR response\n" if DEBUGGING; $snarf_start{$socket} = Time::HiRes::gettimeofday if TIMERS; $snarfing{$socket} = RETR; } if ($command =~ /CAPA/i) { print "Snarfing CAPA response\n" if DEBUGGING; $snarfing{$socket} = CAPA; } } } elsif ($reading_buf{$socket} =~ m|^\.\015?\012|) { # End of a multiline response $reading_multiline_response{$socket} = 0; if ($snarfing{$socket}) { print "Detected end of snarfed multiline\n" if DEBUGGING; printf "Download took %.8f seconds\n", Time::HiRes::gettimeofday - $snarf_start{$socket} if (DEBUGGING and TIMERS); # At this point, $message{$socket} contains the # full multiline response, +OK up to but not # including this trailing ".CRLF". if ($snarfing{$socket} == RETR) { # Right here, $message{$socket} is ripe for # scanning. scan_mail(\$message{$socket}); $writing_buf{$socket} .= $message{$socket}; } elsif ($snarfing{$socket} == TOP) { # Eat the .CRLF, add the error message to the # output buffer, flush said output buffer, # clean up your structs and move on. substr($reading_buf{$socket}, 0, $pos+1, ""); $writing_buf{$socket} .= $no_top; $message{$socket} = ''; $snarfing{$socket} = 0; $writeable->add($socket); next; } elsif ($snarfing{$socket} == CAPA) { # Strips out the TOP response, if any. $message{$socket} =~ s/\012TOP[^\012]*\012/\012/ig if not $allow_top; # Strips out the SASL response, if any. $message{$socket} =~ s/\012SASL[^\012]*\012/\012/ig; $writing_buf{$socket} .= $message{$socket}; } $message{$socket} = ''; $snarfing{$socket} = 0; } } else { # Part of a multiline response. Flip the ready flag, # you won't be ready to see another response until you # see your CRLF.CRLF $reading_multiline_response{$socket} = 1; } # At this point, snarf data into %message if snarfing and # move along. if ($snarfing{$socket}) { $message{$socket} .= substr($reading_buf{$socket}, 0, $pos+1, ""); # Check size of snarfed message and stop snarfing if it's # getting too big - see notes at $max_scan_size. if ($max_scan_size != 0 and length($message{$socket}) > $max_scan_size) { print "Message exceeding max scan size, abandoning snarf\n" if DEBUGGING; $writing_buf{$socket} .= $message{$socket}; $message{$socket} = ''; $snarfing{$socket} = 0; $writeable->add($socket); } next; } } else { # Hooks here for data from the client to the server # Spot the client's command, add to the queue. my ($command) = $reading_buf{$socket} =~ /^(\S+)\s/; print $peer{$socket}->peerhost . ':' . $peer{$socket}->peerport +. " (Client) said $command\n" if DEBUGGING and $command; # AUTH is a special case, see discussion elsewhere. Must # not have any commands in the queue, and we reply back to # the socket immediately with an error. if ($command and $command =~ /^AUTH$/i) { if (scalar(@{$client_commands{$peer{$socket}}})) { die "I so can't cope with AUTH commands while pipelining"; } print "AUTH Rejected\n" if DEBUGGING; substr($reading_buf{$socket}, 0, $pos+1, ""); # Note - $no_top is a generic -ERR response, works fine. $writing_buf{$peer{$socket}} .= $no_top; $writeable->add($peer{$socket}); next; } push (@{$client_commands{$peer{$socket}}}, $command) if $command +; } # Default action after all your shots at hooking and magic, # etc.: Move the data to the writing buffer, and set it up to # get written. $writing_buf{$socket} .= substr($reading_buf{$socket}, 0, $pos+1, +""); $writeable->add($socket); } } sub dump_data_structs { # Dump your current key per-connection data structs print "\nExisting proxy/peer mappings:\n"; print map "$_ => $peer{$_}\n", keys %peer; print "\nExisting is_client flags:\n"; print map "$_ => $is_client{$_}\n", keys %is_client; print "Existing socket reading_buf buffers:\n"; print map "$_ => $reading_buf{$_}\n", keys %reading_buf; print "Existing socket writing_buf buffers:\n"; print map "$_ => $writing_buf{$_}\n", keys %writing_buf; print "Existing message buffers:\n"; print map "$_ => $message{$_}\n", keys %message; print "Existing snarfing flags:\n"; print map "$_ => $snarfing{$_}\n", keys %snarfing; print "Existing command queues:\n"; print map "$_ => @{$client_commands{$_}}\n", keys %client_commands; print "Existing reading_multiline_response flags:\n"; print map "$_ => $reading_multiline_response{$_}\n", keys %reading_multiline_response; print "Existing snarf_start values:\n"; print map "$_ => $snarf_start{$_}\n", keys %snarf_start; } # @mail - array of lines of a mail message. Some notes on memory # usage here: # # Big mail messages getting copied about will chew up memory right # quick. I start with one copy of the message built up in a scalar # buffer, then I need a second copy, broken out into an array of # lines, for Mail::SpamAssassin::NoMailAudit to chew on. That's two # copies. # # I can save a copy's worth of memory by MOVING the lines from the # scalar buffer into the array - but then, once SpamAssassin is done # chewing on them, I have to put them BACK into the scalar buffer. If # I'm not removing them from the SpamAssassin::NoMailAudit object as I # do that, I'm going to wind up with a second copy of the mail # *anyway*. And that kind of removal is nasty and creeps inside of # the objects encapsulation, where I really ought not go. # # NoMailAudit::as_string() returns a copy of the mail as a string, but # to do so, it creates a big ol' scalar on the stack to return. # Simple, but it costs a THIRD chunk of memory the size of the # message. my @mail; sub scan_mail { my $mailref = shift; my $bytecount = length $$mailref; $$mailref =~ s/\012\.\./\012\./g; # un-byte-stuff @mail = split /^/, $$mailref; my $response = shift @mail; # SpamAssassin::NoMailAudit adds a Unix mbox From_ line, unless # you construct your NoMailAudit message with the (ahem, # undocumented) add_From_line param set to false. That From_ # kinda breaks the protocol - the client isn't expecting mbox, # he's expecting raw 822 mail - so we leave it out. my $message = Mail::SpamAssassin::NoMailAudit->new(data => \@mail, add_From_line => +0); my $start; $start = Time::HiRes::gettimeofday if TIMERS; my $status = $spamtest->check($message); printf "Spam check took %.8f seconds\n", Time::HiRes::gettimeofday - $start if (DEBUGGING and TIMERS); my $id = $message->get('Message-id') || '*none*'; print "$bytecount bytes, ", $status->is_spam() ? 'SPAM' : 'NOT spam', ", Message-id: $id\n" if DEBUGGING; print $status->get_report() if DEBUGGING and $respect_byte_count; $status->rewrite_mail() unless $respect_byte_count; if ($status->is_spam ()) { if ($respect_byte_count) { # DAN - danger, you don't know if you're in the headers or not. $$mailref =~ s/\012Subject: [^\012]{6}/\012Subject: *SPAM*/i or $$mailref =~ s/\012Received: [^\012]{6}/\012Received: *SPAM* +/i; } else { # What as_string() does as of SpamAssassin v2.31: # return join ('', $self->get_all_headers()) . "\n" . # join ('', @{$self->get_body()}); $$mailref = $response; $$mailref .= $message->get_all_headers(); $$mailref .= "\015\012"; foreach my $line (@{$message->get_body()}) { $$mailref .= $line; } # SA's markups end with \n instead of CRLF's. Gotta # change those here. $$mailref =~ s|(?<!\015)\012|\015\012|g; } } else { if (not $respect_byte_count) { $$mailref = $response; $$mailref .= $message->get_all_headers(); $$mailref .= "\015\012"; foreach my $line (@{$message->get_body()}) { $$mailref .= $line; } # SA's markups end with \n instead of CRLF's. Gotta # change those here. $$mailref =~ s|(?<!\015)\012|\015\012|g; } } $status->finish(); $$mailref =~ s/\012\./\012\.\./g; # byte-stuff } sub all_done { my $socket = shift; my $new_sock = $socket->accept; if ($new_sock->peerhost eq '') { print "Connection on exit socket, exiting\n" if DEBUGGING; exit; } else { print "Connection on exit socket from non-local host!\n" if DEBUGG +ING; $new_sock->close; } } sub read_config { open (CONFIG, "./hostmap.txt") or die "Can't read hostmap.txt: $!\n" +; # Straight from the cookbook 8.16 while (<CONFIG>) { chomp; # no newline s/#.*//; # no comments s/^\s+//; # no leading white s/\s+$//; # no trailing white next unless length; # anything left? my ($port, $proxyto) = split(/\s*=\s*/, $_, 2); $hostmap{$port} = $proxyto; } } sub usage { print <<EOT; Usage: $0 --host [options] Options include: --logfile filename Use filename as the log file. Default is pop3proxy.log. If the filename is omitted, log to STDOUT. --nopad If nopad is specified, then message sizes will not be changed as + a result of spam scanning. The default is to add to the message s +ize. --allowtop If top is specified, then the POP3 "TOP" command will be passed +through to the server. The default is to reject client TOP commands wit +h an error message. --maxscan bytes Messages which exceed this size will not be scanned for spam. T +he default is 250000. Setting this to zero disables this behavior. --exitport port Any connection from localhost on this port will cause us to exit +. The default is 9625. Setting this to zero disables this behavio +r. EOT exit; } # Copyright (c) 2002, Dan McDonald. All Rights Reserved. # This program is free software. It may be used, redistributed # and/or modified under the terms of the Perl Artistic License # (see

Replies are listed 'Best First'.
Re: A SpamAssassin-Enabled POP3 Proxy
by FoncÚ (Scribe) on Aug 27, 2002 at 21:08 UTC
    So, tell me...what kind of lag time are we looking at here? I can't run this right now to test, so I wondered if you had and checked that out.

    Just terminally curious. (-:

      Very little - the proxy seems to add fractional seconds per message (see the logs while it's running) but it might "feel" slower in your mail client due to the batched-then-dumped nature of things. The proxy buffers up the whole message, then the client gets it all at once, rather than trickling in. You might see jumpy progress bars as a result sometimes.

      One of the biggest lag problems with a SpamAssassin scan happens when SA spins off into the weeds on truly large messages. This may be an acceptable delay in a store-and-forward chain, but in a real time network proxy it's fatal. Pop3Proxy works around that (like many SA apps) by declining to scan messages over a certain size - 250K by default, but it's configurable.

      See the discussion on the manpage for more on that.


Re: A SpamAssassin-Enabled POP3 Proxy
by guha (Priest) on Sep 04, 2002 at 15:48 UTC

    Great stuff McD!!

    Tried it a few days and I'm happy to say that my daily spam-quota has sunk to 3% of it's former level.

    Apart from that I must say that the manpage is very well written, in fact I think it's one of the best I've ever read. Informative, funny and well versed.

    PS. You can send the cheque to me by snail instead of email, I don't want SA to snarf it. ;-]

      guha writes:

      Tried it a few days and I'm happy to say that my daily spam-quota has sunk to 3% of it's former level.

      Thank SpamAssassin for that - the spam filtering is all about their hard work and dedication. All I do is get them the mail. :-)

      They just released 2.40 a day or two ago, and 2.41 will be close on it's heels with bugfixes it sounds like. I hope to update and qualify with that version over the coming weekend - some of the early stats on the 2.40 release's effectiveness are simply phenomenal.

      Apart from that I must say that the manpage is very well written, in fact I think it's one of the best I've ever read. Informative, funny and well versed.

      You flatter me, sir. :-)


Re: A SpamAssassin-Enabled POP3 Proxy
by Anonymous Monk on Nov 01, 2002 at 12:46 UTC
    This is great - I've been struggling with getting SpamAssasin to run on WinXp and this got it all going in just a few minutes! Just one question - I've installed with SpamAssassin 2.43 and my POP3Proxy log is complaining like so: configuration file "/etc/mail/spamassassin/" requires version 2.41 of SpamAssassin, but this is code version 2.43. Maybe you need to use the -c switch, or remove the old config files? Skipping this file at C:/Perl/site/lib/Mail/SpamAssassin/ line 246. I does the same for files: Is this something to worry about - or is there a fix? Thanks. BarryC.
      Barry C writes:

      This is great - I've been struggling with getting SpamAssasin to run on WinXp and this got it all going in just a few minutes!

      Glad it's working for you.

      Just one question - I've installed with SpamAssassin 2.43...

      I haven't tested against 2.43, but others have reported success.

      Based on the error message, it looks to me like you need to remove the old tests before upgrading to 2.43 - but I could be mistaken. Try zapping your entire pop3proxy directory and starting over - maybe save off your user_prefs and hostmap.txt (if any) first.