Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

dialup spam removal with Net::POP3

by zentara (Archbishop)
on May 29, 2003 at 18:25 UTC ( #261632=snippet: print w/replies, xml ) Need Help??
Description: Well, we are all trying to filter spam. SpamAssasin gets high marks, but it requires you to download the mail first. I tried popsneaker(c-program), and it works great on headers, but misses the body, which isn't good enough anymore. So... I decided to try Net::POP3, which can get the headers and head a number of body lines. This is just my first working model, and I know it uses "goto's", but it was just the easiest way to break out of nested loops. The patterns are stored in files, to make it easy to just add them as necessary. It works quite fast on my dialup. To test, just run, and observe the output. For real deletion, uncomment the pop3->delete line near the bottom of the code.

The files at the end are where you put the regex patterns, about the only thing to watch for is backslashing '.'

It runs fast on my dialup line. I just run it before I download my mail with my normal mail client.

Update: I have changed the original post, to a slightly improved version. Aristotle's methods below, are superior to mine, but I think my "clunky method" is a little easier for a beginner to see what is happening and modify. I did eliminate the redundant "regex creation from file" code.

Update2 I've taken Aristotle's recommendations and implemented a greping method to avoid redundant line checks. I also have split the message into 2 arrays, @header and @top. Additionally, I fixed the exit routine so that there always is a logout from the pop server. The only thing that this code may miss, is if the various header lines have a newline in them, but I'm finding the first line of each header is good enough.

#!/usr/bin/perl -w
use strict;
use Net::POP3;

my $maxsize = 50000;
my (@fromgoodre,@frombadre,@badwordre,@maillistre,@badcontentre,@recei
+vedgoodre,@togoodre);

my %rehash = (
   'fromgood' => \@fromgoodre,
   'frombad' => \@frombadre,
   'badwords' => \@badwordre,
   'maillist' => \@maillistre,
   'contentbad' => \@badcontentre,
   'receivedgood' => \@receivedgoodre,
   'togood' => \@togoodre
);

foreach my $file (keys %rehash){
  open(FH,"< $file") or warn "Can't open $file $!\n";
  chomp (my @lines = <FH>);
     foreach my $line(@lines){push(@{$rehash{$file}},qr/$line/i)}
close FH;
}


my $ServerName = "mail.foobar.com";
my $pop3 = Net::POP3->new($ServerName)||die("Couldn't log on to server
+\n");
my $UserName = "zentara";
my $Password = "spam4lunch";

my $num_messages = $pop3->login($UserName, $Password)||die("Bad userna
+me or password\n");
my $messages = $pop3->list();
print "******$messages Number of messages->$num_messages*******\n";
my %messages;
my ($flag,$msg_id,$messcheck,@del_mess,@header,@top,$line);
@del_mess = ();
$messcheck = '';
print "####################################################\n";

foreach $msg_id(keys %{$messages}) {
     $flag = '';
     my @header =(); #array for headerlines 
     my @top = ();  #array for top of messagebody 
     my $messref = $pop3->top($msg_id,10);
     my $size = $pop3->list($msg_id);
     print "message$msg_id->size=$size\n";
     print "----------------------------------\n";

#split into @header and @top
     while(1){
       my $line = shift @$messref;
       last if $line =~ /^\s*$/;
       push(@header,$line);
     }
     @top = @$messref;
     print '~~~~~~~~~~~~~~~~~~@header ->',"\n@header\n";
     print '~~~~~~~~~~~~~~~~~~@top ->',"\n@top\n";


#check X-Mailinglist: 
          if(($line) = grep(/^X-Mailinglist:/o,@header)){
                 print "#################X-Mailinglistline-> $line\n";
                 for my $maillistre (@maillistre) {
                      if ($line =~ /$maillistre/) { print "Mail list g
+ood\n"; $flag = 'ok'; goto DOBODY}
                   }
          }
#check From 
           ($line) = grep(/^From:/io,@header);
           print "#################Fromgrepline-> $line\n";
           my $from = $line;
                for my $goodfromre (@fromgoodre) {
                      if ($line =~ /$goodfromre/) { print "GoodFrom->$
+goodfromre\n"; $flag = 'ok'; goto DOBODY}
                  }
                  for my $badfromre (@frombadre) {
                      if ($line =~ /$badfromre/) { print "BadFrom->$ba
+dfromre\n"; push(@del_mess,$msg_id); $messcheck .= "$msg_id -> $badfr
+omre ->$from\n"; $flag = 'bad'; goto DOBODY}
                  }

#check To and CC 
            ($line) = grep(/^(To|Cc|cc|Return-Path):/o,@header);
            print "#################To-CC-ReturnPathgrepline-> $line\n
+";
                 for my $togoodre (@togoodre) {
                      if ($line =~ /$togoodre/) { print "ToCCReturn-Pa
+th address good\n"; $flag = 'ok'; goto DOBODY}
                   }

#check Received         
            ($line) = grep(/^Received:/o,@header);
            print "#################Receivedgrepline-> $line\n";
                 for my $receivedgoodre (@receivedgoodre) {
                      if ($line =~ /$receivedgoodre/) { print "Receive
+d address good\n"; $flag = 'ok'; goto DOBODY}
                   }

#check Content-type 
            ($line) = grep(/^Content-Type:/o,@header);
            print "#################Content-typegrepline-> $line\n";
                 for my $badcontentre (@badcontentre) {
                      if ($line =~ /$badcontentre/) { print "Bad Conte
+nt->$badcontentre\n"; push(@del_mess,$msg_id); $messcheck .= "$msg_id
+ -> $badcontentre ->$from\n"; $flag = 'bad'; goto DOBODY}
                   }

#check Subject       
            ($line) = grep(/^Subject:/o,@header);
            print "#################Subjectgrepline-> $line\n";
                 for my $badwordre (@badwordre) {
                      if ($line =~ /$badwordre/) { print "badword in s
+ubject->$badwordre\n"; push(@del_mess,$msg_id); $messcheck .= "$msg_i
+d -> $badwordre ->$from\n"; $flag = 'bad'; goto DOBODY}
                  }

#check for base64                    
        if(($line) = grep(/^Content-Transfer-Encoding: base64/o,@heade
+r)){
            print "#################base64grepline-> $line\n";
            if ($line =~ /^Content-Transfer-Encoding: base64/o){print 
+"base64 content\n"; push(@del_mess,$msg_id); $messcheck .= "$msg_id -
+> base64 ->$from\n"; $flag = 'bad';goto DOBODY}
         }

#check Message-Id 
            ($line) = grep(/^Message-[Ii][Dd]:/o,@header);
            print "#################Message-Idgrepline-> $line\n";
            if (($line !~ /.*<?.+@.+(\..+)?>?$/o) || ($line =~ /\@127\
+.0\.0\.1/o))
             { print "Bad Message-Id\n"; push(@del_mess,$msg_id); $mes
+scheck .= "$msg_id -> bad Message-Id ->$from\n"; $flag = 'bad'; goto 
+DOBODY}

DOBODY:
if ($flag eq 'ok'){print "Message $msg_id is OK\n"; line(); next}
if ($flag eq 'bad'){print "Message $msg_id is bad\n"; delmessage($msg_
+id); line(); next}
if ($size > $maxsize){print "Size limit exceeded\n"; push(@del_mess,$m
+sg_id); $messcheck .= "$msg_id -> size limit exceeded ->$from\n"; $fl
+ag = 'bad'; goto DOBODY}
#do body check 
print "Doing body check\n";
    for my $line (@top){
         for my $badwordre (@badwordre) {
         if ($line =~ /$badwordre/) { print "badword in body->$badword
+re\n"; push(@del_mess,$msg_id); $messcheck .= "$msg_id -> bad word in
+ body ->$badwordre ->$from\n"; delmessage($msg_id); line(); goto FINI
+SH}
       }
    }
FINISH:
line();
print "\n";
}
#confirm the deletions 
line();line();line();line();
unless(defined $del_mess[0]){print "No messages to be deleted\nHit Ent
+er to quit\n"; my $input = <>; $pop3->quit(); exit}
print "$messcheck\n\n";
print "Delete above messages?\n  [yY] then Enter to delete, or Enter t
+o abort\n";
chomp(my $input = <>);
if($input =~ /^[yY]$/i){foreach(@del_mess){print "Deleting message $_\
+n"; $pop3->delete($_)}}

$pop3->quit();

sub delmessage{print "Message @_ to be deleted\n"}
sub line {print "####################################################\
+n"}
exit 0;
__END__

SAMPLE FILES:
###badwords######
jackpot
fee
visit
information
confident
confidential
urgent
action
please
notice
cheaper
virus
travel
nigeria
gabon
africa
afrique
\$\$\$
\*\*
\*
PROFITS
debt
INVEST
MONEY
CREDIT
FREE
CASH
BUY
\. \. \.
\.\.\.
party
penis
sex
viagra
prescription
###########################################

####contentbad#############################
text\/html
multipart\/alternative
multipart\/mixed
###########################################

######frombad##############################
.*\.ac>?$
.*\.ae>?$
.*\.af>?$
.*\.ag>?$
.*\.ai>?$
.*\.al>?$
.*\.am>?$
.*\.an>?$
.*\.ao>?$
.*\.aq>?$
.*\.ar>?$
.*\.as>?$
.*\.at>?$
.*\.by>?$
.*\.bz>?$
.*\.ca>?$
.*\.cc>?$
.*\.cd>?$
.*\.cf>?$
.*\.cg>?$
.*\.ch>?$
.*\.ci>?$
.*\.ck>?$
.*\.cl>?$
.*\.cm>?$
.*\.cn>?$
.*\.co>?$
.*\.cr>?$
.*\.cu>?$
.*\.cv>?$
.*\.cx>?$
#############################################

####fromgood#################################
mailgeek\.compgeeks\.com
zmuato
sbiblert
mcubase
mymother
myrelatives
##############################################

#########maillist###########################
suse-linux-e
#############################################

#################receivedgood#################
list2\.suse\.com
#############################################

#########togood##############################
sdlperl@sdlperl\.org
mc@gnome\.org
##############################################
Replies are listed 'Best First'.
Re: dialup spam removal with Net::POP3 (program repair shop)
by Aristotle (Chancellor) on May 31, 2003 at 17:21 UTC

    Note: written before parent node was updated.

    Good idea, but far less than enough hubris. You can replace the entire repetitive beginning of the script by just a few lines:

    sub make_rx_from { open my $fh, "<", shift or warn("Can't open fromgood $!\n"), retur +n; chomp(my @line = <$fh>); return map qr/$_/, @line; } my @togood = make_rx_from "togood"; my @receivedgood = make_rx_from "receivedgood"; my @badcontent = make_rx_from "contentbad"; my @maillists = make_rx_from "maillist"; my @badwords = make_rx_from "badwords"; my @frombad = make_rx_from "frombad"; my @fromgood = make_rx_from "fromgood";
    But that's still far from ideal. Look at your code's main loop: you are examining a specific header, testing it against a whitelist and a blacklist, flagging accordingly. The same pattern every time. That can be factored out along these lines:
    HEADER: foreach my $line (@$messref) { # detects first blank line go do body check of first 10 body lines last if $line =~ /^\s*$/; my ($header) = grep $line =~ /^$_:/i, keys %test_for; next unless $header; for my $test (qw(ok bad)) { my @match = grep $line =~ $_, @{ $test_for{$header}{$test} }; next if not @match; $flag = $test; print "Message $test - header $header matched: @match\n"; last HEADER; } }
    You could also check the message size prior to any other tests for efficiency. I'll post the complete code as I refactored it in a separate node, for easier downloading.

    Makeshifts last the longest.

      Thanks Aristotle. You always can be counted on to improve someone's code. :-) I put the size check at the end, because I wanted to let anyone on the "goodlist" to send me anything, so I had to check the headers first. I've been testing my version for the past few days on pop servers with alot of spam, I'm getting pretty happy with it. The only problem I'm running into is that the headers always don't come in any well defined order, and sometimes the From: dosn't get checked until some other line triggers a bad flag. I think your idea to grep for lines, can solve that.
Re: dialup spam removal with Net::POP3 (code attached here)
by Aristotle (Chancellor) on May 31, 2003 at 18:21 UTC
    This is futher restructured compared to my points made above:
    #!/usr/bin/perl -w use strict; use Net::POP3; use constant HOST => "pop.foo.net"; use constant USER => "bar"; use constant PASS => "baz"; use constant MAXSIZE => 50000; sub make_rx_from_file { my $fn = shift; open my $fh, "<", $fn or warn("Can't open $fn: $!\n"), return []; chomp(my @line = <$fh>); return [ map qr/$_/, @line ]; } my @bad_word = @{make_rx_from_file "badwords"}; my %test_for = ( 'To' => { ok => make_rx_from_file "togood", }, 'Received' => { ok => make_rx_from_file "receivedgood", }, 'Content-Type' => { bad => make_rx_from_file "contentbad", }, 'Subject' => { bad => \@bad_word, }, 'X-Mailinglist' => { ok => make_rx_from_file "maillist", }, 'From' => { ok => make_rx_from_file "fromgood", bad => make_rx_from_file "frombad", }, 'Content-Transfer-Encoding:' => { bad => [ qr/base64/, ], }, 'Message-ID' => { bad => [ qr/^(?>[^<>]+)<(?>[^<>]+)\@(?>[^<>]+)>(?>[^<>]*)$/, qr/\@127\.0\.0\.1/, ] }, ); my $pop3 = Net::POP3->new(HOST) || die ("Couldn't connect to server\n" +); my $num_messages = $pop3->apop(USER, PASS) || die ("Bad username or pa +ssword\n"); my $messages = $pop3->list(); print "Number of messages: $num_messages\n"; sub MSG_BAD; sub MSG_OK; CHECK: foreach my $msg_id (keys %$messages) { my @messref; *MSG_OK = sub { print shift, "\n", map "> $_", @messref; print "Message $msg_id is ok\n\n"; local $^W; next CHECK; }; *MSG_BAD = sub { print shift, "\n", map "> $_", @messref; print "Message $msg_id is bad\n"; # uncomment the following lines for real deletes #$pop3->delete($msg_id); #print "Message $msg_id deleted\n"; print "\n"; local $^W; next CHECK; }; my $size = $pop3->list($msg_id); print "Message $msg_id ($size bytes):\n"; MSG_BAD "Size limit exceeded\n" if $size > MAXSIZE; my $in_header = 1; @messref = @{$pop3->top($msg_id, 10)}; while(defined(my $line = shift @messref)) { print "> $line"; if($in_header and $line =~ /^\s*$/) { $in_header = 0; print "Checking body\n"; next; } my @match; if($in_header) { my ($header) = grep $line =~ /^$_:/i, keys %test_for; next unless $header; my $tests = $test_for{$header}; if(exists $tests->{ok} and @{ $tests->{ok} }) { @match = grep $line =~ $_, @{ $tests->{ok} }; MSG_OK "Header $header matched: @match\n" if @match; } if(exists $tests->{bad} and @{ $tests->{bad} }) { @match = grep $line =~ $_, @{ $tests->{bad} }; MSG_BAD "Header $header matched: @match\n" if @match; } } else { @match = grep $line =~ $_, @bad_word; MSG_BAD "Bad word(s) matched in body: @match\n" if @match; } } MSG_OK "No tests triggered."; } $pop3->quit();

    Makeshifts last the longest.

Re: dialup spam removal with Net::POP3
by zentara (Archbishop) on Jun 02, 2003 at 16:48 UTC
    Well Thank you Aristotle for making me think. I have rewritten my code with your suggestions in mind, but I did them more in a way I can deal with. Maybe someday I'll have your "perlbrain". :-) spam removal with Net::POP3
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: snippet [id://261632]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (5)
As of 2022-11-30 10:27 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Notices?