http://qs1969.pair.com?node_id=261632

Well, we are all trying to filter spam. SpamAssasin gets high marks, but it requires you to download the mail first. I tried popsneaker(c-program), and it works great on headers, but misses the body, which isn't good enough anymore. So... I decided to try Net::POP3, which can get the headers and head a number of body lines. This is just my first working model, and I know it uses "goto's", but it was just the easiest way to break out of nested loops. The patterns are stored in files, to make it easy to just add them as necessary. It works quite fast on my dialup. To test, just run, and observe the output. For real deletion, uncomment the pop3->delete line near the bottom of the code.

The files at the end are where you put the regex patterns, about the only thing to watch for is backslashing '.'

It runs fast on my dialup line. I just run it before I download my mail with my normal mail client.

Update: I have changed the original post, to a slightly improved version. Aristotle's methods below, are superior to mine, but I think my "clunky method" is a little easier for a beginner to see what is happening and modify. I did eliminate the redundant "regex creation from file" code.

Update2 I've taken Aristotle's recommendations and implemented a greping method to avoid redundant line checks. I also have split the message into 2 arrays, @header and @top. Additionally, I fixed the exit routine so that there always is a logout from the pop server. The only thing that this code may miss, is if the various header lines have a newline in them, but I'm finding the first line of each header is good enough.

#!/usr/bin/perl -w use strict; use Net::POP3; my $maxsize = 50000; my (@fromgoodre,@frombadre,@badwordre,@maillistre,@badcontentre,@recei +vedgoodre,@togoodre); my %rehash = ( 'fromgood' => \@fromgoodre, 'frombad' => \@frombadre, 'badwords' => \@badwordre, 'maillist' => \@maillistre, 'contentbad' => \@badcontentre, 'receivedgood' => \@receivedgoodre, 'togood' => \@togoodre ); foreach my $file (keys %rehash){ open(FH,"< $file") or warn "Can't open $file $!\n"; chomp (my @lines = <FH>); foreach my $line(@lines){push(@{$rehash{$file}},qr/$line/i)} close FH; } my $ServerName = "mail.foobar.com"; my $pop3 = Net::POP3->new($ServerName)||die("Couldn't log on to server +\n"); my $UserName = "zentara"; my $Password = "spam4lunch"; my $num_messages = $pop3->login($UserName, $Password)||die("Bad userna +me or password\n"); my $messages = $pop3->list(); print "******$messages Number of messages->$num_messages*******\n"; my %messages; my ($flag,$msg_id,$messcheck,@del_mess,@header,@top,$line); @del_mess = (); $messcheck = ''; print "####################################################\n"; foreach $msg_id(keys %{$messages}) { $flag = ''; my @header =(); #array for headerlines my @top = (); #array for top of messagebody my $messref = $pop3->top($msg_id,10); my $size = $pop3->list($msg_id); print "message$msg_id->size=$size\n"; print "----------------------------------\n"; #split into @header and @top while(1){ my $line = shift @$messref; last if $line =~ /^\s*$/; push(@header,$line); } @top = @$messref; print '~~~~~~~~~~~~~~~~~~@header ->',"\n@header\n"; print '~~~~~~~~~~~~~~~~~~@top ->',"\n@top\n"; #check X-Mailinglist: if(($line) = grep(/^X-Mailinglist:/o,@header)){ print "#################X-Mailinglistline-> $line\n"; for my $maillistre (@maillistre) { if ($line =~ /$maillistre/) { print "Mail list g +ood\n"; $flag = 'ok'; goto DOBODY} } } #check From ($line) = grep(/^From:/io,@header); print "#################Fromgrepline-> $line\n"; my $from = $line; for my $goodfromre (@fromgoodre) { if ($line =~ /$goodfromre/) { print "GoodFrom->$ +goodfromre\n"; $flag = 'ok'; goto DOBODY} } for my $badfromre (@frombadre) { if ($line =~ /$badfromre/) { print "BadFrom->$ba +dfromre\n"; push(@del_mess,$msg_id); $messcheck .= "$msg_id -> $badfr +omre ->$from\n"; $flag = 'bad'; goto DOBODY} } #check To and CC ($line) = grep(/^(To|Cc|cc|Return-Path):/o,@header); print "#################To-CC-ReturnPathgrepline-> $line\n +"; for my $togoodre (@togoodre) { if ($line =~ /$togoodre/) { print "ToCCReturn-Pa +th address good\n"; $flag = 'ok'; goto DOBODY} } #check Received ($line) = grep(/^Received:/o,@header); print "#################Receivedgrepline-> $line\n"; for my $receivedgoodre (@receivedgoodre) { if ($line =~ /$receivedgoodre/) { print "Receive +d address good\n"; $flag = 'ok'; goto DOBODY} } #check Content-type ($line) = grep(/^Content-Type:/o,@header); print "#################Content-typegrepline-> $line\n"; for my $badcontentre (@badcontentre) { if ($line =~ /$badcontentre/) { print "Bad Conte +nt->$badcontentre\n"; push(@del_mess,$msg_id); $messcheck .= "$msg_id + -> $badcontentre ->$from\n"; $flag = 'bad'; goto DOBODY} } #check Subject ($line) = grep(/^Subject:/o,@header); print "#################Subjectgrepline-> $line\n"; for my $badwordre (@badwordre) { if ($line =~ /$badwordre/) { print "badword in s +ubject->$badwordre\n"; push(@del_mess,$msg_id); $messcheck .= "$msg_i +d -> $badwordre ->$from\n"; $flag = 'bad'; goto DOBODY} } #check for base64 if(($line) = grep(/^Content-Transfer-Encoding: base64/o,@heade +r)){ print "#################base64grepline-> $line\n"; if ($line =~ /^Content-Transfer-Encoding: base64/o){print +"base64 content\n"; push(@del_mess,$msg_id); $messcheck .= "$msg_id - +> base64 ->$from\n"; $flag = 'bad';goto DOBODY} } #check Message-Id ($line) = grep(/^Message-[Ii][Dd]:/o,@header); print "#################Message-Idgrepline-> $line\n"; if (($line !~ /.*<?.+@.+(\..+)?>?$/o) || ($line =~ /\@127\ +.0\.0\.1/o)) { print "Bad Message-Id\n"; push(@del_mess,$msg_id); $mes +scheck .= "$msg_id -> bad Message-Id ->$from\n"; $flag = 'bad'; goto +DOBODY} DOBODY: if ($flag eq 'ok'){print "Message $msg_id is OK\n"; line(); next} if ($flag eq 'bad'){print "Message $msg_id is bad\n"; delmessage($msg_ +id); line(); next} if ($size > $maxsize){print "Size limit exceeded\n"; push(@del_mess,$m +sg_id); $messcheck .= "$msg_id -> size limit exceeded ->$from\n"; $fl +ag = 'bad'; goto DOBODY} #do body check print "Doing body check\n"; for my $line (@top){ for my $badwordre (@badwordre) { if ($line =~ /$badwordre/) { print "badword in body->$badword +re\n"; push(@del_mess,$msg_id); $messcheck .= "$msg_id -> bad word in + body ->$badwordre ->$from\n"; delmessage($msg_id); line(); goto FINI +SH} } } FINISH: line(); print "\n"; } #confirm the deletions line();line();line();line(); unless(defined $del_mess[0]){print "No messages to be deleted\nHit Ent +er to quit\n"; my $input = <>; $pop3->quit(); exit} print "$messcheck\n\n"; print "Delete above messages?\n [yY] then Enter to delete, or Enter t +o abort\n"; chomp(my $input = <>); if($input =~ /^[yY]$/i){foreach(@del_mess){print "Deleting message $_\ +n"; $pop3->delete($_)}} $pop3->quit(); sub delmessage{print "Message @_ to be deleted\n"} sub line {print "####################################################\ +n"} exit 0; __END__ SAMPLE FILES: ###badwords###### jackpot fee visit information confident confidential urgent action please notice cheaper virus travel nigeria gabon africa afrique \$\$\$ \*\* \* PROFITS debt INVEST MONEY CREDIT FREE CASH BUY \. \. \. \.\.\. party penis sex viagra prescription ########################################### ####contentbad############################# text\/html multipart\/alternative multipart\/mixed ########################################### ######frombad############################## .*\.ac>?$ .*\.ae>?$ .*\.af>?$ .*\.ag>?$ .*\.ai>?$ .*\.al>?$ .*\.am>?$ .*\.an>?$ .*\.ao>?$ .*\.aq>?$ .*\.ar>?$ .*\.as>?$ .*\.at>?$ .*\.by>?$ .*\.bz>?$ .*\.ca>?$ .*\.cc>?$ .*\.cd>?$ .*\.cf>?$ .*\.cg>?$ .*\.ch>?$ .*\.ci>?$ .*\.ck>?$ .*\.cl>?$ .*\.cm>?$ .*\.cn>?$ .*\.co>?$ .*\.cr>?$ .*\.cu>?$ .*\.cv>?$ .*\.cx>?$ ############################################# ####fromgood################################# mailgeek\.compgeeks\.com zmuato sbiblert mcubase mymother myrelatives ############################################## #########maillist########################### suse-linux-e ############################################# #################receivedgood################# list2\.suse\.com ############################################# #########togood############################## sdlperl@sdlperl\.org mc@gnome\.org ##############################################

Replies are listed 'Best First'.
Re: dialup spam removal with Net::POP3 (program repair shop)
by Aristotle (Chancellor) on May 31, 2003 at 17:21 UTC

    Note: written before parent node was updated.

    Good idea, but far less than enough hubris. You can replace the entire repetitive beginning of the script by just a few lines:

    sub make_rx_from { open my $fh, "<", shift or warn("Can't open fromgood $!\n"), retur +n; chomp(my @line = <$fh>); return map qr/$_/, @line; } my @togood = make_rx_from "togood"; my @receivedgood = make_rx_from "receivedgood"; my @badcontent = make_rx_from "contentbad"; my @maillists = make_rx_from "maillist"; my @badwords = make_rx_from "badwords"; my @frombad = make_rx_from "frombad"; my @fromgood = make_rx_from "fromgood";
    But that's still far from ideal. Look at your code's main loop: you are examining a specific header, testing it against a whitelist and a blacklist, flagging accordingly. The same pattern every time. That can be factored out along these lines:
    HEADER: foreach my $line (@$messref) { # detects first blank line go do body check of first 10 body lines last if $line =~ /^\s*$/; my ($header) = grep $line =~ /^$_:/i, keys %test_for; next unless $header; for my $test (qw(ok bad)) { my @match = grep $line =~ $_, @{ $test_for{$header}{$test} }; next if not @match; $flag = $test; print "Message $test - header $header matched: @match\n"; last HEADER; } }
    You could also check the message size prior to any other tests for efficiency. I'll post the complete code as I refactored it in a separate node, for easier downloading.

    Makeshifts last the longest.

      Thanks Aristotle. You always can be counted on to improve someone's code. :-) I put the size check at the end, because I wanted to let anyone on the "goodlist" to send me anything, so I had to check the headers first. I've been testing my version for the past few days on pop servers with alot of spam, I'm getting pretty happy with it. The only problem I'm running into is that the headers always don't come in any well defined order, and sometimes the From: dosn't get checked until some other line triggers a bad flag. I think your idea to grep for lines, can solve that.
Re: dialup spam removal with Net::POP3 (code attached here)
by Aristotle (Chancellor) on May 31, 2003 at 18:21 UTC
    This is futher restructured compared to my points made above:
    #!/usr/bin/perl -w use strict; use Net::POP3; use constant HOST => "pop.foo.net"; use constant USER => "bar"; use constant PASS => "baz"; use constant MAXSIZE => 50000; sub make_rx_from_file { my $fn = shift; open my $fh, "<", $fn or warn("Can't open $fn: $!\n"), return []; chomp(my @line = <$fh>); return [ map qr/$_/, @line ]; } my @bad_word = @{make_rx_from_file "badwords"}; my %test_for = ( 'To' => { ok => make_rx_from_file "togood", }, 'Received' => { ok => make_rx_from_file "receivedgood", }, 'Content-Type' => { bad => make_rx_from_file "contentbad", }, 'Subject' => { bad => \@bad_word, }, 'X-Mailinglist' => { ok => make_rx_from_file "maillist", }, 'From' => { ok => make_rx_from_file "fromgood", bad => make_rx_from_file "frombad", }, 'Content-Transfer-Encoding:' => { bad => [ qr/base64/, ], }, 'Message-ID' => { bad => [ qr/^(?>[^<>]+)<(?>[^<>]+)\@(?>[^<>]+)>(?>[^<>]*)$/, qr/\@127\.0\.0\.1/, ] }, ); my $pop3 = Net::POP3->new(HOST) || die ("Couldn't connect to server\n" +); my $num_messages = $pop3->apop(USER, PASS) || die ("Bad username or pa +ssword\n"); my $messages = $pop3->list(); print "Number of messages: $num_messages\n"; sub MSG_BAD; sub MSG_OK; CHECK: foreach my $msg_id (keys %$messages) { my @messref; *MSG_OK = sub { print shift, "\n", map "> $_", @messref; print "Message $msg_id is ok\n\n"; local $^W; next CHECK; }; *MSG_BAD = sub { print shift, "\n", map "> $_", @messref; print "Message $msg_id is bad\n"; # uncomment the following lines for real deletes #$pop3->delete($msg_id); #print "Message $msg_id deleted\n"; print "\n"; local $^W; next CHECK; }; my $size = $pop3->list($msg_id); print "Message $msg_id ($size bytes):\n"; MSG_BAD "Size limit exceeded\n" if $size > MAXSIZE; my $in_header = 1; @messref = @{$pop3->top($msg_id, 10)}; while(defined(my $line = shift @messref)) { print "> $line"; if($in_header and $line =~ /^\s*$/) { $in_header = 0; print "Checking body\n"; next; } my @match; if($in_header) { my ($header) = grep $line =~ /^$_:/i, keys %test_for; next unless $header; my $tests = $test_for{$header}; if(exists $tests->{ok} and @{ $tests->{ok} }) { @match = grep $line =~ $_, @{ $tests->{ok} }; MSG_OK "Header $header matched: @match\n" if @match; } if(exists $tests->{bad} and @{ $tests->{bad} }) { @match = grep $line =~ $_, @{ $tests->{bad} }; MSG_BAD "Header $header matched: @match\n" if @match; } } else { @match = grep $line =~ $_, @bad_word; MSG_BAD "Bad word(s) matched in body: @match\n" if @match; } } MSG_OK "No tests triggered."; } $pop3->quit();

    Makeshifts last the longest.

Re: dialup spam removal with Net::POP3
by zentara (Archbishop) on Jun 02, 2003 at 16:48 UTC
    Well Thank you Aristotle for making me think. I have rewritten my code with your suggestions in mind, but I did them more in a way I can deal with. Maybe someday I'll have your "perlbrain". :-) spam removal with Net::POP3