Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Re: dialup spam removal with Net::POP3 (program repair shop)

by Aristotle (Chancellor)
on May 31, 2003 at 17:21 UTC ( [id://262091]=note: print w/replies, xml ) Need Help??


in reply to dialup spam removal with Net::POP3

Note: written before parent node was updated.

Good idea, but far less than enough hubris. You can replace the entire repetitive beginning of the script by just a few lines:

sub make_rx_from { open my $fh, "<", shift or warn("Can't open fromgood $!\n"), retur +n; chomp(my @line = <$fh>); return map qr/$_/, @line; } my @togood = make_rx_from "togood"; my @receivedgood = make_rx_from "receivedgood"; my @badcontent = make_rx_from "contentbad"; my @maillists = make_rx_from "maillist"; my @badwords = make_rx_from "badwords"; my @frombad = make_rx_from "frombad"; my @fromgood = make_rx_from "fromgood";
But that's still far from ideal. Look at your code's main loop: you are examining a specific header, testing it against a whitelist and a blacklist, flagging accordingly. The same pattern every time. That can be factored out along these lines:
HEADER: foreach my $line (@$messref) { # detects first blank line go do body check of first 10 body lines last if $line =~ /^\s*$/; my ($header) = grep $line =~ /^$_:/i, keys %test_for; next unless $header; for my $test (qw(ok bad)) { my @match = grep $line =~ $_, @{ $test_for{$header}{$test} }; next if not @match; $flag = $test; print "Message $test - header $header matched: @match\n"; last HEADER; } }
You could also check the message size prior to any other tests for efficiency. I'll post the complete code as I refactored it in a separate node, for easier downloading.

Makeshifts last the longest.

Replies are listed 'Best First'.
Re: Re: dialup spam removal with Net::POP3
by zentara (Archbishop) on Jun 01, 2003 at 15:21 UTC
    Thanks Aristotle. You always can be counted on to improve someone's code. :-) I put the size check at the end, because I wanted to let anyone on the "goodlist" to send me anything, so I had to check the headers first. I've been testing my version for the past few days on pop servers with alot of spam, I'm getting pretty happy with it. The only problem I'm running into is that the headers always don't come in any well defined order, and sometimes the From: dosn't get checked until some other line triggers a bad flag. I think your idea to grep for lines, can solve that.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://262091]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (6)
As of 2024-04-16 17:38 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found