Re: Help with setting up spamc

I need to send $message_received to spamc and capture its output in a variable (preferably) so I can get the spam score. I know I can just back quote a system command to capture stdout to a variable, but how can I do both the stdout and the stdin handling here? This should be simple, but I am just missing it...

A somewhat low-level approach in Perl would be:

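my $pid = open(my $from_child, "-|");    # fork; the child's STDOUT feeds $from_child
die "Failed to fork: $!\n" unless defined $pid;
if ($pid == 0)
{
      # child: pipe the message into spamc, whose STDOUT goes back to the parent
      open(my $to_spamc, "|-", "spamc") or die "Failed to run spamc: $!";
      print $to_spamc $message_received;
      close($to_spamc);                  # sends EOF and waits for spamc to finish
      exit(0);
}
my $output = '';
while (<$from_child>)
{
    $output .= $_;                       # collect whatever spamc printed
}
close($from_child);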

The above assumes that spamc writes all of its output to STDOUT and then simply exits. The approach is slow because two forks are involved. I don't know anything about SpamAssassin, but if you have spamd (the daemon), then there should be some network protocol for talking to it. If your program talked to the daemon directly, you'd save the cost of the forks.
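If you stay with spamc, one of the two forks can be avoided with the core module IPC::Open2, which gives you both ends of the child at once. What follows is only a sketch: filter_through and parse_score are names made up for illustration, and it assumes the command reads all of its input before it starts writing output (otherwise a large message could deadlock on a full pipe). The score parsing assumes the "score/threshold" line that spamc prints when run with -c.

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical helper: feed $input to a command's STDIN and return
# everything the command writes to STDOUT, using a single fork.
sub filter_through {
    my ($cmd, $input) = @_;          # $cmd is an array ref, e.g. ['spamc', '-c']

    my $pid = open2(my $from_cmd, my $to_cmd, @$cmd);

    print {$to_cmd} $input;
    close($to_cmd);                  # send EOF so the command sees the end of input

    my $output = do { local $/; <$from_cmd> };   # slurp all of the command's output
    close($from_cmd);
    waitpid($pid, 0);                # reap the child; its exit status is left in $?

    return defined $output ? $output : '';
}

# Hypothetical helper: pull (score, threshold) out of a "score/threshold" line.
# Returns an empty list if no such pair is found.
sub parse_score {
    my ($text) = @_;
    return $text =~ m{(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)} ? ($1, $2) : ();
}

# Possible use, assuming spamc is installed and spamd is running:
# my ($score, $threshold) =
#     parse_score(filter_through(['spamc', '-c'], $message_received));
```

Passing the command as a list also keeps the shell out of the picture, so nothing in the arguments needs quoting.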

Another point: it looks like SpamAssassin is slow even without the forks, so your best bet would be to process multiple emails in parallel.
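A rough sketch of what "in parallel" could look like with plain fork is below. The name process_in_parallel and the $handler callback are made up for illustration; the handler stands in for whatever does the slow per-message check. It starts one child per message with no upper limit, so a real program would probably want to cap the number of children.

```perl
use strict;
use warnings;

# Hypothetical helper: run $handler->($message) for each message in its own
# child process and return the results in the original order.
sub process_in_parallel {
    my ($handler, @messages) = @_;

    my @readers;
    for my $message (@messages) {
        my $pid = open(my $from_child, "-|");   # fork; child's STDOUT is piped back
        die "Failed to fork: $!\n" unless defined $pid;
        if ($pid == 0) {
            print $handler->($message);         # child: do the slow work, report it
            exit(0);
        }
        push @readers, $from_child;             # parent: collect the result later
    }

    my @results;
    for my $from_child (@readers) {
        local $/;                               # slurp each child's whole output
        my $result = <$from_child>;
        push @results, defined $result ? $result : '';
        close($from_child);                     # also waits for that child to exit
    }
    return @results;
}
```

All children are started before any result is read, so the slow checks overlap instead of running one after another.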

Re^2: Help with setting up spamc by SteveTheTechie (Novice) on Jul 10, 2014 at 00:15 UTC
Ok, this looks very interesting. Have not fiddled with tee and forks for a while. I did not think of communicating with spamd directly--thought I had to use the spamc cmd line interface. Thanks!
Re^3: Help with setting up spamc by andal (Hermit) on Jul 10, 2014 at 07:20 UTC
As I said, I don't know much about SpamAssassin, but a quick search brought up Mail::SpamAssassin::Client, which implements the protocol for talking to spamd. Note that the page for Mail::SpamAssassin says: "If you wish to use a command-line filter tool, try the spamassassin or the spamd/spamc tools provided." So I would believe that these tools are good only when you have to use external commands, for example when you program not in Perl but in shell.

In general, to increase throughput, you should make the processing of each message as independent as possible, so that one message does not have to wait for another. That usually means that each message handler should run in either a separate process or a separate thread. Using a separate spamd helps only because it internally uses multiple processes/threads to handle messages; if you feed it your messages one by one, the benefit is lost. Conversely, if you handle your messages in separate processes/threads, it does not make sense to move SpamAssassin into a separate process, because you just add the extra overhead of communicating with that process.
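For reference, talking to spamd through Mail::SpamAssassin::Client might look roughly like the sketch below. I have not run this against a live daemon. The helper name spamd_check is made up, and the host and port are assumptions (a local spamd on its usual default port, 783). It relies on the module's check method returning a hash reference with score, threshold and isspam entries.

```perl
use strict;
use warnings;

# Hypothetical helper: ask a running spamd for a verdict on one message.
# Returns (score, threshold, is_spam), or an empty list if spamd gave no answer.
sub spamd_check {
    my ($message, $host, $port) = @_;

    # Loaded at call time, so this sketch compiles without SpamAssassin installed.
    require Mail::SpamAssassin::Client;

    my $client = Mail::SpamAssassin::Client->new({
        host => $host || 'localhost',   # assumption: spamd runs on this machine...
        port => $port || 783,           # ...and listens on its default port
    });

    # check() asks for the verdict only; process() would also return the
    # rewritten message.
    my $result = $client->check($message);
    return unless $result;

    return ($result->{score}, $result->{threshold}, $result->{isspam} eq 'True');
}

# Possible use:
# my ($score, $threshold, $is_spam) = spamd_check($message_received);
```

This avoids forking entirely; the only cost per message is the socket round trip to spamd.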

