I run my own mail server. Like everyone, I get spam. Over the last few months, the amount of spam I receive has grown significantly, so I decided to set up spam filtering on my mail server.
So the classic advice here is to install AMaViS (A MAil VIrus Scanner), and hook it to the Postfix server. And so I did.
In my setup, I want spam to be tagged as such, but STILL to be delivered to me (so that I can route it to a spam folder via Sieve). And so I did.
But the results… Well, awful results. My inbox was still full of untagged spam.
So I learn that Spamassassin has learning capabilities: if you feed it enough “spam” and “ham” emails, it’s going to learn what you consider spam and what you consider “ham” (“ham” mail being legitimate mail in the spam-fighting lingo, apparently).
Cool. And so I did.
I manually move spam mail to a “SPAM” folder as it gets delivered to me, and set up a cron job to process that folder via sa-learn (the tool from the spamassassin toolbox) in order to “feed” it.
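For the record, the cron job boils down to something like the sketch below. It is written in Python purely for illustration; the folder paths are made up, and the only things taken from the real tool are sa-learn's --spam and --ham options.

```python
import subprocess
from pathlib import Path

# Hypothetical Maildir locations, not my real ones: adjust to your setup.
SPAM_DIR = Path("/home/me/Maildir/.SPAM/cur")
HAM_DIR = Path("/home/me/Maildir/cur")


def learn_command(kind, folder):
    """Build the sa-learn command line for one folder of messages."""
    if kind not in ("spam", "ham"):
        raise ValueError(f"kind must be 'spam' or 'ham', got {kind!r}")
    return ["sa-learn", f"--{kind}", str(folder)]


def train():
    """Feed both folders to sa-learn; raises if sa-learn exits non-zero."""
    for kind, folder in (("spam", SPAM_DIR), ("ham", HAM_DIR)):
        subprocess.run(learn_command(kind, folder), check=True)
```

Calling train() from cron is all there is to it. One thing worth keeping in mind: by default sa-learn updates the Bayes database of whichever user runs it, so the user behind the cron job matters.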
Cool. But I still get untagged spam. Actually, I get anywhere from the same amount to even more spam than before.
So I look deeper. I basically only set up the cron job, and did not set any path to any weird directory. If you’ve been running your own services for long enough, you know that if you install a package and do not set any path to point to some weird directory, then it’s very likely that whatever you installed is not going to work properly.
So I look deeper.
In that setup, e-mails get into Postfix first. Postfix performs some preliminary checks (is this mail for me? is DKIM okay? is SPF okay? is the user authenticated? does the destination user exist? if not, is there a catch-all address? you know, stuff like this) and then delivers the mail to the amavis content filter via SMTP. Amavis has its own daemon, running on its own port, speaking SMTP. So amavis will accept an email, queue it, do its stuff (note: amavis is going to run SpamAssassin at some point in here), and then… deliver it again, to… Postfix. I now have another SMTP daemon, running on another port, performing no checks whatsoever because it is (supposedly) only accepting e-mails from amavis. Cool. A huge bump in complexity (I went from one to three daemons) but cool. And poorly working.
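To make the three hops easier to see, here is the same flow written down as data. This is only a toy model: the port numbers 10024 and 10025 are the ones conventionally used in amavis and Postfix examples, assumed here for illustration, and the Hop class and trace function are names I made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    daemon: str
    port: int    # assumed port, for illustration only
    work: tuple  # what this hop does to the message


PIPELINE = (
    Hop("postfix smtpd (public)", 25,
        ("recipient checks", "SPF", "DKIM", "authentication")),
    Hop("amavis content filter", 10024, ("queue", "SpamAssassin")),
    Hop("postfix smtpd (re-injection)", 10025, ()),
)


def trace(pipeline=PIPELINE):
    """Return one human-readable line per hop a message goes through."""
    lines = []
    for number, hop in enumerate(pipeline, start=1):
        work = ", ".join(hop.work) if hop.work else "no checks"
        lines.append(f"{number}. {hop.daemon} on port {hop.port}: {work}")
    return lines
```

Printing the lines returned by trace() gives the one-to-three-daemons picture at a glance, with the last hop doing no checks at all.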
Now… How do you configure amavis? Basically there is a (poorly documented) folder in /etc/amavis. Specifically, there is a file which regulates the content-filter operating mode (the one I’m using). The comments there only say to uncomment some lines to enable virus scanning and/or spam filtering.
Cool, but not what I am looking for. I am training SpamAssassin, and I need to know where SpamAssassin is looking for the database of Bayesian stuff it uses when processing emails.
Now, let’s clarify some things: SpamAssassin is written in Perl, and can thus work either as a software library or as a daemon on its own.
Amavis is (if I got this correctly) running SpamAssassin as a library.
So… here comes the madness: I know what options I should feed to SpamAssassin to (supposedly) make it work. But amavis will run it for you, somehow, without letting you specify any options.
So I googled, and read the effing manual.
And I learn that /var/lib/amavis/.spamassassin is the directory where amavis will run spamassassin. So I put my database there.
And nothing happens.
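In hindsight, one way to tell whether a given database has learned anything at all is to look at the counters printed by sa-learn --dump magic. Below is a rough sketch of parsing that output; the column layout is what I have seen on my system, and the threshold of 200 is, as far as I know, SpamAssassin's default minimum of learned spam and ham messages before Bayes is used at all. The function names are mine.

```python
import subprocess


def parse_magic(dump):
    """Extract counters such as nspam and nham from `sa-learn --dump magic` output."""
    counters = {}
    for line in dump.splitlines():
        fields = line.split()
        # Assumed shape: score, 0, value, 0, "non-token", "data:", name...
        if len(fields) >= 7 and fields[4:6] == ["non-token", "data:"]:
            if fields[2].isdigit():
                counters[" ".join(fields[6:])] = int(fields[2])
    return counters


def dump_magic():
    """Run sa-learn for the current user and return the parsed counters."""
    result = subprocess.run(["sa-learn", "--dump", "magic"],
                            capture_output=True, text=True, check=True)
    return parse_magic(result.stdout)


def bayes_active(counters, minimum=200):
    """True once both spam and ham counts reach the assumed default minimum."""
    return (counters.get("nspam", 0) >= minimum
            and counters.get("nham", 0) >= minimum)
```

Running this once as the user behind the cron job and once as the user amavis runs as (that user is probably called amavis, but that is an assumption) shows whether both are looking at the same database.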
So I look more closely, and see that in /var/lib/amavis/.spamassassin there is a user_prefs file which really looks like a spamassassin configuration file, because it is a spamassassin configuration file. So I go ahead and alter it.
And nothing happens.
And on top of all this, there are also at least two other places for configuring something that goes under the name of spamassassin: /etc/default/spamassassin and /etc/mail/spamassassin.
So I googled a bit more and read the effing manual again, and learn some interesting stuff.
According to the amavisd-new FAQ on spam filtering, in fact:
- SA does observe all settings in its configuration file, but not all of them have effect on the mail being checked
- Options to control trigger levels for spam (tag/tag2/kill level) must be in amavisd.conf. But nowhere is it documented how to configure the Bayesian learning, or anything related.
- And other stuff.
Basically, amavis is a very opaque way to run spamassassin and clamav against your mail.
A simpler solution
A simpler solution involves running spamassassin alone. I won’t dive into the details here because they are already well documented on the rest of the Internet. Suffice it to say that I now pipe every email through a spamassassin session. This is hard on performance, but given the light load on my mail server, I can live with it.
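Just to give the idea, piping a message through spamassassin looks roughly like the sketch below. It is not my actual delivery setup: it only assumes that the spamassassin command reads a message on standard input and writes the marked-up message to standard output, and that spam gets an X-Spam-Flag: YES header, which is the default behaviour as far as I know. The function names are mine.

```python
import email
import subprocess


def run_spamassassin(raw):
    """Pipe one raw message (bytes) through spamassassin, return the rewritten message."""
    result = subprocess.run(["spamassassin"], input=raw,
                            capture_output=True, check=True)
    return result.stdout


def is_tagged_spam(raw):
    """True if the message carries an X-Spam-Flag: YES header."""
    message = email.message_from_bytes(raw)
    return message.get("X-Spam-Flag", "").strip().upper() == "YES"
```

Something like is_tagged_spam(run_spamassassin(raw_message)) then tells you whether the message would end up in the spam folder, which is the same decision the Sieve rule makes on the header.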