A tale of software madness: Amavis and SpamAssassin

I run my own mailserver. Like everyone, I get spam. In recent months the amount of spam I receive has grown significantly, so I decided to set up spam filtering on my mail server.

So the classic advice here is to install AMaViS (A MAil VIrus Scanner), and hook it to the Postfix server. And so I did.

In my setup, I want spam to be tagged as such, but STILL to be delivered to me (so that I can route it to a spam folder via Sieve). And so I did.

But performance… Well, awful performance. My inbox was still full of untagged spam.

So I learn that SpamAssassin has learning capabilities: if you feed it enough “spam” and “ham” emails, it learns what you consider spam and what you consider ham (“ham” being legitimate mail in spam-fighting lingo, apparently).

Cool. And so I did.

I manually move spam to a “SPAM” folder as it gets delivered to me, and I set up a cron job that processes it with sa-learn (a tool from the SpamAssassin toolbox) in order to “feed” it.
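For reference, a cron job along these lines could look like the sketch below. The folder paths, the script location and the train_bayes name are assumptions made up for illustration; only the --spam and --ham options come from sa-learn itself.

```shell
#!/bin/sh
# Hypothetical Maildir locations: adjust them to your own layout.
SPAM_DIR="${SPAM_DIR:-$HOME/Maildir/.SPAM/cur}"
HAM_DIR="${HAM_DIR:-$HOME/Maildir/cur}"
# SA_LEARN can be overridden (for example with "echo sa-learn") for a dry run.
SA_LEARN="${SA_LEARN:-sa-learn}"

train_bayes() {
    # --spam and --ham tell sa-learn how to classify the messages it reads.
    $SA_LEARN --spam "$SPAM_DIR" &&
    $SA_LEARN --ham "$HAM_DIR"
}

# To use it from cron, save this file as (say) /usr/local/bin/train-bayes.sh,
# add a final line that calls train_bayes, and install a crontab entry such as:
#   0 3 * * * /usr/local/bin/train-bayes.sh
```

The database ends up wherever the user running the job keeps its SpamAssassin data, which is exactly the detail that matters in the rest of this story.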

Cool. But I still get untagged spam. Actually, I get anywhere from the same amount to even more spam than before.

So I look deeper. I basically only set up the cron job, and did not set any path to any weird directory. If you’ve been running your own services long enough, you know that if you install a package without pointing it at some weird directory, then whatever you installed is very likely not going to work properly.

So I look deeper.

In that setup, e-mails get into Postfix first. Postfix performs some preliminary checks (is this mail for me? is DKIM okay? is SPF okay? is the user authenticated? does the destination user exist? if not, is there a catch-all address? you know, stuff like this) and then delivers the mail to the amavis content filter via SMTP. Amavis has its own daemon, running on a certain port, speaking SMTP. So amavis will accept an email, queue it, do its stuff (note: amavis is going to run SpamAssassin at some point in here), and then… deliver it again, to… Postfix. I now have another SMTP daemon, running on another port, performing no checks whatsoever because it is (supposedly) only accepting e-mails from amavis. Cool. A huge bump in complexity (I went from one to three daemons) but cool. And poorly working.

Now… How do you configure amavis? Basically there is a (poorly documented) folder, /etc/amavis. Specifically, there is a file which regulates the content-filter operating mode (the one I’m using). The comments there only say to uncomment some lines to enable virus scanning and/or spam filtering.

Cool, but not what I am looking for. I am training SpamAssassin, and I need to know where SpamAssassin looks for the Bayesian database it uses when processing emails.

Now, let’s clarify some things: SpamAssassin is written in Perl, and can work either as a software library or as a daemon on its own.

Amavis is (if I got this correctly) running SpamAssassin as a library.

So… here comes the madness: I know what options I should feed to SpamAssassin to (supposedly) make it work. But amavis will run it for you, somehow, without letting you specify any options.

So I googled, and read the effing manual.

And I learn that /var/lib/amavis/.spamassassin is the directory where amavis will run spamassassin. So I put my database there.

And nothing happens.

So I look more closely, and see that in /var/lib/amavis/.spamassassin there is a user_prefs file which really looks like a spamassassin configuration file, because it is a spamassassin configuration file. So I go ahead and alter it.

And nothing happens.

And in all this, there also are at least two other places for configuring something that goes under the name of spamassassin: /etc/default/spamassassin and /etc/mail/spamassassin.

So I googled a bit more and read the effing manual again, and learn some interesting stuff.

According to the amavisd-new FAQ on spam filtering, in fact:

  • SA does observe all settings in its configuration file, but not all of them have effect on the mail being checked
  • Options to control trigger levels for spam (tag/tag2/kill level) must be in amavisd.conf. But nowhere is it documented how to configure the Bayesian learning, or anything related.
  • And other stuff.

Basically, amavis is a very opaque way to run spamassassin and clamav against your mail.

A simpler solution

A simpler solution involves running spamassassin alone. I won’t dive into the details here because they are already well documented elsewhere on the Internet. Suffice it to say that I now pipe every email through a spamassassin session. This is hard on performance, but given the light load on my mail server, I can live with it.
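One way to check what such a session did to a message is to look for the X-Spam-Flag: YES header, which SpamAssassin adds to mail it classifies as spam with its default settings. The sketch below is only that, a sketch: has_spam_flag is a made-up helper name, and it assumes the message uses plain LF line endings.

```shell
#!/bin/sh
# Reads one mail message on stdin and succeeds if its header block
# contains "X-Spam-Flag: YES".
has_spam_flag() {
    # sed prints everything up to the first empty line (the header block),
    # grep then looks for the flag among those lines only.
    sed '/^$/q' | grep -qi '^X-Spam-Flag:[[:space:]]*YES'
}

# Example use (message.eml is a hypothetical file name):
#   spamassassin < message.eml | has_spam_flag && echo "tagged as spam"
```

Restricting the search to the header block means a message that merely quotes the header in its body is not mistaken for tagged spam.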

Building GNU Emacs from sources

I want to look at the GNU Emacs source code because I have some ideas I want to try and implement.

If you want to write patches for an open-source project, the first thing to do is to check out the latest version from the repository and make sure it compiles and runs. At this stage, you also want to make sure that all of the tests pass, so that you start working from the verified assumption that you branched from a fully working commit.

So, in short, this post is a micro-tutorial on building GNU Emacs from source on a Debian-compatible (read: apt-based) system.

The first thing to do is to gather all the required build dependencies:

sudo apt-get build-dep emacs24

This alone will install all of the necessary build dependencies (provided deb-src entries are enabled in your APT sources).

Next step is to check out the source code. You can find the URL of the repository on the Savannah page:

git clone -b master git://git.sv.gnu.org/emacs.git

At this point you might want to slow down a little and skim through the INSTALL file. It contains instructions for building.

You’re now almost ready for compilation. First, run the autotools to generate the configure script:

./autogen.sh

./autogen.sh git

And now you can generate a Makefile with all the features you want (you can look up some of the configure options in the INSTALL file, or run ./configure --help):

./configure --with-x-toolkit=gtk --with-cairo --with-modules

And now, finally, compile:

make -j9

On my machine (ThinkPad W530, quad-core i7 and 7200rpm rotational hard drive), it took about five minutes:

real	5m9.283s
user	21m21.412s
sys	0m48.284s

You will now find the newly built emacs binaries in the src/ directory.
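If you would rather not hard-code the job count passed to make, a small sketch like this derives it from the machine. The “CPUs plus one” rule is an assumption (it happens to give 9 on a CPU with eight hardware threads), and make_jobs is a made-up helper name.

```shell
#!/bin/sh
# Prints a job count for make: the number of online CPUs plus one.
make_jobs() {
    # nproc comes with GNU coreutils; getconf is the fallback elsewhere,
    # and 1 is used if neither is available.
    cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    echo $((cpus + 1))
}

# Example use, from the top of the source tree:
#   make -j"$(make_jobs)"
#   make check    # runs the Emacs test suite
```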

Have fun!

I wrote a toy url shortener and people started using it

So yesterday I gave an introductory presentation on how to write web applications using Python, Flask and Dokku. In my talk I showed how to go from zero to a fully functional application with a small but functional deployment environment based on Dokku.

The nice thing I noticed is that people started using it, and now I have some URLs shortened through my system.

The application is called musho, as in Micro Url SHOrtener. Here it is live, and you can find the source code on GitLab.

I hope you like it 🙂

Testing Go applications with PostgreSQL and docker

I spawn and terminate docker containers automatically with Jenkins.

This allowed me to have a lighter application and to keep my testing environment closer to the live environment.

In this post I describe how I did this.

Correctly erasing an SSD on GNU/Linux

This is not really an article but more of a note for future reference.

So I decided to wipe an SSD, and I assumed that using the good old dd would not be the correct choice.

I was right, and someone (thanks mingdao) from #gentoo on Freenode directed me to this page: ATA Secure Erase.

Basically, when dealing with SSDs, it’s better to rely on the drive’s standard erase feature than to do it yourself.

It took very little time (approximately 21 seconds).
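As a reminder of the shape of the procedure, here is a cautious sketch. It only prints the hdparm commands instead of running them, because running them destroys all data on the drive. The helper names, /dev/sdX and the password are placeholders I made up; read the ATA Secure Erase page before doing any of this for real.

```shell
#!/bin/sh
# Succeeds if the "hdparm -I" output on stdin reports the drive as
# "not frozen" (a frozen drive refuses the security commands).
drive_not_frozen() {
    grep -Eq '^[[:space:]]*not[[:space:]]+frozen'
}

# Prints (does NOT run) the two hdparm commands for a secure erase.
# $1 is the device, $2 a temporary password; both are placeholders.
print_secure_erase() {
    printf 'hdparm --user-master u --security-set-pass %s %s\n' "$2" "$1"
    printf 'hdparm --user-master u --security-erase %s %s\n' "$2" "$1"
}

# Example use (as root, with the real device instead of /dev/sdX):
#   hdparm -I /dev/sdX | drive_not_frozen && print_secure_erase /dev/sdX Eins
```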

Gin-gonic improvements

While writing simple web applications with the gin-gonic web framework, I found some things that should be improved, in my opinion:

Ideas

  1. Static file serving: it should be improved to use browser caching, according to https://developers.google.com/speed/docs/insights/LeverageBrowserCaching
  2. Default values for template rendering: there are some values that I am going to need in pretty much every template I am rendering. It would be nice to be able to set them once and for all, so that when I render a template, the values I always want present are added to the ones I am passing on the fly.
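For the first idea, a quick way to see whether a server already sends caching headers for a static file is to inspect the Cache-Control header of the response. This is a rough sketch: cache_max_age is a made-up helper name and the URL in the example is hypothetical.

```shell
#!/bin/sh
# Reads HTTP response headers on stdin and prints the max-age value (in
# seconds) found in the Cache-Control header, or nothing if there is none.
cache_max_age() {
    tr -d '\r' |
    grep -i '^cache-control:' |
    sed -n 's/.*max-age=\([0-9][0-9]*\).*/\1/p'
}

# Example use against a locally running application:
#   curl -sI http://localhost:8080/static/style.css | cache_max_age
```

An empty result means the browser is given no explicit lifetime for that file.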

WordPress, again.

I am back to using WordPress.

Long story short, I am giving WordPress another try.

The main reason is that most static blog engines are awful from a usability point of view, and I really need a blog: when I do stuff I would really like to take notes, but every time it is a mess like “How did octopress/jekyll implement this feature? What was that?” and such.

As my blog is not a goal but a tool to me, I want something that JustWorks. Thus, I’m going for WordPress, hosted on WordPress.com: no hassle, no maintenance, just features.

Hopefully.