It is nice when you work for a company where you can say:
"Ice-lolly break..."
The response?
"Me too!"
Tonight has been a productive evening; I guess the ice-lolly helped!
I managed to optimize the storage of rejected SPAM mail for my commercial service. That is something I've been obsessing over recently since the volume of SPAM is currently hovering around 2.5 million messages.
Still, I suspect it is only a matter of weeks before I need to expand. The current setup has me using three machines:
- Primary machine runs:
  - Web Application
  - SMTP processing/filtering/delivery
- Secondary machine runs:
  - SMTP processing/filtering/delivery
- Offsite machine:
  - Runs the blog.
  - Runs the support system.
  - Runs the service status page.
Ideally I'd like to split that up further so that I have a single machine running the web application (the part the user interacts with), a pair of MX machines, and the offsite machine doing the minimal work it does.
That way the incoming mail will not affect the application at all directly.
Thankfully the split should be trivial. The only hard part is finding a fast webhost that can offer me ~1GB of RAM, ~1000GB of disk space, and won't charge much. Ideally around £15/$30 a month. (hahaha! hahaha! ha!)
ObQuote: Léon
Currently the code behind the service is closed, but that may well change in the future. (I'd love to release it, but only if I could be sure that copy-cats wouldn't "steal" my users and prevent me from getting more!)
The core of the service is a collection of Perl modules which manage the creation, manipulation, and deletion of "domains", "users", and per-domain settings such as "is the virus scanner enabled for this domain?".
So, that's the core - a collection of objects which maintain state about a domain, and the settings the user has chosen to enable (such as blacklists, whitelists, Bayesian spam filtering, virus scanning, DNS blacklists, etc).
The objects are manipulated via the web-based control panel (and also by email), and are consulted in a read-only fashion by the mail handler itself.
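As a rough illustration only (written in Python rather than the Perl the service uses, and with every class, field, and method name invented for the example), the split between a read-write control panel and a read-only mail handler might be sketched like this:

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass
class DomainSettings:
    """Hypothetical per-domain settings record; all field names are invented."""
    domain: str
    virus_scanner: bool = True
    bayesian_filter: bool = True
    whitelist: set = field(default_factory=set)
    blacklist: set = field(default_factory=set)


class SettingsStore:
    """In-memory stand-in for whatever database holds the real settings."""

    def __init__(self):
        self._domains = {}

    # Control-panel side: create and delete domains.
    def create(self, domain):
        self._domains[domain] = DomainSettings(domain)
        return self._domains[domain]

    def delete(self, domain):
        self._domains.pop(domain, None)

    # Mail-handler side: a live view through which domains can be looked up
    # but not added or removed.
    def read_only(self):
        return MappingProxyType(self._domains)
```

In this sketch only the mapping is protected: the mail handler cannot add or drop domains through the view, but nothing stops it from changing a field on an individual record, so a real system would need something stricter.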
The SMTP server is the qpsmtpd SMTP proxy. This is a beautifully flexible SMTP-proxy server written in Perl. This server is so minimal that almost everything is written as a plugin - and that's where my helper objects come in.
I've written about 30 different qpsmtpd plugins, each of which reads its settings from the objects mentioned above and reacts accordingly.
So:
Currently I've only published one of my qpsmtpd plugins, but more may follow once I've decoupled them from my site. (Because so many of the plugins essentially start by looking for the recipient of an email, and finding the per-domain settings in the database, they are very tightly coupled to my setup.)
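To make that coupling concrete, here is a minimal sketch of the "find the recipient, look up the per-domain settings, react" pattern. It is Python, not a real qpsmtpd plugin; the settings table, the addresses, and the function names are all hypothetical. The return values borrow the names DECLINED and DENY from qpsmtpd's constants, but here they are just plain strings.

```python
DECLINED = "DECLINED"  # no opinion; let the next check decide
DENY = "DENY"          # reject the message

# Hypothetical settings table keyed by recipient domain.
SETTINGS = {
    "example.com": {"blacklist": {"spammer@bad.example"}},
}


def domain_of(address):
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()


def check_blacklist(sender, recipient, settings=SETTINGS):
    """Recipient-stage check driven entirely by the per-domain settings."""
    domain_settings = settings.get(domain_of(recipient))
    if domain_settings is None:
        return DENY  # not a domain this sketch handles
    if sender.lower() in domain_settings.get("blacklist", set()):
        return DENY  # sender is blacklisted for this domain
    return DECLINED
```

Every check shaped like this needs the settings lookup before it can do anything, which is the kind of dependency that makes such plugins hard to publish on their own.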
In terms of the technology I'm using Apache 2.x, CGI::Application, & Perl for the control panel alongside qpsmtpd & exim4 for the SMTP handling. There is also some of Danga's memcached thrown in to speed things up.
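One common way a cache like memcached speeds up this kind of lookup is the cache-aside pattern: check the cache first, and only go to the database on a miss. The sketch below is an assumption about how that might look, not a description of the service's code. It is written in Python, `TTLCache` is a tiny in-process stand-in for a memcached client, and `load_from_db` is a hypothetical loader supplied by the caller.

```python
import time


class TTLCache:
    """In-process stand-in for a memcached client: get/set with an expiry."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() >= expires:
            del self._data[key]  # entry has expired; treat it as a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, time.monotonic() + ttl)


def cached_settings(domain, cache, load_from_db, ttl=60):
    """Cache-aside lookup: try the cache, fall back to the database."""
    key = "settings:" + domain
    value = cache.get(key)
    if value is None:
        value = load_from_db(domain)  # slow path, taken only on a miss
        cache.set(key, value, ttl)
    return value
```

The trade-off is staleness: with a 60-second expiry, a change made in the control panel could take up to a minute to reach the mail handler unless the cached entry is also deleted when the setting is saved.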
60% of the complexity of the service is ensuring that all the mail comes into one central quarantine area where the web application may view it. When you're archiving several GB of email that process is ... fun. But without the online, browsable, searchable quarantine I think my service would be less fun.
I'd be happy to give more details privately if you're curious - but I hope that gives a roughly useful overview - and questions are always welcome.
Did I mention 30 days of free service for new customers? ;)
toupeira: Yes, I have considered not archiving the rejected messages, but I was always keen on keeping copies.
From my point of view having a searchable, viewable archive of all rejected messages has multiple purposes:
No system is perfect. I think my own is pretty good, but I accept that there are times when it is less good than it could be.
Because there is an archive of rejected messages the recipient has the option of looking for messages which they haven't seen because of an error, without having to wait for the sender to notice the bounce and contact the recipient out of band.
Many similar services have only the option of forwarding messages which are spam to a single email address, or hiding them somewhere you can't view them.
My service is nice and open. Almost every message that is rejected may be viewed, and redelivered with a couple of mouse-clicks.
Because there is an archive of every rejected message you can immediately see how well the service is working.
Similarly a user can see that not much mail is being rejected, and could choose to save their money. I want to have satisfied customers, and when people see the quarantine area filling up they are largely impressed, pleased, and surprised.
Removing the quarantine means that any errors will only be noticed if the sender re-mails the recipient, and any figures of rejected/accepted mails aren't open to inspection and questioning.
Despite the pain I love having the ability to see which mail has been rejected for my domain(s). True, I haven't the time or the patience to go through it all to look for falsely caught mail but, and here is the important thing, I could if I wanted to.