Steve Kemp's Blog

Debian & Free Software

About This Site

This is a simple blog relating to Debian & Free Software issues.

Archive

Entries tagged "mail-scanning".

25th January 2008

This week has mostly involved me getting my live mail filtering site up and running with a guineapig or two.

This uses a custom user interface to allow users to manage the filtering settings for an entire domain:

  • spam filtering.
  • virus scanning.
  • greylisting.
  • sender/recipient whitelisting.
  • DNS-based blacklists

In terms of implementation this is an SMTP proxy which is built upon the qpsmtpd framework. I've got both the user interface and a collection of plugins reading all data from an MySQL database.

The practical upshot is that if you use the service you'll get less spam, and anything that has been rejected will appear in an online browsable quarantine for a period of times allowing you to view mistakes/rejected mails.

Any mail you didn't want, providing you've got the spam-filtering plugin enabled for your domain, you may send back to be trained as spam.

It scales nicely, doesn't appear to have lost any mail ever in real-world testing, and could be useful.

27th January 2008

This weekend has been an interesting mix of activities. Mostly I've been tweaking my mail filtering service now that it has more users it is more interesting to do that.

The basic process of mail-scanning is pretty simple, but there are some fun things in the mix which make it slightly more fiddly than I'd like.

The basic recipe goes something like this:

  • Accept mail.
  • Validate the mail is addressed to a domain hosted upon the machine.
  • Do the spam filtering / magic (many steps missing here)
    • If the mail should be rejected archive it to a local Maildir folder and bounce it.
    • If the mail should be accepted then forward it to the destination machine.

The archiving of all rejected messages is a big win. It means that if there is a mistake in the handling of any mail we could undo it, retraining the spam database etc. It also provides, via a web page/rss feed, a way for a user to see what a good job the filtering system is doing - by saying "Here's what you would have had ..".

Today I switched the way that the archived mail is displayed via the Web GUI. Previously I used some nasty Maildir parsing code, but now I'm running IMAP upon localhost - so the viewing of messages is a lot more straightforward. (via Net::IMAP::Simple.)

More interestingly, to most readers I'm sure, today I managed to take a new Kite out for flying. A cold and windy day, but lots of fun. There was beer, pies, and near-death!

This was also the second weekend I carried out some painting of my front room. At this rate I'll have painted all four walls of the room in less than two months! (The last time I painted a room it took approximately six months to complete. Move furnuture & paint one wall. Wait several weeks, then repeat until all walls are complete!)

27th February 2008

For the past couple of days I've been working on some "easy hosting" setup for Debian. This is a continuation of my shell-script based solution, but intended to be much more dynamic.

The system makes it simple for us to deploy a Debian Etch installation, and allow users to create virtualhosts easily. (And by easily I mean by creating directories, and very little else.)

So, for example, to create a new website simple point the IP address of your domain example.org to the IP of your machine. Then run:

mkdir -p /srv/example.com/cgi-bin
mkdir -p /srv/example.com/htdocs

If you then want to allow FTP access to upload you may run:

echo "mysecretpass" > /srv/example.com/ftp-password

This will give you FTP access, username "example.com", password "mysecretpass". You'll be chrooted into the /srv/example.com/ directory.

All of this is trivial. Via Apache's mod_vhost_alias, and a simple script to split logfiles and generate statistics via webalizer for each domain. The only thing that I really needed to do was to come up with a simple shell script & cron entry to build up an FTP password file for pure-ftpd.

So here's where it gets interesting. The next job, obviously, is to handle mail for the domains. Under Debian it should be a matter of creating an appropriate /etc/exim/exim4.conf - and ignoring the rest of the setup.

I'm getting some help with that, because despite knowing almost too much about SMTP these days I'm still a little hazy on Exim4 configuration.

I'm watching the recent debian configuration packages system with interest, because right now I'm not touching any configuration files I'm sure that it is only a matter of time.


In other news I cut prices, and am seeing a lot of interest in my mail-scanning.

Finally my .emacs file has been tweaked a lot over the previous few days. Far too much fun. (support files.)

1st March 2008

I've been re-reading RFC 2822 again over the weekend, for obvious reasons, and I'm amused I've not noticed this section in the past:

3.6.5. Informational fields

The informational fields are all optional. The "Keywords:" field contains a comma-separated list of one or more words or quoted-strings. The "Subject:" and "Comments:" fields are unstructured fields as defined in section 2.2.1, and therefore may contain text or folding white space.

subject  = "Subject:" unstructured CRLF

comments = "Comments:" unstructured CRLF

keywords = "Keywords:" phrase *("," phrase) CRLF

Now we all know that emails have subjects, but how many people have ever used the Keywords: header, or the Comments: one?

It'd be nice if we could use these fields in mails - I can immediately think of "keywords" as tags, and I'm sure I'm not alone.

I've looked at multiple "tags for mutt" systems, but all of them fall down for the same reason. I can add tags to a mail, and limit a folder to those mails that contain a given tag. But I cannot do that for multiple folders and that makes them useless :(

Has anybody worked on a multi-folder tag system for Mutt? If so pointers welcome. If not I'd be tempted to create one.

I guess implementation would be very simple. There are three caeses:

  • Adding a tag
  • Deleting a tag
  • Finding all messages with agiven tag.

The first two are easy. The second could be done by writing a cronjob to scan messages for Keyword: headers, and writing a simple index. That could then be used to populate an "~/Maildir/.tag-results" folder, via hardlinks, of all matching messages.

Better yet you could pre-populate ~/Maildir/.tag-$foo containing each message with a given tag. Then theres no searching required! (Although your cronjob would need to run often enough when the tag were added to a message it would appear there within a reasonable timeframe.

Update: I've written the indexer now. It works pretty quickly after the initial run, and is quite neat! tagging messages with mutt.

1st May 2008

Tonight I'm going to enjoy a nice long sleep after attending The Beltane Fire Festival yesterday evening.

I did manage to sort out an SSL certificate yesterday, before I went out. A lengthier process than expected because the SSL-registrar was annoying and mailed the admin address listed in whois for my domain; rather than an address upon the domain itself.

I guess they can't be blamed for that, and the registrar did forward on the request when begged, so it wasn't the end of the world. For reference I used godaddy.com; who sold me a 3 year SSL certificate for about £25.

Today I've been mostly catatonic because I had only two hours sleep last night. But one good piece of news was receiving a (postal) mail from Runa in response to the letter I had sent her some time ago.

ObQuote: 300

7th May 2008

Well a brief post about what I've been up to over the past few days.

An alioth project was created for the maintainance of the bash-completion package. I spent about 40 minutes yesterday committing fixes to some of the low-lying fruit.

I suspect I'll do a little more of that, and then back off. I only started looking at the package because there was a request-for-help bug filed against it. It works well enough for me with some small local additions

The big decision for the bash-completion project is how to go forwards from the current situation where the project is basically a large monolithic script. Ideally the openssh-client package should contain the completion for ssh, scp, etc..

Making that transition will be hard. But interesting.

In other news I submitted a couple of "make-work" patches to the QPSMTPD SMTP proxy - just tidying up a minor cosmetic issues. I'm starting to get to the point where I understand the internals pretty well now, which is a good thing!

I love working on QPSMTPD. It rocks. It is basically the core of my antispam service and a real delight to code for. I cannot overemphasise that enough - some projects are just so obviously coded properly. Hard to replicate, easy to recognise...

I've been working on my own pre-connection system which is a little more specialied; making use of the Class::Pluggable library - packaged for Debian by Sarah.

(The world -> Pre-Connection/Load-Balancing Proxy -> QPSMTPD -> Exim4. No fragility there then ;)

Finally I made a tweak to the Debian Planet configuration. If you have Javascript disabled you'll no longer see the "Show Author"/"Hide Author" links. This is great for people who use Lynx, Links, or other minimal browsers.

TODO:

I'm still waiting for the creation of the javascript project to be setup so that I can work on importing my jQuery package.

I still need to sit down and work through the Apache2 bugs I identified as being simple to fix. I've got it building from SVN now though; so progress is being made!

Finally this weekend I need to sit down and find the time to answer Steve's "Team Questionnaire". Leave it any longer and it'll never get answered. Sigh.

ObQuote: Shooting Fish

18th June 2008

Well I'm back and ready to do some fun work.

In the meantime it seems that at least one of my crash-fixes, from the prior public bugfixing, has been uploaded:

I'm still a little bit frustrated that some of the other patches I made (to different packages) were ignored, but I guess I shouldn't be too worried. They'll get fixed sooner or later regardless of whether it was "my" fix.

In other news I've been stalling a little on the Debian Administration website.

There are a couple of reasonable articles in the submissions queue - but nothing really special. I can't help thinking that the next article being a nice round number of 600 deserves something good/special/unique? hard to quantify, but definitely something I'm thinking. I guess I leave it while the weekend and if nothing presents itself I'll just go dequeue the pending pieces.

In other news I've managed to migrate the mail scanning service into a nicely split architecture - with minimal downtime.

I'm pleased that:

  • The architecture was changed massively from a single-machine orientated service to a trivially scalable one - and that this was essentially seamless.
  • My test cases really worked.
  • I've switched from being "toy" to being "small".
  • I've even pulled in a couple of new users.

Probably more changes to come once I've had a rest (but I guess I write about that elsewhere; because otherwise people get bored!).

The most obvious change to consider is to allow almost "instant-activation". I dislike having to manually approve and setup new domains, even if it does boil down to clicking a button on a webpage - so I'm thinking I should have a system in place such that you can sign up, add your domain, and be good to go without manual involvement. (Once DNS has propogated, obviously!)

Anyway enough writing. Ice-cream calls, and then I must see if more bugs have been reported against my packages...

ObQuote: Run Lola Run.

14 July 2008

Yesterday I was forced to test my backup system in anger, on a large scale, for the first time in months.

A broken package upgrade meant that my anti-spam system lost the contents of all its MySQL databases.

That was a little traumatic, to say the least. But happily I have a good scheme of backups in place, and only a single MX machine was affected.

So, whilst there was approximately an hour of downtime on the primary MX the service as a whole continued to run, and the secondary (+ trial tertiary) MX machines managed to handle the load between them.

I'm almost pleased I had to suffer this downtime, because it did convince me that my split-architecture is stable - and that the loss of the primary MX machine isn't a catastrophic failure.

The main reason for panicing was that I was late for a night in the pub. Thankfully the people I were due to meet believe in flexible approaches to start times - something I personally don't really believe in.

Anyway the mail service is running well, and I've setup "instant activation now", combined with a full month of free service which is helping attract more users.

Apart from that I've continued my plan of migrating away from Xen, and toward KVM. That is going well.

I've got a few guests up and running, and I'm impressed at how stable, fast, and simple the whole process is. :)

ObQuote: Brief Encounter

(That is a great film; and a true classic. Recommended.)

RSS feed

Tags

Created by Chronicle vUNRELEASED