This weekend I mostly fiddled around migrating machines from Xen hosting to KVM hosting. Ultimately it was largely a waste of time, due to various other factors. Still, with a bit of luck it will be possible to move the machines next week.
That aside, I spent a while updating my blogspam detection site. As a brief recap, this site offers a simple XML-RPC service which allows you to test whether incoming blog comments are spam or not.
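To illustrate how little a client needs, here is a minimal sketch using only Python's standard `xmlrpc.client`. It rests on stated assumptions: the endpoint URL is a hypothetical placeholder, and the `testComment` method name, the `comment`/`ip`/`agent` struct keys and the `OK` / `SPAM:reason` / `ERROR:reason` reply format are assumed rather than taken from any published interface description.

```python
import xmlrpc.client

# Hypothetical endpoint: substitute the address of the real service.
SERVICE_URL = "http://blogspam.example.org:8888/"


def make_proxy(url=SERVICE_URL):
    """Create an XML-RPC proxy (no network traffic happens until a call)."""
    return xmlrpc.client.ServerProxy(url)


def parse_result(result):
    """Split a reply such as "SPAM:multilink" into ("SPAM", "multilink").

    Assumes replies look like "OK", "SPAM:<reason>" or "ERROR:<reason>";
    a bare "OK" yields an empty reason.
    """
    verdict, _, reason = result.partition(":")
    return verdict.strip().upper(), reason.strip()


def check_comment(server, comment, ip, agent=""):
    """Submit one comment and return (verdict, reason).

    `server` is anything exposing a testComment() method, normally the
    proxy from make_proxy(). The method name and struct keys are assumed.
    """
    result = server.testComment({"comment": comment, "ip": ip, "agent": agent})
    return parse_result(result)
```

In use, `check_comment(make_proxy(), text, remote_ip)` would return something like `("OK", "")` or `("SPAM", "multilink")`, under the reply-format assumption above.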
Originally this was put together to fight an invasion of spam comments submitted to the Debian Administration website, for which the counters currently show:
| Site | Spam | Non-Spam | % spam |
|------|------|----------|--------|
| debian-administration.org | 372 | 238 | 60.98% |
Depressing. But not as depressing as the real live stats, which show 36,995 spam comments vs. 1,206 non-spam comments since I last reset the counters.
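The "% spam" figure is simply spam divided by all classified comments. Note that 60.98% works out as 372 of the 610 debian-administration.org comments, so 372 is the spam count for that site. A tiny sketch of the arithmetic (the helper name is illustrative only):

```python
def spam_percentage(spam, ham):
    """Return spam as a percentage of all classified comments."""
    total = spam + ham
    if total == 0:
        return 0.0  # avoid dividing by zero when nothing has been classified
    return 100.0 * spam / total


# debian-administration.org: 372 spam, 238 non-spam.
print(f"{spam_percentage(372, 238):.2f}%")     # 60.98%
# Live counters: 36,995 spam vs. 1,206 non-spam.
print(f"{spam_percentage(36995, 1206):.2f}%")  # 96.84%
```

So the live counters put the spam share at roughly 97%.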
Anyway, I updated the service today to add two new plugins, both of which are a little reactionary.
The first new plugin is called "multilink" and is based upon the observation that spammers rarely know the markup of the site they are submitting comments to. This means you can frequently see submitted comments like this:
<a href="http://spam.com">buy viagra</a>
[url=http://spam.com]buy viagra[/url]
[link=http://spam.com]buy me[/link]
Here we have three different styles of link in a single comment: "a href", "url=", and "link=". I figure this is a clear indicator of a confused mind, or more likely a spammer.
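The idea behind "multilink" can be sketched in a few lines of Python. This is not the plugin's actual code: the regular expressions for each link style and the threshold of two distinct styles are assumptions chosen for illustration.

```python
import re

# One pattern per link markup seen in spam; the exact patterns are assumptions.
LINK_STYLES = {
    "a href": re.compile(r"<a\s+[^>]*href\s*=", re.IGNORECASE),
    "url=": re.compile(r"\[url=", re.IGNORECASE),
    "link=": re.compile(r"\[link=", re.IGNORECASE),
}


def link_styles_used(comment):
    """Return the set of link styles that appear anywhere in the comment."""
    return {name for name, pattern in LINK_STYLES.items() if pattern.search(comment)}


def is_multilink(comment, threshold=2):
    """Flag comments that mix at least `threshold` different link markups."""
    return len(link_styles_used(comment)) >= threshold


sample = """<a href="http://spam.com">buy viagra</a>
[url=http://spam.com]buy viagra[/url]
[link=http://spam.com]buy me[/link]"""
print(sorted(link_styles_used(sample)))  # ['a href', 'link=', 'url=']
print(is_multilink(sample))              # True
```

Counting distinct styles rather than total links means a legitimate comment with several ordinary `<a href>` links is not penalised.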
The second new plugin is designed to stop people who wrap words in "<strong>" tags. It is a little coarse, but it has actually had zero false positives in the real world, so I'm going to leave it live to see how it works out.
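A check this coarse is essentially one regular expression. The sketch below assumes the rule is "reject any comment containing a `<strong>` tag"; the real plugin's exact matching rule isn't described, so treat the pattern as illustrative.

```python
import re

# Matches an opening <strong> tag, with or without attributes, in any case.
STRONG_TAG = re.compile(r"<strong\b[^>]*>", re.IGNORECASE)


def has_strong_words(comment):
    """Return True if the comment contains a <strong> tag anywhere."""
    return STRONG_TAG.search(comment) is not None
```

The `\b` word boundary keeps a hypothetical tag such as `<strongest>` from matching, while plain prose like "I strongly agree" is never caught because it contains no tag at all.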
In happier news, I'm just back from a trip to the beach. Sand rocks, even if it wasn't windy enough for my kite.
ObFilm: Dracula ("Bram Stoker's Dracula" - 1992)