For the past few years I've been running a service to block blog/comment-spam, which is (currently) implemented as a simple JSON API over HTTP, with a minimal core and all the logic in a series of plugins.
One obvious thing I wasn't doing until today was paying attention to the anchor-text used in hyperlinks, for example:
<a href="http://fdsf.example.com/">buy viagra</a>
Blocking on the anchor-text is less prone to false positives than blocking on keywords in the comment/message bodies.
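As a rough illustration of the idea, here is a minimal sketch in Python using only the standard library. It is not the service's actual plugin code: the names `AnchorTextParser`, `anchor_texts`, `is_spam` and the `BLOCKED_TERMS` list are all hypothetical, and a real blocklist would be far longer.

```python
from html.parser import HTMLParser

# Hypothetical blocklist, for illustration only.
BLOCKED_TERMS = {"viagra", "cialis"}


class AnchorTextParser(HTMLParser):
    """Collect (href, anchor text) pairs from a fragment of HTML."""

    def __init__(self):
        super().__init__()
        self.links = []       # finished (href, text) pairs
        self._href = None     # href of the <a> currently open, if any
        self._inside = False  # are we between <a> and </a>?
        self._text = []       # text pieces seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._inside = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._inside:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._inside = False


def anchor_texts(html):
    """Return every (href, anchor text) pair found in the given HTML."""
    parser = AnchorTextParser()
    parser.feed(html)
    parser.close()
    return parser.links


def is_spam(html):
    """True if any anchor text contains a blocked word (case-insensitive)."""
    for _href, text in anchor_texts(html):
        if set(text.lower().split()) & BLOCKED_TERMS:
            return True
    return False
```

Only the words inside the link are tested, so a comment that merely mentions a blocked word in its body, while linking somewhere with innocent anchor-text, would not be flagged by this check.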
(Equally, some modules are essentially applications; it's great that the authors shared them, but they're virtually unusable unless you match their problem domain 100%.)
I've written about this before, when I had to construct, and publish, my own CIDR-matching module.
Anyway, expect an upload soon. Currently I "parse" HTML and BBCode; Markdown may follow, since I have an interest in it.
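For the BBCode case, a regular expression is probably enough to pull out the link target and its anchor-text. The following is a small Python sketch under that assumption; `BBCODE_URL` and `bbcode_anchor_texts` are hypothetical names, and it only handles the `[url=...]text[/url]` form, not a bare `[url]...[/url]`.

```python
import re

# Matches [url=TARGET]TEXT[/url], tolerating optional quotes around TARGET.
BBCODE_URL = re.compile(
    r'\[url=(?P<q>["\']?)(?P<href>[^\]"\']+)(?P=q)\](?P<text>.*?)\[/url\]',
    re.IGNORECASE | re.DOTALL,
)


def bbcode_anchor_texts(body):
    """Return (href, anchor text) pairs for each [url=...]...[/url] in body."""
    return [
        # Collapse runs of whitespace inside the anchor text.
        (m.group("href"), " ".join(m.group("text").split()))
        for m in BBCODE_URL.finditer(body)
    ]
```

The extracted anchor-text can then be run through the same keyword checks as text taken from HTML links.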