Skip to content

Entries tagged "spam".

sysadmin.im considered harmful

Not for the first time I find my blog content copied and hosted elsewhere. This time via http://sysadmin.im/.

Mostly I care little if people rehost my content. But when people claim to have written it (e.g. "Posted by Admin") I get annoyed.

No explicit contact details are posted, probably to avoid complaints.

Update: Fixed URL. Stupid do.tted.na.mes.

 

Let go of the handle.

I don't talk about SPAM publicly these days, for reasons that are probably self-explanatory.

However this is just insane:

  • Saturday 20th February 2010: Registered a new domain.
  • Sunday 21st February 2010: Received first spam.

Currently at 40+ SPAM mails and rising; all mails addressed to "postmaster@", rather than any past users of the domain. (I can see from http://archive.org that the domain was last active in 2008.)

ObSubject: The Goonies

 

My hovercraft is full of eels.

Recently I've been seeing an awful lot more bounced mail addressed to my domains, to the extent that I now wonder whether they are deliberate "attacks".

Over the past four or five years I'd expect to receive one joe-job attack every six months. Over the past two that's risen to once every two months. For the past two months its been once a week.

I run several domains on my Xen guest, and most of those domains rarely have mail received, so there are only a few localparts. (A "localpart" is the bit before the @ sign in an email address.)

My main domain is steve.org.uk and unfortunately this was historically setup with "catchall" behaviour. I used that wildcard expansion pretty seriously so I had localparts such as "slashdot.org", "lwn.net", etc. Over time I've stopped making up new addresses and just stuck with "steve".

Still I'd never quite gotten round to enumerating all valid localparts, instead I tried to mitigate against these rare bounce storms with various simple hacks. For example the following procmail recipe to file away bounces:

#  Bounces
#
:0:
*(Return-Path:).*(<>)
.Automated.bounces/

However this doesn't work as well as it used to - too many idiots people are using challenge/response systems so I'll receive a reply to a mail I didn't send which doesn't look like a bounce (ie. There is a real envelope sender.)

In short blocking bounces by detecting an empty envelope sender is not a complete strategy these days. I started down the heuristic path blocking mail to "unlikely" localparts via patterns such as:

[0-9]@        DENY  Localparts never end in digits
,             DENY  Localparts never contain a comma
|             DENY  Localparts never contain PIPES.
^([^a-zA-Z])  DENY  Localparts start with a-z/A-Z
"             DENY  Quotes are never used in accounts on this system:
'             DENY  Quotes are never used in accounts on this system:

That was actually a simple change to make, via the addition of a new QPSMTPD plugin and it managed to block a lot of the bounceback spam - regardless of the envelope sender. For example:

IP:84.45.254.18    sender:<> Recipient:treacherously9@steve.org.uk
IP:203.202.253.252 sender:<> Recipient:envoyz0@steve.org.uk

Blocking "unlikely" localparts wasn't perfect, but without implementing BATV or enumerating valid localparts there wasn't too much else that I could do. In terms of numbers yesterday I blocked just over 18,500 messages with these six rules.

I also wrote a couple of cronjobs to look at the contents of the Automated.bonces folder so that I could add per-user rejections on the specific addresses being received - with some whitelisting.

(For example if I received 20+ bounces to fluffy32qp@steve.org.uk within the space of ten minutes I'd drop further mails to that address automatically.)

Anyway enough is enough. Today I woke up to just over 40,000 replies to mails I didn't send. I've now scanned my mail directories for all the email addresses I've ever used and will now only accept mail destined to those localparts.

Thankfully it turned out that since 1999 (when steve.org.uk was registered) I've only used about 150 distinct localparts, and many of those are now obsolete. So hopefully I'll now have less of a problem.

It seems to be paying off already:

62.193.234.95   wpc0505.host7x24.com  <>  virtual_rcpt_ok
    901     mail to subtotalingxa@steve.org.uk not accepted here (#5.1.1)

65.99.223.234   cobra.compukey.net    <>  virtual_rcpt_ok
     901     mail to suctionsw@steve.org.uk not accepted here (#5.1.1)

207.44.156.81   box19.fuitadnet.com   <>   virtual_rcpt_ok
     901     mail to reappearcum@steve.org.uk not accepted here (#5.1.1)

In the future this means I could still get flooded with bounces, but there will be two outcomes:

  • The bounces will not hit valid localparts and will be dropped easily, quickly, and cheaply.
  • The bounces will hit valid localparts:
    • Real bounces will end up in Automated.bounces/
    • Challenge/Response things will still reach me. Sigh.

Still this is progress and I can steal some ideas from this great spam filtering service (ahem) to improve the handling of those! (I explicitly chose to use a similar but different system for my personal mails. Even though my support system is on another box I want to avoid problems where failures requiring human intervention are swallowed in the same way that the original one was. Those kind of reasons mandate a similar system but different implementation.)

I guess I could publish some of the qpsmtpd plugins I use locally virtual_rcpt_ok, virtual_badusers, rcpt_pattern_test, etc. Then again most people who do funky things with qpsmtpd will have plenty of choice already.

ObFilm: Monty Python's Flying Circus. (OK technically not a film. Sums up my mood though.)

 

I think I did pretty well under the circumstances

Procmail

This will be the last time I talk about this, but here's my anti-rubbish procmail filter.

It correctly copes with:

  • Foreign character sets that I can't read.
  • Bounces from joe-jobs.
  • Malformed emails.

Obviously edit to suit your tastes. Especially with regard to character sets - it is a wide brush which is tarring a group unfairly. That said in practise it works for me.

If you want real antispam filtering then you should probably be looking at externalising it, or having a layered approach.

ObQuote: Citizen Kane

 

You like playing rough, huh?

According to my small business advisor it is possible to advertise your company, service, or product on the internet.

Who knows what gem of advice they'll offer next?

In unrelated news all mail delivered to me personally in HTML-only format(s) will be dropped. I've given up being patient.

Finally OpenID - what a pain it is to implement! I've fought with it over the weekend, in amongst rewiring my lighting. Setting up a Perl script to authenticate to an OpenID server is just gnarly. (I now have motion-sensitive lighting in my bathroom, which my kitten loves, and radio controlled lighting in the bedroom. Lazyness is ..)

ObQuote: Resident Evil Extinction

 

There can be only one

When volume becomes high enough you start to observe patterns in SPAM pretty easily. I think that this is primarily because people like to see patterns, whether they are present or not.

The trick is determining whether they are real patterns or not, and then to a lesser extent whether they are useful patterns.

For example I host mail for a business domain. That means that incoming messages come primarily from existing customers, and very rarely from potential new ones.

In practise that means that email is expected to arrive from 9am til 6pm (+/-2hours) Email received at 2AM? Either it is somebody working remotely, a foreign contact, or much more likely it is SPAM.

Now clearly you cannot dump all messages received at unusual times of the day, but it is a surprisingly robust SPAM indicator for that particular domain.

All heuristics are fallable, but some are useful regardless..

I'd love to know what people can learn from their SPAM. This week I'm handling approximately 80,000 messages a day, per MX, which isn't huge (ie. 2-3 million a month).

ObQuote: Highlander

 

The problem with the English language is all those pesky words

There exist many bayasian/statistical spam filters, ranging from products such as spambayes, and spamassassin, to crm114. Each of them works in their own way. Having used and tested almost all of them I've noticed a common flaw.

The vast majority of spam-filters struggle to correctly classify "419 scam" mails, lottery fraud, and similar mails.

Why is that? In general, having read hundreds of these mails, I can see several things that are common in these kind of the mails:

  • Mention of currency in both numeric and word forms. ($1,000,000 + 1 million US dollars)
  • Mention of a country / nationality (Sierra Lione, Nigerian)
  • Mention of a reference/claim number and often "official address".
  • Christian references.
  • Greetings such as "dear friend", and mentions of discretion/secrecy.
  • Size. (A scam mail is typically greater in length than an average spam mail).

Whilst none of these individually are indicative of a scam mail it is interesting to count their combined occurance.

I've written a toy program to count these things, and so far the success rate is >60% which is a reasonable start - providing this kind of detection occurs after normal filtering.

I may experiment further, but I figured a public query on scam detection might be appropriate.

Whilst the detecting a scam mail is a subset of detecting a spam email there are probably simplifications that may be made, and exploring those wouldn't be a bad thing.

ObQuote: Buffy.

 

I should be so lucky, again.

Recently the topic of spam on the Debian lists was revisited. I laugh at somebody who recieves 200 spam messages a day.

Here's my stats for yesterday:

                                          Total Mails    : 6399
                                          Total SPAM     : 6077
                                          Total Accepted : 322

                                          Spam Percentage: 94.97%

That's 6077 mails rejected at SMTP time via my filters, and only 322 mails accepted.

The breakdown of the spam rejected looks like this:

                                  Plugin      Count
--------------------------------------------------------------
                                   dnsbl       3755
                             hosts_allow        724
                             greylisting        661
                       check_earlytalker        303
                          check_spamhelo        238
             require_resolvable_fromhost        219
                           virus::clamav         79
                         check_badrcptto         75
                       check_badmailfrom         23
--------------------------------------------------------------