Steve Kemp's Blog: Writings relating to Debian & Free Software

A brief twitter experiment

Sunday, 13 July 2014

So I've recently posted a few links on Twitter, and I see followers clicking them. But I also see random hits.

Tonight I posted a link to a domain I use for "anonymous" emailing, specifically to see which bots would hit the URL.

Within two minutes I had 15 visitors, the first few of which were:

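IP | User-Agent | Request
[...] | [...] | GET /robots.txt
[...] | [...] | GET /robots.txt
[...] | [...] CPython/2.7.2+ Linux/3.0.0-16-virtual | HEAD /
[...] | [...] () | GET /
[...] | [...] (gzip) | HEAD /
[...] | [...] (gzip) | HEAD /
[...] | [...] | GET /robots.txt
[...] | [...] (compatible; TweetmemeBot/3.0; +[...] | [...] /
[...] | [...] API/2.0 +metauri.com | GET /
[...] | [...] (compatible; Yahoo! Slurp; [...] | GET /robots.txt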

So what jumps out? The Twitterbot makes several requests for /robots.txt but never actually fetches the page itself, which is interesting because the /robots.txt file I serve does indeed contain a prohibition.
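For the curious, here is a rough sketch of the check a well-behaved bot might make before fetching a page, using Python's standard urllib.robotparser. The robots.txt contents, the example.com URL and the may_fetch helper are all assumptions made up for illustration; the real file on the domain is not shown in this post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the real file is not reproduced in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With a blanket "Disallow: /" like the one assumed here, may_fetch("Twitterbot", "http://example.com/") comes back False, which would match a bot that reads /robots.txt and then leaves the page alone.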

A surprise was that both Google and Yahoo seem to follow Twitter links in almost real-time. The Yahoo crawler parsed and honoured /robots.txt, whereas the Google spider seemed to make only HEAD requests and never actually looked for the content or the robots file.
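Observations like these come from grouping the access log by User-Agent and looking at which methods and paths each one requested. A minimal sketch of that grouping follows, assuming the common "combined" log format; the sample lines, addresses and agent names are invented for illustration and are not the real log entries.

```python
import re
from collections import Counter, defaultdict

# Assumes the "combined" access log format; adjust for other layouts.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Invented sample lines (documentation IP range), not the real visitors.
SAMPLE_LOG = [
    '192.0.2.10 - - [13/Jul/2014:20:30:01 +0000] "GET /robots.txt HTTP/1.1" 200 26 "-" "ExampleBot/1.0"',
    '192.0.2.11 - - [13/Jul/2014:20:30:02 +0000] "GET /robots.txt HTTP/1.1" 200 26 "-" "ExampleBot/1.0"',
    '192.0.2.20 - - [13/Jul/2014:20:30:05 +0000] "HEAD / HTTP/1.1" 200 0 "-" "ExampleClient/2.0 (gzip)"',
    '192.0.2.20 - - [13/Jul/2014:20:30:06 +0000] "HEAD / HTTP/1.1" 200 0 "-" "ExampleClient/2.0 (gzip)"',
]


def requests_by_agent(lines):
    """Map each User-Agent to a Counter of (method, path) pairs."""
    summary = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        summary[match["agent"]][(match["method"], match["path"])] += 1
    return summary


def head_only_agents(summary):
    """Return the agents that never issued anything but HEAD requests."""
    return sorted(
        agent
        for agent, counts in summary.items()
        if all(method == "HEAD" for method, _path in counts)
    )
```

Calling head_only_agents(requests_by_agent(SAMPLE_LOG)) picks out the clients that only ever sent HEAD. It groups purely on the User-Agent string the client chose to send, so it says nothing about who is really behind the request.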

In addition, a bunch of hosts in the Amazon EC2 address space made requests, which was perhaps not a surprise: some automated processing and classification, no doubt.
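Spotting which visitors come from a given provider's address space boils down to testing each IP against a list of network blocks. A small sketch using Python's ipaddress module is below; the blocks are reserved documentation ranges standing in as placeholders, not real EC2 ranges, and in_cloud_space is a made-up helper name.

```python
import ipaddress

# Placeholder blocks (reserved documentation ranges), NOT real EC2 ranges.
CLOUD_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def in_cloud_space(ip: str, networks=CLOUD_NETWORKS) -> bool:
    """Return True if ip falls inside any of the given networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)
```

Swapping the placeholder list for a provider's published ranges would be the obvious next step; the membership test itself stays the same.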

Anyway beer. It's been a rough weekend.



Comments On This Entry

Wichert Akkerman

Submitted at 20:48:16 on 13 July 2014

The HEAD requests are not from Google, but something running on AWS using a Google Java library.

Steve Kemp

Submitted at 20:58:19 on 13 July 2014

I should know better than to trust User-Agent strings!


