A brief twitter experiment

I've recently posted a few links on Twitter, and I can see followers clicking them. But I also see random hits.

Tonight I posted a link to http://transient.email/, a domain I use for "anonymous" emailing, specifically to see which bots hit the URL.

Within two minutes I had 15 visitors, the first few of which were:

IP             | User-Agent | Request
199.16.156.124 | Twitterbot/1.0; | GET /robots.txt
199.16.156.126 | Twitterbot/1.0; | GET /robots.txt
54.246.137.243 | python-requests/1.2.3 CPython/2.7.2+ Linux/3.0.0-16-virtual | HEAD /
74.112.131.243 | Mozilla/5.0 (); | GET /
50.18.102.132  | Google-HTTP-Java-Client/1.17.0-rc (gzip) | HEAD /
50.18.102.132  | Google-HTTP-Java-Client/1.17.0-rc (gzip) | HEAD /
199.16.156.125 | Twitterbot/1.0; | GET /robots.txt
185.20.4.143   | Mozilla/5.0 (compatible; TweetmemeBot/3.0; +http://tweetmeme.com/) | GET /
23.227.176.34  | MetaURI API/2.0 +metauri.com | GET /
74.6.254.127   | Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp); | GET /robots.txt

So what jumps out? Twitterbot makes several requests for /robots.txt but never actually fetches the page itself, which is interesting because there is indeed a prohibition in the supplied /robots.txt file.
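For the curious, here is a rough sketch of how a well-behaved bot might reach that decision, using Python's standard urllib.robotparser. The post doesn't reproduce the real /robots.txt, so the blanket Disallow rule below is purely an assumption, and may_fetch is a made-up helper name.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: the real /robots.txt is not shown in the post. This blanket
# prohibition is only a stand-in for "there is a prohibition in the file".
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_fetch(user_agent: str, url: str,
              robots_txt: str = ASSUMED_ROBOTS_TXT) -> bool:
    """Return True if robots_txt allows user_agent to fetch url.

    may_fetch is a hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # With the assumed rules a polite crawler would stop after robots.txt.
    print(may_fetch("Twitterbot/1.0", "http://transient.email/"))
```

Under that assumed file a polite crawler would request /robots.txt and then go no further, which matches what the log shows for Twitterbot.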

A surprise was that both Google and Yahoo seem to follow Twitter links in almost real time. The Yahoo spider parsed and honoured /robots.txt, whereas the Google spider seemed to make only HEAD requests and never looked for the content or the robots file.

In addition to this, a bunch of hosts from the Amazon EC2 space made requests, which was perhaps not a surprise. Some automated processing and classification, no doubt.

Anyway, beer. It's been a rough weekend.

Comments On This Entry

  1. Wichert Akkerman

    The HEAD requests are not from Google, but something running on AWS using a Google Java library.

  2. [author] Steve Kemp

    I should know better than to trust User-Agent strings!
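One common way to avoid trusting the User-Agent header is a forward-confirmed reverse DNS check: resolve the client IP to a hostname, check that the hostname sits under a domain the crawler's operator documents, then resolve that hostname back and confirm it returns the same IP. What follows is a minimal sketch, not anything used on this site. The name is_verified_crawler and the injectable lookup parameters are made up for illustration, and the suffix list is something the caller has to supply from the operator's own documentation.

```python
import socket
from typing import Callable, Iterable, List


def reverse_lookup(ip: str) -> str:
    """PTR lookup: return the primary hostname for an IP address."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """A-record lookup: return the IPv4 addresses for a hostname."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_crawler(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check (hypothetical helper, IPv4 only).

    allowed_suffixes is an assumption supplied by the caller, for example
    (".googlebot.com", ".google.com") for Google's crawlers.
    """
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:
        # No PTR record, or the lookup failed: treat as unverified.
        return False

    if not any(hostname.endswith(suffix) for suffix in allowed_suffixes):
        return False

    try:
        # The hostname must resolve back to the IP we started with.
        return ip in forward(hostname)
    except OSError:
        return False
```

The lookups are passed in as parameters so the check can be exercised with canned answers and without touching the network. A host that merely sends a Google-flavoured User-Agent from EC2 would fail the suffix test, because its reverse DNS would not end in a Google domain.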

 
