So mega-upload is gone

So the site http://megaupload.com/ has been taken offline, amidst allegations of knowingly engaging in piracy.

There are probably a lot of legitimate users who have lost access to their uploaded files. Even if those files were only offsite backups, you can imagine a user owning a website which now has a million dead links.

This reminds me of a conversation I overheard on Jon Dowland's blog - the summary is that he'd written a (useful) tool to extract attachments from Maildir folders and was wondering how to store and access those attachments. The upshot seemed to be magical URLs of the form:

  • https://file.example.com/sha1/509c2fe2eba509e93987c3024a74d74583c274bd

The comments covered an alternative, hash:///sha1/xxxxxxxxxxxxxxxx, which then becomes close to the magnet: URI scheme.
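To make the two URL shapes concrete, here is a minimal sketch in Python that derives both from a file's contents. The function name `content_urls` is made up for illustration, the host is just the example.com placeholder from above, and hash:/// is the form floated in those comments rather than a registered scheme.

```python
import hashlib


def content_urls(data: bytes, host: str = "file.example.com") -> dict:
    """Build content-addressed links for a blob of bytes.

    Hypothetical helper: the URL layouts mirror the ones discussed in
    the post, they are not a standard.
    """
    digest = hashlib.sha1(data).hexdigest()
    return {
        # Server-specific form: one host, addressed by hash.
        "https": f"https://{host}/sha1/{digest}",
        # Server-neutral form suggested in the comments.
        "hash": f"hash:///sha1/{digest}",
    }


if __name__ == "__main__":
    for name, url in content_urls(b"hello").items():
        print(name, url)
```

The point is that the second form carries no hostname at all, so anything holding the same bytes could answer for it.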

I've not yet thought things through, but I can't help thinking that with the redundancy already present in the internet we should be looking at non-server-specific links. Yes, there are times right now when you might want to address a specific file on a specific server - but otherwise? Wouldn't it be nice if you could just access a file from "anywhere" which happened to have the right contents?

Already my nonporn-but-definitely-adult-site makes its images available as /img/$md5sum.jpg - and similarly the storage at the back-end of my random image upload site uses SHA1 hashes to store the actual files.

To make this more complete, what we need is something that crawls the internet to find files by hash, and then support for it in browsers. Obviously the lookup must be asynchronous and could introduce timing issues, but fundamentally it seems like a reasonable approach to the problem of a single host going offline.

(Consider what happens if imgur.com disappears. All those links would die, yet 99% of the images would still be available somewhere.)
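The client half of that idea could look something like the following sketch, assuming some index has already handed us a list of candidate URLs for a given hash. The names `fetch_by_hash` and `http_fetch` are hypothetical, and the crawler that would build the candidate list is not shown. The important property is that a copy is only accepted if its SHA1 matches, so it does not matter which host served it.

```python
import hashlib
import urllib.request
from typing import Callable, Iterable, Optional


def http_fetch(url: str) -> bytes:
    """Plain HTTP GET; swap in anything else that returns bytes."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def fetch_by_hash(
    sha1_hex: str,
    candidates: Iterable[str],
    fetch: Callable[[str], bytes] = http_fetch,
) -> Optional[bytes]:
    """Return the first candidate whose contents match the SHA1, else None."""
    wanted = sha1_hex.lower()
    for url in candidates:
        try:
            data = fetch(url)
        except OSError:
            # Dead host, timeout, HTTP error: try the next mirror.
            continue
        if hashlib.sha1(data).hexdigest() == wanted:
            return data
        # Contents differ: this host has some other file at that URL.
    return None
```

This version tries mirrors one after another; a browser would presumably want to race them in parallel instead, which is where the timing issues come in.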

I'm tempted to suggest a microformat, but I need to consider the matter further. Right now I'm going to immediately update my current image hosts to use, at the very least:

 <a href="/foo" rel="sha1:xxxxx md5sum:xxxx">
  <img src="foo.jpg" alt="img name">
 </a>

The unfortunate thing is that you cannot have a 'rel="xx"' attribute on an image. So you either have to encode it in the parent link, or add it to the alt attribute, which is suboptimal.
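For what it's worth, pulling those hashes back out of a page is not much work. Here is a rough sketch using Python's standard html.parser; the class name `RelHashParser` is invented for illustration, and the "sha1:" and "md5sum:" token prefixes are just the ones from my markup above, not any agreed convention.

```python
from html.parser import HTMLParser


class RelHashParser(HTMLParser):
    """Collect (href, {algorithm: digest}) for links carrying hash tokens."""

    KNOWN = ("sha1", "md5sum")  # prefixes used in the example markup

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        hashes = {}
        # rel is a space-separated token list; keep only algo:digest tokens.
        for token in (attr.get("rel") or "").split():
            algo, sep, digest = token.partition(":")
            if sep and algo in self.KNOWN:
                hashes[algo] = digest
        if hashes:
            self.links.append((attr.get("href"), hashes))


if __name__ == "__main__":
    parser = RelHashParser()
    parser.feed('<a href="/foo" rel="sha1:xxxxx md5sum:xxxx">'
                '<img src="foo.jpg" alt="img name"></a>')
    print(parser.links)
```

Ordinary rel values such as "nofollow" are ignored, so the hash tokens can sit alongside them.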

ObQuote: "Now, they tell me I paid my debt to society." - Ocean's Eleven (2001)

Comments On This Entry

  1. Julien Danjou

    You should take a look at Camlistore.

  2. [author] Steve Kemp

    Thanks for the pointer; I like the RESTful upload and download and the use of JSON.

    It looks like my own sinatrastore is 99% compatible with, and identical to, the blob server they describe, though obviously that is only the lowest-level part.

    I'll definitely be watching, nice to see Brad doing interesting things still!

  3. Anon

    Abusing the "a/@rel" attribute in this way doesn't seem like a good idea; after all, the hash is an attribute of the image, and not of the hyperlink. What about using "img/@class"? e.g.:

    img name