Steve Kemp's Blog: Writings relating to Debian & Free Software

Making an old android phone useful again

Thursday, 13 August 2015

I've got an HTC Desire, running Android 2.2. It is old enough that installing applications, such as those from my bank, fails.

The process of upgrading the stock ROM/firmware seems to be:

  • Download an unsigned zip file, from a shady website/forum.
  • Boot the phone in recovery mode.
  • Wipe the phone / reset to default state.
  • Install the update, and hope it works.
  • Assume you're not running trojaned binaries.
  • Hope the thing still works.
  • Reboot into the new O/S.

All in all .. not ideal .. in any sense.

I wish there were a more "official" way to go. For the moment I guess I'll ignore the problem for another year. My Nokia phone does look pretty good ..

| 4 comments.

 

A brief look at the weed file store

Monday, 10 August 2015

Now that I've got a citizen-ID, a pair of Finnish bank accounts, and have enrolled in a Finnish language-course (due to start next month) I guess I can go back to looking at object stores, and replicated filesystems.

To recap: my current favourite, despite the lack of documentation, is the Camlistore project, which is written in Go.

Looking around there are lots of interesting projects being written in Go, and the next one on my list is seaweedfs, which despite its name is not a filesystem at all, but an object-store accessed via HTTP.

Installation is simple, if you have a working go-lang environment:

go get github.com/chrislusf/seaweedfs/go/weed

Once that completes you'll find you have the executable bin/weed placed beneath your $GOPATH. This single binary is used for everything, though it is worth noting that there are distinct roles:

  • A key concept in weed is "volumes". Volumes are areas to which files are written. Volumes may be replicated, and this replication is decided on a per-volume basis, rather than a per-upload one.
  • Clients talk to a master. The master notices when volumes spring into existence, or go away. For high-availability you can run multiple masters, and they elect the real master (via Raft).

In our demo we'll have three hosts: one, the master, and two and three, which are storage nodes. First of all we start the master:

root@one:~# mkdir /node.info
root@one:~# weed master -mdir /node.info -defaultReplication=001

Then on the storage nodes we start them up:

root@two:~# mkdir /data;
root@two:~# weed volume -dir=/data -max=1  -mserver=one.our.domain:9333

Then the second storage-node:

root@three:~# mkdir /data;
root@three:~# weed volume -dir=/data -max=1 -mserver=one.our.domain:9333

At this point we have a master to which we'll talk (on port :9333), and a pair of storage-nodes which will accept commands over :8080. We've configured replication such that all uploads will go to both volumes. (The -max=1 configuration ensures that each volume-store will only create one volume each. This is in the interest of simplicity.)

Uploading content works in two phases:

  • First tell the master you wish to upload something, to gain an ID in response.
  • Then using the upload-ID actually upload the object.

We'll do that like so:

laptop ~ $ curl -X POST http://one.our.domain:9333/dir/assign
{"fid":"1,06c3add5c3","url":"192.168.1.100:8080","publicUrl":"192.168.1.101:8080","count":1}

client ~ $ curl -X PUT -F file=@/etc/passwd  http://192.168.1.101:8080/1,06c3add5c3
{"name":"passwd","size":2137}

In the first command we call /dir/assign, and receive a JSON response which contains the IPs/ports of the storage-nodes, along with a "file ID", or fid. In the second command we pick one of those hosts (the IPs are those of our storage-nodes) and make the upload using the given ID.
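Those two steps are easy enough to wrap in a tiny helper script. A minimal sketch, assuming jq is installed for the JSON parsing, and re-using the demo master from above:

#!/bin/sh
# Upload a local file to weed: ask the master for a fid, then PUT the
# content to the volume-server it names.  Assumes jq is available.
FILE="$1"
MASTER="http://one.our.domain:9333"

ASSIGN=$(curl -s -X POST "$MASTER/dir/assign")
FID=$(echo "$ASSIGN" | jq -r .fid)
URL=$(echo "$ASSIGN" | jq -r .url)

curl -s -X PUT -F file=@"$FILE" "http://$URL/$FID" >/dev/null
echo "$FID"

Save the fid somewhere; it is the only handle you get back for later retrieval.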

If the upload succeeds it will be written to both volumes, which we can see directly by running strings on the files beneath /data on the two nodes.

The next part is retrieving a file by ID, and we can do that by asking the master server where that ID lives:

client ~ $ curl http://one.our.domain:9333/dir/lookup?volumeId=1,06c3add5c3
{"volumeId":"1","locations":[
 {"url":"192.168.1.100:8080","publicUrl":"192.168.1.100:8080"},
 {"url":"192.168.1.101:8080","publicUrl":"192.168.1.101:8080"}
]}

Or, if we prefer, we could just fetch via the master - it will issue a redirect to one of the volumes that contains the file:

client ~$ curl http://one.our.domain:9333/1,06c3add5c3
<a href="http://192.168.1.100:8080/1,06c3add5c3">Moved Permanently</a>

If you follow redirections then it'll download, as you'd expect:

client ~ $ curl -L http://one.our.domain:9333/1,06c3add5c3
root:x:0:0:root:/root:/bin/bash
..

That's about all you need to know to decide if this is for you - in short: uploads require two requests, one to claim an identifier, and one to use it. Downloads require that your storage-volumes be publicly accessible, and will probably require a proxy of some kind to make them visible on :80, or :443.

A single "weed volume .." process, which runs as a volume-server can support multiple volumes, which are created on-demand, but I've explicitly preferred to limit them here. I'm not 100% sure yet whether it's a good idea to allow creation of multiple volumes or not. There are space implications, and you need to read about replication before you go too far down the rabbit-hole. There is the notion of "data centres", and "racks", such that you can pretend different IPs are different locations and ensure that data is replicated across them, or only within-them, but these choices will depend on your needs.

Writing a thin middleware/shim to allow uploads to be atomic seems simple enough, and there are options to allow exporting the data from the volumes as .tar files, so I have no undue worries about data-storage.

This system seems reliable, and it seems well designed, but people keep saying "I'm not using it in production because .. nobody else is" which is an unfortunate problem to have.

Anyway, I like it. The biggest omission is really authentication. All files are public if you know their IDs, but at least they're not sequential ..

| No comments

 

The differences in Finland start at home.

Thursday, 30 July 2015

So we're in Finland, and the differences start out immediately.

We're renting a flat, in building ten, on a street. You'd think "10 Streetname" was a single building, but no. It is a pair of buildings: 10A, and 10B.

Both of the buildings have 12 flats in them, with 10A having 1-12, and 10B having 13-24.

There's a keypad at the main entrance, which I assumed was to let you press a button and talk to the people inside "Hello I'm the postmaster", but no. There is no intercom system, instead you type in a magic number and the door opens.

The magic number? Sounds like you want to keep that secret, since it lets people into the common-area? No. Everybody has it. The postman, the cleaners, the DHL delivery man, and all the ex-tenants. We invited somebody over recently and gave it out in advance so that they could knock on our flat-door.

Talking of cleaners: In the UK I lived in a flat and once a fortnight somebody would come and sweep the stair-well, since we didn't ever agree to do it ourselves. Here somebody turns up every day, be it to cut the grass, polish the hand-rail, clean the glass on the front-door, or mop the floors of the common area. Sounds awesome. But they cut the grass, right outside our window, at 7:30AM. On the dot. (Or use a leaf-blower, or something equally noisy.)

All this communal-care is paid for by the building-association, of which all flat-owners own shares. Sounds like something we see in England, or even like America's idea of a Home-Owners-Association. (In Scotland you own your own flat, you don't own shares of an entity which owns the complete building. I guess there are pros and cons to both approaches.)

Moving onwards, other things are often the same, but the differences, when you spot them, are odd. I'm struggling to think of them right now; somebody woke me up by cutting our grass for the second time this week (!)

Anyway I'm registered now with the Finnish government, and have a citizen-number, which will be useful. I've got an appointment booked to register with the police - which is something I had to do as a foreigner within the first three months - and today I've got an appointment with a local bank so that I can have a euro-bank-account.

Happily I did find a gym to join. The owner came over one Sunday to give me a tiny tour, and then gave me a list of other gyms to try if his wasn't good enough - which was a nice touch. I joined a couple of days later; his gym is awesome.

(I'm getting paid in UK-pounds, to a UK-bank, so right now I'm getting local money by transferring to my wife's account here, but I want to do that to my own, and open a shared account for paying for rent, electricity, internet, water, & etc).

My flat back home is still not rented, because the nice property management company lost my keys. Yeah, you can't make that up, can you? With a bit of luck the second set of keys I mailed them will arrive soon and the damn thing can be occupied; while I'm not relying on that income I do wish to have it.

| No comments

 

We're in Finland now.

Saturday, 25 July 2015

So we've recently spent our first week together in Helsinki, Finland.

Mostly this has been stress-free, but there are always oddities about living in new places, and moving to Europe didn't minimize them.

For the moment I'll gloss over the differences and instead document the computer problem I had. Our previous shared-desktop system had a pair of drives configured using software RAID. I pulled one of the drives to use in a smaller-cased system (smaller so it was easier to ship).

Only one drive of a pair being present makes mdadm scream, via email, once per day, with reports of failure.

The output of cat /proc/mdstat looked like this:

md2 : active raid1 sdb6[0] [LVM-storage-area]
      1903576896 blocks super 1.2 2 near-copies [2/1] [_U]
md1 : active raid10 sdb5[1] [/root]
      48794112 blocks super 1.2 2 near-copies [2/1] [_U]
md0 : active raid1 sdb1[0]  [/boot]
      975296 blocks super 1.2 2 near-copies [2/1] [_U]

See the "_" there? That's the missing drive. I couldn't remove the drive as it wasn't present on-disk, so this failed:

mdadm --fail   /dev/md0 /dev/sda1
mdadm --remove /dev/md0 /dev/sda1
# repeat for md1, md2.

Similarly removing all "detached" drives failed, so the only thing to do was to mess around re-creating the arrays with a single drive:

lvchange -a n shelob-vol
mdadm --stop /dev/md2
mdadm --create /dev/md2 --level=1 --raid-devices=1 /dev/sdb6 --force
..

I did that on the LVM-storage area, and the /boot partition, but "/" is still to be updated. I'll use Knoppix, or similar, to do it next week. That'll give me a "RAID" system which won't alert every day.

Thanks to the joys of re-creation the UUIDs of the devices changed, so /etc/mdadm/mdadm.conf needed updating. I realized that too late, when grub failed to show the menu, because it didn't find its own UUID. Handy recipe for the future:

set prefix=(md/0)/grub/
insmod linux
linux (md/0)/vmlinuz-3.16.0-0.bpo.4-amd64 root=/dev/md1
initrd (md/0)//boot/initrd.img-3.16.0-0.bpo.4-amd64
boot
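To avoid that surprise the next time arrays get re-created, the usual Debian routine is roughly the following sketch - regenerate the ARRAY lines, review them, then rebuild the initramfs and grub configuration before rebooting:

# show the current arrays with their new UUIDs
mdadm --detail --scan
# update the ARRAY lines in /etc/mdadm/mdadm.conf to match, then:
update-initramfs -u
update-grub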

| 2 comments.

 

My new fitness challenge

Thursday, 2 July 2015

So recently I posted on twitter about a sudden gain in strength.

To put that more into context I should give a few more details. In the past I've been using an assisted pull-up machine, which offers a counterweight to make such things easier.

When I started the exercise I assumed I couldn't do it for real, so I used the machine and set it on 150lb. Over a few weeks I got as far as being able to use it with only 80lb. (Which means I was lifting my entire body-weight minus 80lb. With the assisted-pullup machine smaller numbers are best!)

One evening I was walking to the cinema with my wife and told her I thought I'd be getting close to doing one real pull-up soon, which sounds a little silly, but I guess is pretty common for random men who are almost 40, as I am. As it happens there was some climbing equipment nearby so I said "Here, see how close I am", and I proceeded to do 1.5 pull-ups. (The second one was bad, and didn't count, as I got 90% of the way "up".)

Having had that success I knew I could do "almost two", and I set a goal for the next gym visit: 3 x 3-pullups. I did that. Then I did two more for fun on the way out (couldn't quite manage a complete set.)

So that's the story of how I went from doing 1.5 pull-ups to doing 11 in less than a week. These days I can easily do 3x3, but struggle with more. It'll come, slowly.

So pull-up vs. chin-up? This just relates to which way you place your hands: palm facing you (chin-up) or palm away from you (pull-up).

There are some technical details here, but chin-ups are easier, and more bicep-centric.

Anyway too much writing. My next challenge is the one-armed pushup. However long it takes, and I think it will take a while, that's what I'm working toward.

| 1 comment.

 

We're all about storing objects

Sunday, 21 June 2015

Recently I've been experimenting with camlistore, which is yet another object storage system.

Camlistore gains immediate points because it is written in Go, and is a project initiated by Brad Fitzpatrick, the creator of Perlbal, memcached, and of course LiveJournal.

Camlistore is designed exactly how I'd like to see an object storage-system - each server allows you to:

  • Upload a chunk of data, getting an ID in return.
  • Download a chunk of data, by ID.
  • Iterate over all available IDs.

It should be noted that more is possible; there's a pretty web UI for example, but I'm simplifying. Do your own homework :)

With those primitives you can allow a client-library to upload a file once, then in the background a bunch of dumb servers can decide amongst themselves "Hey I have data with ID:33333 - Do you?". If nobody else does they can upload a second copy.
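As a sketch of that background conversation - against a purely hypothetical HTTP API, invented here for illustration and not camlistore's real one - the sweep is little more than:

# hypothetical endpoints, for illustration only - not camlistore's real API
A=http://nodeA
B=http://nodeB

for id in $(curl -s "$A/enumerate"); do
    # if nodeB lacks this blob, copy it across
    if ! curl -s -f -o /dev/null "$B/blob/$id"; then
        curl -s "$A/blob/$id" | curl -s -X PUT --data-binary @- "$B/blob/$id"
    fi
done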

In short this kind of system allows the replication to be decoupled from the storage. The risk is obvious though: if you upload a file the chunks might live on a host that dies 20 minutes later, just before the content was replicated. That risk is minimal, but valid.

There is also the risk that sudden rashes of uploads leave the system consuming all the internal bandwidth, constantly comparing chunk-IDs, trying to see whether data which has already been copied numerous times needs replicating, or trying to play "catch-up" if new content arrives faster than the replication-bandwidth can cope with. I guess it should be possible to detect those conditions, but they're things to be concerned about.

Anyway the biggest downside with camlistore is the lack of documentation about rebalancing, replication, or anything other than simple single-server setups. Some people have blogged about it, and I got it working between two nodes, but I didn't feel confident it was as robust as I wanted it to be.

I have a strong belief that Camlistore will become a project of joy and wonder, but it isn't quite there yet. I certainly don't want to stop watching it :)

On to the more personal .. I'm all about the object storage these days. Right now most of my objects are packed in a collection of boxes. On the 6th of next month a shipping container will come pick them up and take them to Finland.

For pretty much 20 days in a row we've been taking things to the skip, or the local charity-shops. I expect that by the time we've relocated the amount of possessions we'll maintain will be at least a fifth of our current levels.

We're working on the general rule of thumb: "If it is possible to replace an item we will not take it". That means chess-sets, mirrors, etc, will not be carried. DVDs, for example, have been slashed brutally such that we're only transferring 40 out of a starting collection of 500+.

Only personal, one-off, unique, or "significant" items will be transported. This includes things like personal photographs, family items, and similar. Clothes? Well I need to take one jacket, but more can be bought. The only place I put my foot down was books. Yes I'm a kindle-user these days, but I spent many years tracking down some rare volumes, and though it would be possible to repeat that effort I just don't want to.

I've also decided that I'm carrying my complete toolbox. Some of the tools I took with me when I left home at 18 have stayed with me for the past 20+ years. I don't need this specific crowbar, or axe, but I'm damned if I'm going to lose them now. So they stay. Object storage - some objects are more important than they should be!

| No comments

 

I'm still moving, but ..

Saturday, 13 June 2015

Previously I'd mentioned that we were moving from Edinburgh to Newcastle, such that my wife could accept a position in a training-program, and become a more specialized (medical) doctor.

Now the inevitable update: We're still moving, but we're no longer moving to Newcastle, instead we're moving to Helsinki, Finland.

Me? I care very little about where I end up. I love Edinburgh, I always have, and I never expected to leave here, but once the decision was made that we needed to be elsewhere the actual destination doesn't/didn't matter too much to me.

Sure Newcastle is the home of Newcastle Brown Ale, and has the kind of proper-Northern accents I both love and miss, but Finland has leipäjuusto, saunas, and lovely people.

Given the alternative - My wife moves to Finland, and I do not - Moving to Helsinki is a no-brainer.

I'm working on the assumption that I can keep my job and work more-remotely. If that turns out not to be the case that'll be a real shame given the way the past two years have worked out.

So .. 60 days or so left in the UK. Fun.

| 4 comments.

 

A brief examination of tahoe-lafs

Thursday, 28 May 2015

Continuing the theme from the last post I made, I've recently started working my way down the list of existing object-storage implementations.

Tahoe-LAFS is a well-established project which looked like a good fit for my needs:

  • Simple API.
  • Good handling of redundancy.

Getting the system up and running, on four nodes, was very simple: set up a single/simple "introducer", which is a well-known node that all hosts can use to find each other, and then set up four daemons for storage.

When files are uploaded they are split into chunks, and these chunks are then distributed amongst the various nodes. There are some configuration settings which determine how many chunks files are split into (10 by default), how many chunks are required to rebuild the file (3 by default) and how many copies of the chunks will be created.
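Those settings live in the [client] section of each node's tahoe.cfg; if I'm remembering the key-names correctly, the defaults look roughly like this:

[client]
# pieces needed to rebuild a file
shares.needed = 3
# distinct servers that must hold pieces before an upload is "happy"
shares.happy = 7
# total pieces produced per file
shares.total = 10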

The biggest problem I have with tahoe is that there is no rebalancing support: set up four nodes, and the space becomes full? You can add more nodes, and new uploads go to the new nodes, while old ones stay on the old. Similarly if you change your replication-counts, because you're suddenly more/less paranoid, this doesn't affect existing uploads.

In my perfect world you'd distribute blocks around pretty optimistically, and I'd probably run more services:

  • An introducer - To allow adding/removing storage-nodes on the fly.
  • An indexer - to store the list of "uploads", meta-data, and the corresponding block-pointers.
  • The storage-nodes - to actually store the damn data.

The storage nodes would have the primitives "List all blocks", "Get block", "Put block", and using that you could ensure that each node had sent its data to at least N other nodes. This could be done in the background.

The indexer would be responsible for keeping track of which blocks live where, and which blocks are needed to reassemble upload N. There's probably more that it could do.

| 2 comments.

 

On de-duplicating uploaded file-content.

Thursday, 7 May 2015

This evening I've been mostly playing with removing duplicate content. I've had this idea for the past few days about object-storage, and obviously in that context if you can handle duplicate content cleanly that's a big win.

The naive implementation of object-storage involves splitting uploaded files into chunks, storing them separately, and writing database-entries such that you can reassemble the appropriate chunks when the object is retrieved.

If you store chunks on-disk, by the hash of their contents, then things are nice and simple.

The end result is that you might upload the file /etc/passwd, split that into four-byte chunks, and then hash each chunk using SHA256.

This leaves you with some database-entries, and a bunch of files on-disk:

/tmp/hashed/ef267892ee080862c96a8d2d05de62f48e20f0875f27379e7d58c73ea4455bf1
/tmp/hashed/a378977155fb42bb006496321cbe31f74cbda803c3f6ca590f30e76d1afad921
..
/tmp/hashed/3805b0245bc8375be7125ae228eef711552ac082ffb9bf8756e2964a2393a9de
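Producing that layout takes very little code; a rough shell sketch of the split-and-hash step (the paths here are just placeholders) looks something like this:

CHUNK=4          # the toy used tiny 4-byte chunks; see the trade-off below
mkdir -p /tmp/chunks /tmp/hashed
split -b $CHUNK /etc/passwd /tmp/chunks/chunk.

for c in /tmp/chunks/chunk.*; do
    sum=$(sha256sum "$c" | awk '{print $1}')
    mv "$c" "/tmp/hashed/$sum"        # identical chunks collapse to one file
    echo "$sum" >> /tmp/index         # ordered chunk-IDs, needed for reassembly
done

Reassembly is then just a matter of concatenating the hashed chunks back together in index order.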

In my toy-code I wrote out the data in 4-byte chunks, which is grossly inefficient. But the value of using such small pieces is that there are liable to be a lot of collisions, and that means we save space. It is a trade-off.

So the main thing I was experimenting with was the size of the chunks. If you make them too small you lose I/O performance due to the overhead of writing out so many small files, but you gain because collisions are common.

The rough testing I did involved using chunks of 16, 32, 128, 255, 512, 1024, 2048, and 4096 bytes. As sizes went up the overhead shrank, but so did the collisions.

Unless you could rely on users uploading a lot of files like /bin/ls, which are going to collide 100% of the time with prior uploads, using larger chunks just didn't win as much as I thought it would.

I wrote a toy server using Sinatra & Ruby, which handles the splitting/hashing/and stored block-IDs in SQLite. It's not so novel given that it took only an hour or so to write.

The downside of my approach is also immediately apparent: all the data must live on a single machine, so that reassembly works in the simple fashion. That's possible, even with lots of content, if you use GlusterFS, or similar, but it's probably not a great approach in general. If you have large-capacity storage available locally then this might work well enough for storing backups, etc, but .. yeah.

| 7 comments.

 

A weekend of migrations

Monday, 4 May 2015

This weekend has been all about migrations:

Host Migrations

I've migrated several more systems to the Jessie release of Debian GNU/Linux. No major surprises, and now I'm in a good state.

I have 18 hosts, and now 16 of them are running Jessie. One of them I won't touch for a while, and the other is a KVM-host which runs about 8 guests - so I won't upgrade that for a while (because I want to schedule the shutdown of the guests for the host-reboot).

Password Migrations

I've started migrating my passwords to pass, which is a simple shell wrapper around GPG. I generated a new password-managing key, and started migrating the passwords.

I dislike that account-names are stored in plaintext, but that seems known and unlikely to be fixed.

I've "solved" the problem by dividing all my accounts into "Those that I wish to disclose post-death" (i.e. "banking", "amazon", "facebook", etc, etc), and those that are "never to be shared". The former are migrating, the latter are not.

(Yeah I'm thinking about estates at the moment, near-death things have that effect!)
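In pass terms that division is nothing more than two top-level directories. A sketch of the workflow, with a made-up key-name and made-up entry names:

# initialise the store against the new password-managing key
pass init "Steve Kemp (passwords)"
# entries I'm happy to disclose post-death
pass insert shareable/bank
pass insert shareable/amazon
# entries that are never to be shared
pass generate private/something-secret 20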

| No comments

 
