Entries tagged "random".

You seem uncomfortable.

I've been trying to remember to post the pictures I like online for the past few months. So this is a reminder to myself.

This image below didn't turn out quite how I wanted it to:

  • I was hoping for a nicer silhouette upon the lady's face.
  • The tree-branch on the left irritates me.

But that said I keep on coming back to look at it. I like the lighting, and I love the way that the brick wall on the right hand side angles towards the building on the horizon.

Enjoy. Or not.

Sunset

A similarly "not perfect" image is this outdoor shot. I have only one irritation with this shot - and that is that the trees are clipped at the top. Meh, such is life.

(I have two styles of photography; semi-random where I snap what is in front of me, and staged where I try to construct a particular picture - the two images above? One of each.)

ObFilm: Bound

 

For the record, that's a question you never have to ask.

Five years ago I spent an hour wandering around a large department store looking to buy a kettle & a set of bathroom scales. Much to the amusement of the woman I was shopping with I spent a very long time trying to find the cheapest available set of scales. (We're talking at least 20 minutes, due to the nature of the store and the crowds.)

Once I'd selected the cheapest possible set of bathroom scales we walked over to the kitchen section of the store. I glanced over all the available kettles and picked up the one that looked the nicest (in terms of size, shape, and handle design) with no regard for the price at all.

Why? A set of bathroom scales I use maybe twice a year. A kettle I use in excess of ten times a day. Something you use that often should be right. Even if over time you take it for granted and forget about it. (FWIW the scales were £6.50 and the kettle cost me £39.95 - John Lewis 15/03/2005 - I kept the receipt!)

I'll haggle and quibble over prices for a lot of things, trying to ensure that I don't pay too much. But there are items which are worth paying for (and I don't just mean that "expensive == good" idea some people seem to have). On that basis I'll think nothing of paying £150 for a pair of shoes for example, even though I'll go out of my way to save £5-£10 on a DVD player. Because shoes are important, used very very often, and DVD players just aren't.

(ObReference: I have one pair of shoes. I have five pairs of boots. I might pretend I don't but I also have a pair of sandals. Sshhh it'll be our little secret. ;)

Anyway today my kettle broke. I had to buy a new one at short notice. I did so and the replacement is obviously more advanced. It boils quickly and quietly which is technically an advantage but in practice is actually a drawback.

Generally speaking I'll fill the kettle, turn it on, then wander away. I'll only return to the kitchen to make my delicious beverage when I hear the "click" signaling that the kettle's job is done. This new one? From outside the kitchen I cannot hear it at all...

In conclusion: Technology and progress are all around us. Sometimes a technical step forward, such as "being quiet", is a bad thing.

In other news I'm fighting with IPv6 & a head cold. Both suck.

ObTitle: Alias

 

I work alone like you. We always work alone.

A couple of days ago I was lamenting the state of webstats, although I was a little vague as to my purpose. Specifically I was wanting to find out about the screen resolutions and user-agents viewing a couple of sites.

To get screen resolutions you really need to inject javascript into your pages, which is icky. Still it's a small price to pay, and chances are most people won't notice.

Of course there are drawbacks:

  • Javascript dependency: if the visitors don't use/enable javascript you see nothing.
  • You cannot capture everything: e.g. the HTTP status code isn't available.

To solve this problem completely you therefore need to have access to both your apache logs and your javascript-captured information. Probably.

As a proof of concept I've injected the following javascript into most pages of three sites. This code:

  • Finds the screen resolution.
  • Finds the HTTP referer.
  • Finds the current page's title.
  • Then submits that to a server-side collection script, via a one-by-one pixel IMG

The script that receives the data writes out the data to a small per-domain SQLite database, which I can then use to generate prettiness. However I suck at being pretty, in most ways, so I've only got functional:

All of this is dynamic and most of the data is anchored to "today", as that's proof of concept enough. Were piwik not written in vile PHP I'd use that - I don't see anything similar out there which is Perl..
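
A rough sketch of the receiving end, in Python rather than whatever the real collector is written in. The parameter names (`res`, `ref`, `title`), the table layout, and both function names are assumptions made purely for illustration; the only point is to show a beacon's query string being decoded and appended to a SQLite database, then summarised.

```python
import sqlite3
from urllib.parse import parse_qs


def record_hit(db, query_string, user_agent=""):
    """Store one beacon request in the given SQLite connection.

    The field names 'res', 'ref' and 'title' are hypothetical; the real
    beacon may well name its parameters differently.
    """
    params = parse_qs(query_string, keep_blank_values=True)

    def first(name):
        # parse_qs returns a list per key; take the first value or "".
        return params.get(name, [""])[0]

    db.execute(
        "CREATE TABLE IF NOT EXISTS hits ("
        " seen TEXT DEFAULT CURRENT_TIMESTAMP,"
        " resolution TEXT, referer TEXT, title TEXT, agent TEXT)"
    )
    db.execute(
        "INSERT INTO hits (resolution, referer, title, agent)"
        " VALUES (?, ?, ?, ?)",
        (first("res"), first("ref"), first("title"), user_agent),
    )
    db.commit()


def resolution_counts(db):
    """Return (resolution, hits) pairs, most common first."""
    return db.execute(
        "SELECT resolution, COUNT(*) FROM hits"
        " GROUP BY resolution ORDER BY COUNT(*) DESC, resolution"
    ).fetchall()
```

In a real deployment the query string and user-agent would come from the web server's request environment, and there would be one database file per domain rather than a single connection.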

The big decision is now "Keep it dynamic" vs. "Output static pages". (vs. call off the experiment now I know that I'm safe to assume "big resolutions").

(Naming software is hard. Recent stuff I've done has had an skx prefix primarily for google-juice. e.g. Randomly I notice that if you search for my personal site on Google's UK engine I come top. Cool.)

ObSubject: The Bourne Identity

 

Do I look like your travel agent?

This entry is primarily composed of "random".

Palm Pre

Several people have persuaded me that I need to change my phone. I've elected to purchase a Palm Pre. Rooting them, and installing Debian is trivial, though I think it's missing an openssh client - so I can read my mail in mutt via the device.

(I've seen mentions of the "scary black window" as a terminal; it isn't obvious how well that works.)

I went shopping yesterday to purchase one, but because my mobile phone contract doesn't end for another 9 days I'd have to pay an extra fee. Instead I'm going to wait 9 days and get it for free.

Getting Bigger

Having randomly remembered the idea that people shrink over the course of a day due to gravity affecting the spine I decided to test this.

For a few days in a row I measured my height before going to bed, and then again in the morning. It certainly appears to be true: the average difference in height is about 9mm for me.

SEO - Is it hard?

I was involved with the setup of a new site last weekend. Today it is top-5 when searched for by two pretty broad keywords in google.

This does not seem unusual for me - though I appreciate there is a difference between "site being popular/succeeding" and "site being findable".

(I remember once attending an interview for a hotel portal site. The interview wasn't that interesting, but I remember they perked up a lot when I said "Search google for Steve Kemp - I come top".)

ObFilm: Mortal Kombat

 

Looks like me and Vincent caught you boys at breakfast

It is interesting that François Marier recently posted a brief "howto" document on debugging problems caused by overly-aggressive filtering with privoxy, as I've recently been having problems with that tool.

My home network frequently changes configuration depending on what I'm concentrating upon, but every few months I'll start/cease using the following tools:

  • squid - The caching proxy server.
  • tor - The onion router.
  • privoxy - The filtering proxy.

Recently I was experimenting with XSS attacks against various browsers, which meant using them for real. As not all browsers have the same anti-advert setups I was running privoxy to filter out web-annoyances, and I spotted a major flaw with it.

Unfortunately I can only describe the problem, not reproduce it, or track it down. I'm 80% certain the bug is in privoxy, but the stack is suitably high that determining that for sure is problematic.

In short the issue is that HTTP requests would end up being sent to the wrong host:

  • I load my start page in one tab: http://www.steve.org.uk/start/
  • I click to open the following URL in another tab: http://www.perlmonks.org/?node=Newest Nodes.
  • The request gets sent to http://steve.org.uk/?node=...

After that clicking around consistently sends requests to the first HTTP host which was accessed successfully. So, for example, attempting to visit http://foo.com/bar/ will send the request to http://steve.org.uk/bar - which then gives a 404.

In terms of setup I use a dnsmasq DNS cache, privoxy and iceweasel from Debian unstable. From the symptoms I'm not sure if iceweasel's "KeepAlive" system is to blame, or if privoxy has a bad cache of hosts. Perhaps it is dnsmasq returning bogus DNS data, or my cable connection itself having DNS issues.

Anyway once the symptoms present themselves closing the browser and restarting the cache fixes it. Until the next time which might be hours or days later.

I'd report it as a bug - but I don't know where it should be. Privoxy caching things it shouldn't? iceweasel having keepalive issues? dnsmasq returning wrong DNS entries?

I'd ask "Have you seen this before, internet world?" but I guess if you have tracked it down it'd be fixed by now, and it clearly isn't!

Anyway for the moment I've uninstalled privoxy.

ObFilm: Pulp Fiction

 

Oh, this should be stunning.

Recently I've been writing some documentation using the docbook toolset.

"Helpfully" the docbook tools produce a nice table of contents for your documentation. For example it will produce an index.html file containing a list of chapters, list of figures, list of tables, and finally a list of examples.

For my specific use I only wanted a table of contents listing chapters, all the other lists were just noise.

Unfortunately I've produced my documentation using the naive docbook2html tool, and all the details I can find online about customising the table of contents to remove specific items refer to using xslt and other more low-level tools.

So I thought I'd cheat. Looking at the generated index.html file I notice that the contents I wish to remove have got class attributes of TOC.

Is there a tool to parse HTML removing items with particular ID attributes? Or removing items having a particular CLASS?

I couldn't find one. So I knocked one up, using HTML::TreeBuilder::XPath, perhaps it will be useful to others:

html-tool --file=index.html --cut-class=foo --indent

The file index.html will be read, parsed, and all items with "class='foo'" will be removed. The output will be indented in a pretty fashion and written to STDOUT.

This example does a similar thing:

html-tool --url=http://www.steve.org.uk/ --output=x.html \
  --cut-id=top --cut-class=mbox --indent

I dabbled with allowing you to just dump HTML sections, so you could run:

html-tool --show-class=foo --file=index.html

But that didn't seem as obviously useful, so I dyked it out. Other similar operations could make it more generally useful though - right now it's more of a html-cut than a html-tool!
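
For anybody curious how little is needed for the cutting part, here is a rough sketch of the same idea in Python using only the standard library. It is not how html-tool itself works (that uses HTML::TreeBuilder::XPath); the class and function names are made up for illustration, it does no indenting, and it assumes the tags inside a removed element are properly balanced.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Cutter(HTMLParser):
    """Copy HTML through, dropping elements with a given class or id."""

    def __init__(self, cut_classes=(), cut_ids=()):
        super().__init__(convert_charrefs=False)
        self.cut_classes = set(cut_classes)
        self.cut_ids = set(cut_ids)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a removed element

    def _should_cut(self, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        return (attrs.get("id") in self.cut_ids
                or any(c in self.cut_classes for c in classes))

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID:
                self.skip_depth += 1      # nested element, also removed
        elif self._should_cut(attrs):
            if tag not in VOID:
                self.skip_depth = 1       # start skipping until it closes
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the depth.
        if not self.skip_depth and not self._should_cut(attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID:
                self.skip_depth -= 1
        else:
            self.out.append("</%s>" % tag)

    def _keep(self, text):
        if not self.skip_depth:
            self.out.append(text)

    def handle_data(self, data):
        self._keep(data)

    def handle_entityref(self, name):
        self._keep("&%s;" % name)

    def handle_charref(self, name):
        self._keep("&#%s;" % name)

    def handle_comment(self, data):
        self._keep("<!--%s-->" % data)

    def handle_decl(self, decl):
        self._keep("<!%s>" % decl)


def cut_html(text, cut_classes=(), cut_ids=()):
    """Return text with the matching elements (and their children) removed."""
    parser = Cutter(cut_classes, cut_ids)
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```

Calling `cut_html(page, cut_classes=["mbox"], cut_ids=["top"])` roughly mirrors the `--cut-class=mbox --cut-id=top` invocation above.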

ObFilm: The Breakfast Club

 

You tortured me? You tortured me!

DNS is hard, let's go shopping.

<irony>

CNAME & MX records do not mix.

</irony>

ObFilm: V for Vendetta.

 

I'm getting married, I'm not joining a convent!

(This post was accidentally made live before it was completed; it is now complete.)

I'll keep this brief and to the point.

syslog indexing and searching

Jason Hedden suggested using swish-e to index and then search syslog files which are stored on disk - rather than inserting the log entries into mysql.

I have 120+ machines writing to a central server, and running a search of 'sshd.*refused' takes less than a second to complete now.

(To be fair using php-syslog-ng was fast, it was just ugly, hard to manage, and the mysql database got overloaded)

Cloud Storage .. but on my machines

I've become increasingly interested in both centralised hosting, and reliable backups.

Cloud storage, where I control all the nodes, allows good backups.

So far I've experimented with both mogilefs and peerfuse. Neither setup is entirely appropriate for me, but I love the idea of seamless replication.

ice-creams

Many ice-creams bought in supermarkets come in packs of three. Annoying:

  • One for madam x.
  • one for me.
  • Who gets the spare? (Me, when she's gone ;)

It happens too often to be a coincidence: my cynicism wonders if it is designed to ensure people buy two boxes..?

new software releases

skxlist, the simple mailing list manager, got a couple of new options after user-submitted suggestions.

asql got a bugfix.

My todolist code is now running on at least one other site!

Nothing else much to say mostly because I'm suffering from poor sleep at the moment. In part because I've got a new clock on my bedroom windowsill and the ticking is distracting me (not to mention the on-the-hour chime!)

Still I'm sure it will pass. I grew up in York in a house that had the back yard abutting the local convent. Every hour, on the hour, they'd have a bell ring! We moved house when I was about 11, but for months after the move I'd still wake up at midnight confused that the bells hadn't rung...

ObFilm: Mamma Mia!

 

Not even if you let me video tape it.

The online todo list seems popular, or rather a lot of people logged in with the posted details and created/resolved tasks at least.

It is kinda cute to watch multiple people all using a site with only one set of login credentials - I guess this is a special case because you cannot easily delete things. Still I wonder how other sites would work like that? Would visitors self-organise, some trashing things, and others repairing damage? (I guess we could consider wikipedia a good example.)

Anyway I've spent a little while this morning and this lunchtime adding in the missing features and doing things suggested by users.

So now:

  • "Duration" is shown for both open & completed tasks.
  • The "home" page is removed as it added no value.
  • Tasks may be flagged as urgent.
  • (Tasks which have titles beginning with "*" are urgent by default.)
  • Searching works across tags, notes, and titles.
  • Tag name completion is pending.

I think in terms of features that I'm done. I just need to wire up creation of accounts, and the submission of tasks via email. Then un-fuck my actual code.

I guess as a final thing I need to consider email notices. I deliberately do not support or mandate "due dates" for tasks. I think I prefer the idea of an email alert being sent if a task is marked as urgent and has had no activity in the past 24 hours. (Where activity means "new note". e.g. you'd add "still working on this", or similar to cancel the pending alert)
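
The rule is simple enough to sketch. This is only an illustration in Python, not the site's actual code; the function names and the idea of passing in a "last activity" timestamp are assumptions.

```python
from datetime import datetime, timedelta

# How long an urgent task may sit without a new note before an alert.
ALERT_AFTER = timedelta(hours=24)


def is_urgent(title, flagged=False):
    """A task is urgent if flagged, or if its title begins with '*'."""
    return flagged or title.startswith("*")


def needs_alert(urgent, last_activity, now):
    """True if an urgent task has had no activity for 24 hours or more."""
    return bool(urgent) and (now - last_activity) >= ALERT_AFTER
```

Adding a note would simply move `last_activity` forward, which is what cancels the pending alert.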

Sending alerts via twitter could also be an option, although I still mostly abhor it.

I've had a brief look at both tadalist.com and rememberthemilk.com; both seem nice .. but I'm still not sure of a winner yet.

ObFilm: Chasing Amy

 

I bet you're a real tiger in disguise.

I've not been online much for the past week, for two main reasons:

My cat was injured

Usually my cat lives outdoors for just over half the day, but last week he came back with a bad limp and since then he's been an indoor kitty.

He stopped limping so badly pretty much the next day, but has managed to scratch/bite a fair amount of fur from his leg worrying at it.

Still that seems to have stopped and I'm sure he'll be fine. Once the fur has grown back I'll throw him back outside and hope he's more careful in the future!

(No idea whether it was a bad fall, a collision, or a fight with a cat/dog/fox/squirrel that caused it .. no obvious injuries when he was at the vets such as bites, scratches, or things embedded in the limb.)

Geometry Wars Galaxies

I might be a bit slow, but this nintendo DS game rocks. Hard.

End Communiqué

ObFilm: Faster, Pussycat! Kill! Kill!

 

The doctors say you're going to live, that's the bad news.

It is annoying that some protocols and systems are more complex than you might expect them to be.

Jabber is a protocol that is notionally simple: XML Messages pass back and forth between server(s) and client(s). But if you look at the contents of XML which is passed around you'll soon discover that even logging in is a complex operation and that Jabber is not implemented in a pleasant fashion.

By contrast many other protocols are lovely. I'm sure I'm not alone in using and debugging many common protocols with nothing more than telnet. SMTP, HTTP, POP3, etc, are all pretty easy to drive interactively.

I think 90% of programmers at some point in their lives implement a HTTP server. But I draw the line at that kind of thing these days, client-side applications are useful and simple enough with the right libraries. (e.g. my sift client-side IMAP scripter has replaced procmail on a couple of machines. Watching to see if I get a reply from somebody specific and sending me an SMS on a match..)

But recently I've been flirting with the development of an IMAP server.

Dovecot appears to be the canonical IMAP/POP3 server these days and it is pretty close to meeting my needs, but it isn't close enough unless I jump through hoops and change the way my mailboxes are organised. (ie. The maildir mailboxes are arranged in such a fashion that dovecot cannot easily handle them, unless I mess about with symlink farms and make them all read-only.)

I guess in conclusion it would be nice if there were a basic IMAP server framework which you could just subclass "login" and "mailbox" sections and then instantiate.

I wrote a quick inetd-driven hack which supports only the bare essentials ("NOOP", "CAPABILITY", "LOGIN", "FETCH", "SELECT" and "LIST"). That allows me to connect via IMAP in both mutt and thunderbird, view folders and download messages.
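
To give a flavour of what "bare essentials" means, here is a hedged sketch of the command dispatch in Python. It is not the hack itself: it covers only NOOP, CAPABILITY and LOGIN, the function names and the `check_login` callback are invented for illustration, and it ignores IMAP literals entirely. The reply shapes (a tagged OK/NO/BAD line, with untagged `*` lines before it) follow the IMAP4rev1 conventions.

```python
import shlex


def handle_line(line, check_login):
    """Return the response lines for one tagged client command.

    check_login is a hypothetical callback taking (user, password) and
    returning True or False.
    """
    parts = line.strip().split(" ", 2)
    if len(parts) < 2:
        return ["* BAD Missing tag or command"]
    tag, command = parts[0], parts[1].upper()

    try:
        # Handles quoted strings only; IMAP literals are not supported.
        args = shlex.split(parts[2]) if len(parts) > 2 else []
    except ValueError:
        return ["%s BAD Malformed arguments" % tag]

    if command == "NOOP":
        return ["%s OK NOOP completed" % tag]
    if command == "CAPABILITY":
        return ["* CAPABILITY IMAP4rev1",
                "%s OK CAPABILITY completed" % tag]
    if command == "LOGIN":
        if len(args) == 2 and check_login(args[0], args[1]):
            return ["%s OK LOGIN completed" % tag]
        return ["%s NO LOGIN failed" % tag]
    return ["%s BAD Command not supported" % tag]


def serve(infile, outfile, check_login):
    """Read commands from infile and write CRLF-terminated replies."""
    outfile.write("* OK IMAP4rev1 sketch ready\r\n")
    outfile.flush()
    for line in infile:
        for reply in handle_line(line, check_login):
            outfile.write(reply + "\r\n")
        outfile.flush()
```

Under inetd the socket arrives as standard input and output, so something like `serve(sys.stdin, sys.stdout, check)` would be the whole main program; SELECT, LIST and FETCH are where the real work would be.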

Still I'm strongly suspecting that there are better uses of my time, even if I could use it in several ways..

ObFilm: La Femme Nikita

 

You'd better get yourself a garlic T-shirt, buddy, or it's your funeral

There are times when I hate xkcd. Mostly these are:

1. When reading a discussion on /. and you just know a particular image will be posted.

2. When you spend hours searching for a specific comic that you're certain exists.

The latter is what bit me tonight - I'm certain there exists a cartoon which has a plot of:

Woman says hi.

Guy says hi.

Woman looks confused.

Guy realises she was talking to her phone, not him.

Cannot find the image for the life of me - only phone-related image I could find was tones.

I thought I might get lucky if I knocked up a quick hack to search the alt-text on all the images, but sadly not.

Still it was a fun project. To be uber-useful we'd need to persuade people to input the text in each cartoon, along with the number.

Given that there are only 550ish cartoons published thus far creating a database would take a person a day, or a group of people a couple of hours.

Tempting .. very tempting ..

ObFilm: The Lost Boys. Yay!

 

It is an army bred for a single purpose

It is funny the way things work out when you're looking for help.

Recently I was working on a Ruby + FUSE based filesystem and as part of the development I added simple diagnostic output via trivial code such as this:

puts "called foo(#{param});" if @debug

That was adequate for minimal interactive use, but not so good for real live use. In real live use I started outputting messages to a dedicated logfile, but in practice became overwhelmed by thousands of lines of output describing everything ever applied to the filesystem.

I figured the natural solution was to have a ring-buffer. (Everybody knows what a ringbuffer is, right?) It could keep the last 500 messages and newer debug information would just replace older entries. That'd be just enough to be useful if I had a problem, but not so overwhelming it would get ignored.
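
For anybody who doesn't know what a ringbuffer is, here is a minimal in-process sketch. It is in Python rather than Ruby, purely to illustrate the idea; the class name is made up, and it does nothing about sharing the buffer between processes.

```python
from collections import deque


class RingLog:
    """Keep only the most recent `size` debug messages."""

    def __init__(self, size=500):
        # A deque with maxlen silently discards the oldest entry
        # whenever a new one is appended to a full buffer.
        self._entries = deque(maxlen=size)

    def add(self, message):
        self._entries.append(message)

    def dump(self):
        """Return the retained messages, oldest first."""
        return list(self._entries)
```

With `RingLog(3)` and five messages added, `dump()` returns only the last three.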

In Perl I found a nice ringbuffer library, but for Ruby nothing. Locking a region of shared memory via shmget, shmat and keeping an array of a few hundred strings would be simple, but it seems odd I have to code this myself.

I started searching around and I accidentally stumbled upon the unrelated IPC::DirQueue perl module. Not useful for my ringbuffer logging problem, but beautifully useful.

There is no package for Debian but that was easily created:

dh-make-perl --build --cpan IPC::DirQueue

Already I have a million and one uses for it - not least to solve my problem of maintaining a centralised quarantine for all the spam mail rejected by N MX machines. (Which currently uses a combination of rsync and lockfiles.)

This is the reason why sites like Perl Advent Calendar are useful - they introduce a useful module every day or two, and introduce you to things that you can use in the future.

Of course keeping a sustainable site like that up and running is hard which is why sites like debaday struggle to attract contributors, for example.

Anyway random happiness.

ObFilm: Lord of the rings: Two Towers

 

I saw green fields and flowers. I could smell the grass.

Fabio Tranchitella recently posted about his new filesystem which really reminded me of an outstanding problem I have.

I do some email filtering, and that is setup in a nice distributed fashion. I have a web/db machine, and then I have a number of MX machines which process incoming mail rejecting spam and queuing good mail for delivery.

I try not to talk about it very often, because that just smells of marketing. More users would be good, but I find explicit promotion & advertising distasteful. (It helps to genuinely consider users as users, and not customers even though money changes hands.)

Anyway I handle mail for just over 150 domains (some domains will receive 40,000 emails a day others will receive 10 emails a week) and each of these domains has different settings, such as "is virus scanning enabled?" and "which are the valid localparts at this domain?", then there are whitelists, blacklists, all that good stuff.

The user is encouraged to fiddle with their settings via the web/db/master machine - but ultimately any settings are actually applied and used upon the MX boxes. This was initially achieved by having MySQL database slaves, but eventually I settled upon a simpler and more robust scheme: Using the filesystem. (Many reasons why, but perhaps the simplest justification is that this way things continue to work even if the master machine goes offline, or there are network routing issues. Each MX machine is essentially standalone and doesn't need to be always talking to the master host. This is good.)

On the master each domain has settings beneath /srv. Changes are applied to the files there, and to make the settings live on the slave MX boxes I can merely rsync the contents over.

Here is an anonymized example of a settings hierarchy..

So a user makes a change on the web machine. That updates /srv on the master machine immediately - and then every fifteen minutes, or so, the settings are pushed across to the MX boxes where the incoming mail is actually processed.

Now ideally I want the updates to be applied immediately. That means I should look at using sshfs or similar. But also as a matter of policy I want to keep things reliable. If the main box dies I don't want the machines to suddenly cease working. So that rules out remotely mounting via sshfs, nfs or similar.

Thus far I've not really looked at the possibilities, but I'm leaning towards having each MX machine look for settings in two places:

  • Look for "live" copies in /srv/
  • If that isn't available then fall back to reading settings from /backup/

That way I can rsync to /backup on a fixed schedule, but expect that in everyday operation I'll get current/live settings from /srv via NFS, sshfs, or something similar.
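
The lookup itself would be trivial. A minimal sketch in Python, assuming a made-up layout of one file per setting at `<root>/<domain>/<setting>` (the real hierarchy may differ) and a hypothetical function name:

```python
import os


def read_setting(domain, name, roots=("/srv", "/backup")):
    """Return a setting's value, preferring the live tree.

    Each root is tried in order; a missing file or an unavailable
    mount simply means we move on to the next one.
    """
    for root in roots:
        path = os.path.join(root, domain, name)
        try:
            with open(path) as handle:
                return handle.read().strip()
        except OSError:
            continue
    return None  # not set anywhere
```

One caveat: a network mount that hangs rather than failing would block the `open()` call, so a real version would want a timeout or a health check on the live mount.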

My job for the weekend is to look around and see what filesystems are available and look at testing them.

ObMovie: Alive

 

There's something out there waiting for us, and it ain't no man.

Things have turned a little morbid here.

I imagine that if I were to cease to be alive things would mostly keep ticking over for a while. But for how long exactly?

Assuming that you've got your hosting paid for, supported, or otherwise managed that would continue to exist. But after a while domain names would start to expire, and manual intervention would be required (that is assuming that manual intervention were not required in advance.)

So when I die, I'd have to assume everything I maintained myself would disappear within two years.

Is that depressing, or realistic? I'm not sure. But definitely morbid.

ObFilm: Predator

 

They look like big, good, strong hands, don't they?

Russ Allbery recently commented that it is really nice to receive patches for trivial scripts posted online.

I agree.

More than once I've posted a trivial script and had it be improved by people, or later included elsewhere.

So in the spirit of sharing here is my latest toy script:

This is a trivial script which searches a Maildir hierarchy and outputs a list of each email address which you've ever sent mail to.
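
The general shape of such a script can be sketched in Python with the standard library. This is not the script referred to above; it looks at a single Maildir of sent mail rather than walking a whole hierarchy, and the function names are invented for illustration.

```python
import mailbox
from email.utils import getaddresses


def recipients(message):
    """Return the lower-cased addresses in To, Cc and Bcc of one message."""
    headers = []
    for name in ("To", "Cc", "Bcc"):
        headers.extend(str(value) for value in message.get_all(name, []))
    return {addr.lower() for _, addr in getaddresses(headers) if "@" in addr}


def sent_addresses(maildir_path):
    """Return a sorted list of everybody mailed from one sent-mail Maildir."""
    found = set()
    for message in mailbox.Maildir(maildir_path, create=False):
        found |= recipients(message)
    return sorted(found)
```

Writing the result of `sent_addresses()` out one address per line would give a file in the form the whitelist test below expects.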

Why would you want that? In my case my (personal) spam filtering makes use of whitelisting, and the assumption is that if I've ever mailed you in the past then I want to see your replies, and you get a break.

These days my (personal) mail filtering has a couple of broad rules:

  • If your mail is HTML it is junk. Unless I'm bored.
  • If your mail is GPG signed/encrypted I will see it.
  • If your mail address is on my whitelist then I want to see it.

After that then I see your message only if CRM114 decides I should.

#
# remove potentially spoofed header
#
:0 fhw
* ^X-whitelist:
| $FORMAIL -I "X-whitelist"

#
#  GPG-signed messages are OK and will be whitelisted
#
:0fW
* < 1024000
|/home/steve/bin/isgpged

:0e
| $FORMAIL -A "X-whitelist: yes" -A "X-GPG-Signed: Yes"

#
#  Get the sender of the message.
#
FROM=`formail -x From:| sed 's/^\([^@]*[ <]\)//' | sed 's/\([ >]\).*$//'`

#
# Add a whitelist tag if appropriate
#
:0 fhw
* !^X-whitelist: yes
* ? test -s $HOME/.procmail_whitelist
* ? echo $FROM| fgrep -qisf $HOME/.procmail_whitelist
| $FORMAIL -A "X-whitelist: yes" -A "X-Whitelist-Test: $FROM"

The net result of these tests is that I can now run the spam filter on non-whitelisted mails:

#
# Run CRM114 mailreaver
#
:0fw: .msgid.lock
* !^X-whitelist: yes
| /usr/bin/crm -u /home/steve/.crm /usr/share/crm114/mailreaver.crm

#
#  Spam.
#
:0:
* ^X-CRM114-Status: SPAM.*
* !^X-whitelist: yes
.CRM.Spam/

#
#  Unsure.
#
:0
* ^X-CRM114-Status:.*UNSURE
* !^X-whitelist: yes
.CRM.Unsure/

There is more to my setup than that, but that's the minimum you'd need to see.

Of course this is a reminder, once more, that the kind of filtering that you carry out for yourself is different from that which other people will do.

ObFilm: The NeverEnding Story

 

They look uncannily like something you should be very, very afraid of

"I've got chills
They're multiplying"

I guess technically I could have used that as a subject, but ugh.

If ever you're a bit shivery, and a bit unwell, don't shave your head. It'll take three times as long, and you'll cut yourself.

ObQuote: Red Dwarf.

 

Have you been following that man?

meta-hacking

I've had a lot of fun over the past few years detecting and fixing XSS attacks - a few months ago compromising several thousand user-accounts belonging to a particular niche social networking site and then more recently experimenting with XSS issues upon a popular software developer's advocate blog.

One thing I've been wondering about recently is meta-XSS attacks.

Consider the LKML (linux kernel mailing list). This list receives lots of long patches, submitted by email, which are copied verbatim to various sites. For example if I mailed an interesting patch to LKML chances are it would get posted to:

(Obviously the challenge here is to make a patch sufficiently interesting that it received more than usual coverage.)

Do each of those sites HTML-encode patches? In general they do, certainly the ones I looked at had code like this:

#include &lt;linux.h&gt;
...
...

But I'm certain that not all sites do so. I'm also pretty sure there are interesting avenues to explore here, and the general idea of indirectly attacking a specific target is ripe for exploration.
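
For the record, the encoding step that the well-behaved sites perform is about this simple. A sketch in Python, with a made-up function name; it says nothing about how any particular archive actually renders mail.

```python
import html


def render_patch(patch_text):
    """Wrap a patch in <pre>, escaping anything HTML would interpret."""
    # html.escape replaces &, < and > (and quotes) with entities, so
    # markup inside the patch is displayed instead of being executed.
    return "<pre>" + html.escape(patch_text) + "</pre>"
```

A site that skips this step and echoes the patch body straight into the page is exactly the kind of indirect target described above.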

Anyway I'm probably not the person to go playing in the field these days; I don't have the time. But it is certainly interesting to think about.

ObFilm: Dirty Harry

 

It's in your nature to destroy yourselves.

Elections

I've said this elsewhere, but it bears repeating:

Anybody who expects a nation to turn around overnight, due to a changing government, hasn't watched/read enough documentaries.

Television

Who is going to make documentaries when David Attenborough dies?

ObFilm: Terminator 2

 

Then don't knock it, it's got its own key.

ObRandom:

Any blog post, comment thread, question, or email which starts "Hi guys" is bad, wrong, and probably not worth reading.

ObTitle: Dawn of the dead - the original and best version - 1978

 

So cunning you could brush your teeth with it.

Let's take a look at a new tool available to Lenny & Sid:

apt-get source acon

int main(int argc,char **argv)
{
        int i,tty,useunicode=0;
        char *fontf=0,*translationf=0,*keymapf=0;

        get_ids();
        set_user_id();

        /*Read configure file if no input options*/
        if(argc<2)
        {
                char *env;
                FILE *fp;
                char font[300],translation[300],keymap[300];
                char tmp[300];

                font[0]=translation[0]=keymap[0]=0;
                if((env=getenv("HOME")))
                        sprintf(tmp,"%s/.acon.conf",env);
                else
                        strcpy(tmp,"/etc/acon.conf");

Hmmm. Nice use of the environment there. I wonder what permissions the binary has:

skx@gold:~$ ls -l /usr/bin/acon
-rwsr-xr-x 1 root root 48672 2008-06-09 10:50 /usr/bin/acon

setuid(0) - just say no.

ObTitle: Blackadder II

 

You like playing rough, huh?

According to my small business advisor it is possible to advertise your company, service, or product on the internet.

Who knows what gem of advice they'll offer next?

In unrelated news all mail delivered to me personally in HTML-only format(s) will be dropped. I've given up being patient.

Finally OpenID - what a pain it is to implement! I've fought with it over the weekend, in amongst rewiring my lighting. Setting up a Perl script to authenticate to an OpenID server is just gnarly. (I now have motion-sensitive lighting in my bathroom, which my kitten loves, and radio controlled lighting in the bedroom. Laziness is ..)

ObQuote: Resident Evil Extinction

 

Hey, Ash, where are we?

I'm currently fighting with MySQL. The following takes too long:

mysql> SELECT COUNT(id) FROM q_archive;
+-----------+
| COUNT(id) |
+-----------+
|   2738048 |
+-----------+
1 row in set (17.95 sec)

I would like it to take significantly less time; even with memcached in use it gets hit too often. I've added an index to the table - but I didn't expect that to help, and I wasn't disappointed.

Ho hum.

Maybe another case where flat-files are best. Sure counting them would take a while, but once I've counted them I can just `cat sum`.

This is probably a case where tweaking memory of MySQL would help. But I'm fairly certain if I start messing with that I'll get into trouble with other parts of my site.

ObFilm: The Evil Dead

 

Five grand a head

It is nice when you work for a company where you can say:

"Ice-lolly break..."

The response?

"Me too!"

Tonight has been a productive evening, I guess the ice-lolly helped!

I managed to optimize the storage of rejected SPAM mail for my commercial service. That is something I've been obsessing over recently since the volume of SPAM is currently hovering around 2.5 million messages.

Still I suspect it is only a matter of weeks before I need to expand. The current setup has me using three machines:

  • Primary machine runs:
    • Web Application
    • SMTP processing/filtering/delivery
  • Secondary machine runs:
    • SMTP processing/filtering/delivery
  • Offsite machine:

Ideally I'd like to split that up further so that I have a single machine running the web application (the part the user interacts with), a pair of MX machines, and the offsite machine doing the minimal work it does.

That way the incoming mail will not affect the application at all directly.

Thankfully the split should be trivial. The only hard part is finding a fast webhost that can offer me ~1Gb of RAM, ~1000Gb of disk space, and won't charge much. Ideally around £15/$30 a month. (hahaha! hahaha! ha!)

ObQuote: Léon

 

I spent my life trying not to be careless

There should be a word for those silly little ways you can fool your body & brain. For example recently I've been having trouble with my boiler - so getting hot water is a challenge.

I find myself doing the crazy thing:

  • Turn on hot tap(s)
  • Stick my hands under them to see if the water is hot.
  • Think to myself "Hey it is getting warmer..".
  • Realise actually I just imagined it.

Lather, rinse, repeat.

Similarly there are times when you can imagine all kinds of bodily sensations. More than once I've been walking out, or sat at home, convinced that my mobile phone vibrated in my pocket. And it hadn't at all.

I remember random conversations with people who agreed they sometimes believe their phones are vibrating when they are not. Seems to be a common thing.

Which begs the question, is this a modern thing? Ten years ago if you had something vibrating against your body you damn well knew about it ... because you were doing it deliberately!

It is only recently that it was possible to have something semi-randomly vibrating against you, without your explicit control. Right?

(OK that sounds rude. It'll be our little secret.)

ObQuote: Godfather (Pt.1)

 

Alcohol's illegal this month

Busy times, despite being on holiday.

Mostly this has been doing "business" work, and fiddling with self-promotion. But despite this I managed to find time to write some extremely useful new Lisp:

Anyway very little time over the coming week will be spent online. All being well. Still enjoying playing with my (loaned) Nokia 770 - maybe I'll get another one of my own eventually.

ObQuote: 30 Days Of Night

 

I'm the only one qualified to remote-pilot the ship anyway.

http://10.print.debian.rocks.twentygototen.org/

ObQuote: Aliens

 

This is the Voice of Doom calling

My biscuits keep breaking up and falling into my coffee.

Help!

ObQuote: The Philadelphia Story

 

I wish I could tie you up in chains

Today I've been mostly unwell. Although I have managed to write some minor new code, and watch a little bit of Doctor Who on DVD.

Recently several people have been ranting about Ruby on Rails. I like it, but I wouldn't use it for personal development in a hurry. Deployment is fiddly, and upgrades are annoying.

But one thing that I utterly condemn Rails for is helping to spread bad paging throughout the online world.

So, what is "bad paging" and why is it important? Well cool URLs don't change, right? "Bad paging" is any user-interface which presents you with a limited view upon a changing list of items which is non-bookmarkable.

Consider the following "list". Assume it represents your view of a collection of items numbering 100+. You may only view ten items at a time; clicking "next", or "previous", to navigate your viewport:

1.  first item
2.  second item
..
10. tenth item

[see next: /start/1] [see prev]

What's wrong with this picture? It is subtle, but this list is broken. The issue is that when the list grows new items are prepended to the front, yet the navigation is linked to the starting page number.

If that description wasn't clear consider what happens if you want to bookmark the page containing item 11. How can you?

Right now it is at /start/1. If ten new items are added to the head of the list then it will instead be found at /start/2, as the items that are currently numbered 1-10 will be shifted forward to become items 11-20, and they will be on page /start/1 instead.

The solution is simple enough once you consider what you want to happen:

  • Either append items to the end of the list.
    • Such that /start/1 always gives the items 11-20.
  • Or number the links in the reverse order.

So why does nobody do that? (As a counter-example look at my website: rather than the 'Show previous' link taking you to the changing /start/1, it instead links you to /start/569, for example.)
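A minimal sketch of the "number the links in reverse order" option follows. It assumes items are kept in arrival order (oldest first) and that a page number is counted from the oldest end; `PAGE_SIZE` and the helper names are made up for illustration, and this is not necessarily how my own site computes its links.

```python
PAGE_SIZE = 10


def items_on_page(items_oldest_first, page):
    """Return the items for a page, newest first for display.

    Page 0 always holds the ten oldest items, page 1 the next ten, and so
    on, so new arrivals never change what an existing page contains.
    """
    start = page * PAGE_SIZE
    chunk = items_oldest_first[start:start + PAGE_SIZE]
    return list(reversed(chunk))


def latest_page(total_items):
    """Page number of the newest items; only this page changes over time."""
    return max(0, (total_items - 1) // PAGE_SIZE)


def page_url(page):
    """Bookmarkable link for a page."""
    return "/start/%d" % page
```

The front page shows `latest_page(...)` and "previous" links count downwards from there, so a bookmark to a full page keeps pointing at the same ten items however much the list grows.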

 

Never Say Goodbye

If you try using some of my software, or any software come to think of it, and it doesn't work, or causes you problems then there is a simple solution.

Tell me. I might not be able to fix it immediately, I might not ever be able to fix it. But chances are I can, and if there is a record it'll help others out in the future regardless.

I've bumped into this in the past: "Oh yes I tried to use that tool you wrote but it didn't work, so I ended up with something else."

 

There's a hell of a lot more to me

This weekend has been an interesting mix of activities. Mostly I've been tweaking my mail filtering service; now that it has more users it is more interesting to do that.

The basic process of mail-scanning is pretty simple, but there are some fun things in the mix which make it slightly more fiddly than I'd like.

The basic recipe goes something like this:

  • Accept mail.
  • Validate the mail is addressed to a domain hosted upon the machine.
  • Do the spam filtering / magic (many steps missing here)
    • If the mail should be rejected archive it to a local Maildir folder and bounce it.
    • If the mail should be accepted then forward it to the destination machine.
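The recipe above could be sketched along these lines. This is a toy, not the service's real code: `process`, its arguments and the returned action strings are all hypothetical, and the spam test is passed in as a function precisely because the real filtering has many more steps. The only library piece is Python's standard `mailbox.Maildir`.

```python
import mailbox


def process(recipient, raw_message, hosted_domains, looks_like_spam, archive_path):
    """Decide what to do with one incoming message.

    Returns "refuse" (not a hosted domain), "reject" (spam: archived, and
    the caller should bounce it) or "forward" (deliver to the destination).
    """
    domain = recipient.rsplit("@", 1)[-1].lower()
    if domain not in hosted_domains:
        return "refuse"

    if looks_like_spam(raw_message):
        # Keep a copy in a local Maildir so a mistake can be undone later.
        archive = mailbox.Maildir(archive_path, create=True)
        archive.add(raw_message)
        archive.close()
        return "reject"

    return "forward"
```

Archiving before bouncing is the important ordering: the copy exists even if the bounce fails.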

The archiving of all rejected messages is a big win. It means that if there is a mistake in the handling of any mail we could undo it, retraining the spam database etc. It also provides, via a web page/rss feed, a way for a user to see what a good job the filtering system is doing - by saying "Here's what you would have had ..".

Today I switched the way that the archived mail is displayed via the Web GUI. Previously I used some nasty Maildir parsing code, but now I'm running IMAP upon localhost - so the viewing of messages is a lot more straightforward. (via Net::IMAP::Simple.)

More interestingly, to most readers I'm sure, today I managed to take a new Kite out for flying. A cold and windy day, but lots of fun. There was beer, pies, and near-death!

This was also the second weekend I carried out some painting of my front room. At this rate I'll have painted all four walls of the room in less than two months! (The last time I painted a room it took approximately six months to complete. Move furniture & paint one wall. Wait several weeks, then repeat until all walls are complete!)

 

You can't hide the knives

After recently intending to drop the Planet Debian search, and receiving complaints that it was/is still useful, it looks like there is a good solution.

The code will be made live and official upon Planet Debian in the near future.

The DSA team promptly installed the SQLite3 package for me, and I've ported the code to work with it. Once Apache is updated to allow me to execute CGI scripts it'll be moved over, and I'll export the current data to the new database.

In other news I'm going to file an ITP bug against asql as I find myself using it more and more...

 

don't go breaking my heart

If you're interested in working upon your CV/Resume, as Otavio Salvador was recently, then I'd highly recommend the xml-resume-library.

It allows you to write your address, previous jobs, and skills as XML then generate PDF, HTML, and plain text format documents via a simple Makefile.

It won't help with clueless agencies that mandate the use of Microsoft Word Documents for submission, so they can butcher your submission and "earn" their fee(s), but otherwise it rocks.

 

Let the bells ring out for Christmas

In the next week I intend to drop the search engine which archives content posted to Planet Debian.

It appears to have very little use, except for myself, and I'm significantly better at bookmarking posts of interest these days.

If you'd like to run your own copy the code is available and pretty trivial to reimplement regardless. There are only two parts:

  • Poll and archive content from the planet RSS feed - taking care of duplicates.
  • Scan for /robots.txt upon the source-host, to avoid archiving content which should be "private".

Once you've done that you'll have a database populated with blog entries, and you just need to write a little search script.
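If you fancy reimplementing it, the two parts might look something like this. It is a sketch under assumptions: it expects plain RSS 2.0 `<item>` elements, takes the feed and robots.txt text as strings rather than fetching them, and the table layout, the `planet-archiver` agent name and the function names are invented for the example rather than taken from my code.

```python
import sqlite3
import urllib.robotparser
import xml.etree.ElementTree as ET


def open_archive(path=":memory:"):
    """Open the archive; the link is the primary key, so duplicates are skipped."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS posts "
        "(link TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    return db


def allowed(robots_txt, url, agent="planet-archiver"):
    """Check a URL against the text of the source host's robots.txt."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def archive_feed(db, rss_text, robots_for):
    """Archive new, permitted items; return how many rows were added.

    robots_for(link) must return the robots.txt text for that link's host.
    """
    count = "SELECT COUNT(*) FROM posts"
    before = db.execute(count).fetchone()[0]
    for item in ET.fromstring(rss_text).iter("item"):
        link = item.findtext("link", default="").strip()
        if not link or not allowed(robots_for(link), link):
            continue
        db.execute(
            "INSERT OR IGNORE INTO posts VALUES (?, ?, ?)",
            (link, item.findtext("title", default=""),
             item.findtext("description", default="")),
        )
    db.commit()
    return db.execute(count).fetchone()[0] - before
```

Run `archive_feed` from cron against the freshly fetched feed and the duplicates take care of themselves.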

ObRandom: In the time it has been running it has archived 15,464 posts!

 

We are the champions my friend

My tool to query apache logfiles via SQL seems surprisingly popular.

Just as a recap the process goes like this:

  • Start the shell.
  • A temporary SQLite database is created.
  • You load any number of apache logfiles into it.
  • Then queries may be executed against those records until you exit.
  • The temporary database is dropped.

Now it is possible to save and load the SQLite database, so that you don't need to reparse the apache logs each time. That gives a nice speed increase for non-changing files.

By tonight I'll have aliases working for queries so you can bookmark them:

alias refers SELECT distinct(referer) FROM logs

Then in the future the 'refers' command will be available and will run the named query. Neat.
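The general idea, for anyone who hasn't tried the tool, can be imitated in a few lines. This is not the tool's actual source, just a sketch: the regular expression assumes Apache's common/combined log format, and apart from the `logs` table and its `referer` column (which appear in the alias above) the column names are my guesses.

```python
import re
import sqlite3

# One line of Apache common/combined log format.
LINE = re.compile(
    r'(?P<source>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<request>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)
FIELDS = ("source", "date", "method", "request",
          "status", "size", "referer", "agent")


def load(db, lines):
    """Parse log lines into the logs table; unparsable lines are skipped."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS logs (source, date, method, request, "
        "status INTEGER, size, referer, agent)"
    )
    rows = [m.group(*FIELDS) for m in map(LINE.match, lines) if m]
    db.executemany("INSERT INTO logs VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return len(rows)


def run_alias(db, aliases, name):
    """Run a saved query by name and return all of its rows."""
    return db.execute(aliases[name]).fetchall()
```

With `db = sqlite3.connect(":memory:")` the database vanishes when you exit, which matches the temporary behaviour described above; pass a filename instead and the parsed records stick around for next time.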

Now that I'm comfortable with SQL queries it just seems so natural, easy, and right to query logfiles this way. I guess that makes me strange.

 

You're not going to end up like your mum and dad

I've been working on updating my online film list since Thursday evening.

I have some code which will convert static data-files containing film entries into a browsable HTML site.

The next job is to actually go through all our DVDs and make sure the lists are correct.

I've updated all our TV shows, and I've made an initial pass at making sure all our films are present but it'll take me a few more days to ensure the lists are completely correct.

In the past I used to browse my list of films via my mobile phone to make sure I didn't buy duplicate films (more than once in the past I had managed to do that!) These days I don't seem to need to, but it is nice for organizing and it appeals to my love of lists..

I'm not sure which is worse, me doing it or Megan taking one look and saying "That's so cool!".

 

Since you've been gone

Confessor - Terry Goodkind's last novel in the Sword of Truth series.

Brilliant.

Exceptionally Brilliant.

Well worth waiting for, and worth the annoyance of 'Chainfire' itself, which seemed to go nowhere despite its length.

 

walking on the moon

According to popcon I have just under 1000 users of xen-tools.

That was quite a surprise to discover via a random google search, although I guess there have been a lot of bugs filed against the package during its lifetime.

Funny how some things which start as random hacks (this was originally a quick and dirty hack for a Xen introduction article) become quite useful/popular, whereas other tools which were planned and designed go virtually unnoticed...