A mystery of open questions

All being well, the Debian Administration website now fully supports UTF-8.

This change was a long time coming, considering the amount of time the site has been live.

Most of the changes have been present for a while:

  • Correctly setting the database to store UTF-8 internally, rather than latin1.
  • Correctly setting the charset of the generated pages.

The only missing part was ensuring that the text input by visitors/users was correctly decoded and treated as UTF-8. This was handled by updating the Perl CGI module usage to explicitly call charset appropriately.
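
To illustrate why the decoding step matters, here is a minimal sketch using only the core Encode module. The sample string and variable names are made up for the example and are not taken from the site's code: submitted form data arrives as raw bytes, and until it is decoded Perl counts bytes rather than characters.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the raw bytes a browser submits for "café" in UTF-8.
my $raw = "caf\xC3\xA9";

# Undecoded, Perl sees five separate bytes.
my $undecoded_length = length $raw;

# Decoded, the same data is four characters.
my $text           = decode( 'UTF-8', $raw );
my $decoded_length = length $text;

# Encode again before writing to a byte-oriented destination.
my $round_trip = encode( 'UTF-8', $text );
```

Anything that measures, truncates, or matches on the input (length limits, regular expressions) behaves differently on the decoded form, which is why the decoding has to happen consistently at the point of input.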

Since the code behind the site masks the database, memcached, and CGI handles behind singletons, the change itself was pretty trivial.
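
As a rough sketch of what a change like this can look like (this is not the site's actual code; the package name `Singleton::CGI` and the `instance` method are hypothetical), a CGI singleton can set the charset once, so every caller gets a handle that is already configured for UTF-8. The `-utf8` import pragma and the `charset` method are both part of CGI.pm.

```perl
package Singleton::CGI;    # hypothetical package name

use strict;
use warnings;

# The -utf8 pragma asks CGI.pm to decode incoming parameters as UTF-8.
use CGI qw(-utf8);

my $instance;

# Return the shared CGI handle, creating and configuring it on first use.
sub instance {
    my ($class) = @_;

    if ( !defined $instance ) {
        $instance = CGI->new();

        # Declare UTF-8 as the charset for generated headers.
        $instance->charset('utf-8');
    }

    return $instance;
}

package main;

# Every caller receives the same, already-configured handle.
my $first  = Singleton::CGI->instance();
my $second = Singleton::CGI->instance();
```

Because the handle is only ever built in one place, the encoding setup does not have to be repeated at each call site.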

I made more changes this evening to tie it all together, and to ensure that my database connection is always forced to use UTF-8, but I think that wasn't so important.

I hope this is vaguely useful the next time I have to fight with character sets & encodings. It is all just so nasty. Failing that, these pages are vaguely useful:

ObFilm: Run Lola Run

Comments On This Entry

  1. rjc
    All good, apart from Hebrew, Arabic and possibly other RTL languages being left-aligned, with the dot ending the sentence on the wrong side.