MySQL 4: Specified key was too long; max key length is 1000 bytes

I've been upgrading the intranet server at work today, previously it was running Red Hat 9.0 which even the Fedora Legacy Project will be dropping support for at the end of the year. So on Friday I installed Centos 4.4 on a nice new hard drive and I spent today copying across all the files from the old drive.

Mostly everything went without issues, but one of the things we have running on our intranet server is a WackoWiki for developer documentation. I'd done a backup of the pre-upgrade database with phpMyAdmin, which had been running on MySQL 3.23.

In my previous MySQL upgrade experiences I'd encountered some issues with character encoding - basically that the old database didn't have any, and newer versions of MySQL do. Up until now I've always just updated the relevant table definitions by hand, adding in unicode to the SQL and not really thought much about the wider implications. I was delighted to discover that phpMyAdmin 2.9.1.1 has an option on the import page for 'SQL compatibility mode', from which you can select 'MYSQL323', with the result that your old export files will now import without modification. Unfortunately, that's where I ran across the problem in the post title:

Specified key was too long; max key length is 1000 bytes

Some of the indexes in the WackoWiki database schema use a combination of three 250 character varchar columns. Because a unicode character can be up to three bytes long, MySQL tries to reserve the maximum space that could be required for the index - 3 x 250 x 3 bytes - which exceed the maximum allowed. There is an open defect about the issue on the MySQL website, with many comments from various disgruntled users.

I considered changing the keys so they'd fit, but it looked like a dangerous thing to do as far as the operation of WackoWiki was concerned, and I was worried I'd end up knee deep in PHP trying to fix all the problems. One of the comments on the bug report did guide me to the solution - Scott Lane posted that he set the index columns' collation to be latin1 so that it would work as it did on MySQL 3.23. Since the Wiki had been operating quite happily for more than three years on a database with no unicode support I figured I probably didn't have anything to lose by setting the whole database to use latin1_general_ci collation. I recreated the database and tried the import again and everything imported without errors :)

Which brings me to a D'Oh! moment. After some fannying around because the privileges got a bit mixed up with me dropping and recreating databases, I finally got to view our Wiki home page, and it looked very odd. None of the links were links - they all looked something like '??MyPages ==??'. I (correctly) surmised that I'd messed up somewhere with character encoding. At least it didn't take me too long to figure out my own stupid mistake: On the import page in phpMyAdmin, just below the file selection field, there's a 'Character set of the file:' select box which I'd been ignoring. By default this is set to UTF8, except, of course, the SQL file I'd exported from my non-unicode aware MySQL 3.23 database wasn't UTF8. This doesn't matter much for the common characters, because UTF8 and ASCII are compatible for the first 127 character points, but apparently it does matter very much for some of the characters used in WackoWiki markup. I re-imported with my import character set as latin1 and everything worked fine.