I just tried to write a blog post containing some Unicode characters (I was blogging about 日本語 learning), and found that WordPress helpfully converted those characters to question marks. After digging around, I ran into this web page describing the problem (see also this thread): basically, if you created your database in a pre-2.1.3 WP, then your database has remained in Latin-1 all these years. Oops.
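
For what it’s worth, here is a rough illustration in Python of why the characters come out as question marks. This is only an analogy, not what WordPress or MySQL literally run: the function name `store_as_latin1` is made up for this sketch, and it assumes the storage layer substitutes a placeholder for anything Latin-1 can’t represent.

```python
def store_as_latin1(text: str) -> str:
    """Simulate squeezing text into a Latin-1 column and reading it back.

    Hypothetical helper for illustration only; characters that Latin-1
    cannot represent are replaced with '?' by Python's "replace" handler.
    """
    stored = text.encode("latin-1", errors="replace")
    return stored.decode("latin-1")


if __name__ == "__main__":
    print(store_as_latin1("日本語"))      # each kanji becomes '?'
    print(store_as_latin1("caf\u00e9"))  # e-acute exists in Latin-1, so it survives
```

Western European accented letters fit in Latin-1, which is presumably why a problem like this can sit unnoticed for years.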

I tried a few workarounds, including blowing away the database and restoring from the WP export format (with a detour through editing a php.ini file to allow uploads larger than 2MB). I still kind of think that might be the right approach, since I’m now worried about what further problems might be lurking, but the restore seemed to be taking too long, so I gave up and killed it. Ultimately, what I did was take my mysqldump backup, replace the occurrences of CHARSET=latin1 with CHARSET=utf8, and reimport it. This probably doesn’t work in general (see this post for some subtleties), but I’m hoping it worked for me. In particular, I doubt there are too many places where I’d used non-ASCII Latin-1 characters.
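
The search-and-replace step could be sketched in Python roughly as follows. This is a minimal sketch, not the exact command I ran: the names `retarget_charset` and `convert_dump` and the file names in the usage note are made up for illustration. It only rewrites the declared character set in the table definitions; it does not transcode any of the stored bytes, which is exactly why the approach may not work in general.

```python
import re

# Match the charset declaration that mysqldump writes in CREATE TABLE
# statements, e.g. "DEFAULT CHARSET=latin1;". Working on bytes avoids
# having to guess the encoding of the dump file itself.
CHARSET_PATTERN = re.compile(rb"CHARSET=latin1\b")


def retarget_charset(dump: bytes) -> bytes:
    """Return the dump with CHARSET=latin1 rewritten to CHARSET=utf8."""
    return CHARSET_PATTERN.sub(b"CHARSET=utf8", dump)


def convert_dump(src_path: str, dst_path: str) -> None:
    """Rewrite a dump file line by line (hypothetical helper)."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            dst.write(retarget_charset(line))
```

Usage would be something like `convert_dump("backup.sql", "backup-utf8.sql")`, with both file names being placeholders, followed by reimporting the new file. Keep the original dump around in case the result looks wrong.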

I think things are working fine now, but please let me know if you notice anything weird…
