MySQL Unicode Escape Sequences

October 27, 2009

While debugging some character encoding issues today, I decided it would help to directly populate a database table with some test strings containing non-latin characters. I hoped the mysql client would, by chance, support C-, Java- or Ruby-style Unicode escape sequences. Alas, it does not.

A couple of searches later, all I’d found was my own question posted elsewhere without a satisfactory answer and a few skeptics claiming that mysql doesn’t support Unicode escape sequences in string literals.

I didn’t buy it. What do we do as a last resort? Repeat with me: RTFS. In this case, the spec isn’t too clear. Still, from the Strings and Hexadecimal Values sections, I pieced together the answer. Maybe this will save someone some time.

These examples show how to represent single and double smart quotes using UCS-2 (how most Unicode tables present character code points).
mysql> select _ucs2 x'2018';
+---+
| ‘ |
+---+
mysql> select _ucs2 x'20182019201C201D';
+------+
| ‘’“” |
+------+
mysql> select concat(_ucs2 x'2018', 'Hi.', _ucs2 x'2019');
+-------+
| ‘Hi.’ |
+-------+

You can also use the variable-length UTF-8 representation (convenient when, for example, copying from a utf-8 URL-encoded value like %E2%80%98).

mysql> select _utf8 x'E28098';
+---+
| ‘ |
+---+
Posted in sql