Database Reference
In-Depth Information
To determine whether a given string contains multibyte characters, use the
LENGTH()
and
CHAR_LENGTH()
functions, which return the length of a string in bytes and charac‐
ters, respectively. If
LENGTH()
is greater than
CHAR_LENGTH()
for a given string, multibyte
characters are present:
• The
utf8
Unicode character set has multibyte characters, but a given
utf8
string
might contain only single-byte characters, as in the following example:
mysql>
SET @s = CONVERT('abc' USING utf8);
mysql>
SELECT LENGTH(@s), CHAR_LENGTH(@s);
+------------+-----------------+
| LENGTH(@s) | CHAR_LENGTH(@s) |
+------------+-----------------+
| 3 | 3 |
+------------+-----------------+
• For the
ucs2
Unicode character set, all characters are encoded using two bytes, even
if they are single-byte characters in another character set such as
latin1
. Thus,
every
ucs2
string contains multibyte characters:
mysql>
SET @s = CONVERT('abc' USING ucs2);
mysql>
SELECT LENGTH(@s), CHAR_LENGTH(@s);
+------------+-----------------+
| LENGTH(@s) | CHAR_LENGTH(@s) |
+------------+-----------------+
| 6 | 3 |
+------------+-----------------+
Another property of nonbinary strings is collation, which determines the sort order of
characters in the character set. Use
SHOW
COLLATION
to see all available collations; add a
LIKE
clause to see the collations for a particular character set:
mysql>
SHOW COLLATION LIKE 'latin1%';
+-------------------+---------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+-------------------+---------+----+---------+----------+---------+
| latin1_german1_ci | latin1 | 5 | | Yes | 1 |
| latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 |
| latin1_danish_ci | latin1 | 15 | | Yes | 1 |
| latin1_german2_ci | latin1 | 31 | | Yes | 2 |
| latin1_bin | latin1 | 47 | | Yes | 1 |
| latin1_general_ci | latin1 | 48 | | Yes | 1 |
| latin1_general_cs | latin1 | 49 | | Yes | 1 |
| latin1_spanish_ci | latin1 | 94 | | Yes | 1 |
+-------------------+---------+----+---------+----------+---------+
In contexts where no collation is specified explicitly, strings in a given character set use
the collation with
Yes
in the
Default
column. As shown, the default collation for
lat
in1
is
latin1_swedish_ci
. (Default collations are also displayed by
SHOW
CHARACTER
SET
.)