Database Reference
In-Depth Information
To determine whether a given string contains multibyte characters, use the LENGTH()
and CHAR_LENGTH() functions, which return the length of a string in bytes and charac‐
ters, respectively. If LENGTH() is greater than CHAR_LENGTH() for a given string, multibyte
characters are present:
• The utf8 Unicode character set has multibyte characters, but a given utf8 string
might contain only single-byte characters, as in the following example:
mysql> SET @s = CONVERT('abc' USING utf8);
mysql> SELECT LENGTH(@s), CHAR_LENGTH(@s);
+------------+-----------------+
| LENGTH(@s) | CHAR_LENGTH(@s) |
+------------+-----------------+
| 3 | 3 |
+------------+-----------------+
• For the ucs2 Unicode character set, all characters are encoded using two bytes, even
if they are single-byte characters in another character set such as latin1 . Thus,
every ucs2 string contains multibyte characters:
mysql> SET @s = CONVERT('abc' USING ucs2);
mysql> SELECT LENGTH(@s), CHAR_LENGTH(@s);
+------------+-----------------+
| LENGTH(@s) | CHAR_LENGTH(@s) |
+------------+-----------------+
| 6 | 3 |
+------------+-----------------+
Another property of nonbinary strings is collation, which determines the sort order of
characters in the character set. Use SHOW COLLATION to see all available collations; add a
LIKE clause to see the collations for a particular character set:
mysql> SHOW COLLATION LIKE 'latin1%';
+-------------------+---------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+-------------------+---------+----+---------+----------+---------+
| latin1_german1_ci | latin1 | 5 | | Yes | 1 |
| latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 |
| latin1_danish_ci | latin1 | 15 | | Yes | 1 |
| latin1_german2_ci | latin1 | 31 | | Yes | 2 |
| latin1_bin | latin1 | 47 | | Yes | 1 |
| latin1_general_ci | latin1 | 48 | | Yes | 1 |
| latin1_general_cs | latin1 | 49 | | Yes | 1 |
| latin1_spanish_ci | latin1 | 94 | | Yes | 1 |
+-------------------+---------+----+---------+----------+---------+
In contexts where no collation is specified explicitly, strings in a given character set use
the collation with Yes in the Default column. As shown, the default collation for lat
in1 is latin1_swedish_ci . (Default collations are also displayed by SHOW CHARACTER
SET .)
Search WWH ::




Custom Search