Character Sets and Collations
This chapter explains how phpMyAdmin stores and fetches data, and how it deals
with the character set and collation features available under MySQL. The program's
behavior is highly dependent on the MySQL version used.
A character set describes how symbols for a specific language or dialect are encoded.
A collation contains rules to compare and sort the characters of a character set. (See
the MySQL 4.1.x and Later section in this chapter.)
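The difference between the two notions can be sketched outside MySQL. The following Python fragment is only an illustration, not phpMyAdmin or MySQL code: the sample words are made up, and the two sort orders merely stand in for a binary collation and a case-insensitive one.

```python
# Character set: the same symbol is encoded as different bytes
# depending on the character set chosen.
symbol = "é"
latin1_bytes = symbol.encode("latin-1")  # one byte: 0xE9
utf8_bytes = symbol.encode("utf-8")      # two bytes: 0xC3 0xA9

# Collation: rules for comparing and sorting characters.
# These sample words are hypothetical.
words = ["banana", "apple", "Cherry"]

# Code point order (roughly what a binary collation does):
# uppercase letters sort before lowercase ones.
binary_order = sorted(words)

# Case-insensitive order (roughly what a "_ci" collation does).
case_insensitive_order = sorted(words, key=str.casefold)

print(latin1_bytes, utf8_bytes)
print(binary_order)
print(case_insensitive_order)
```

The data is identical in both sorts; only the comparison rules change, which is what choosing a collation does.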
The character set used to store our data may be different from the one used to display
it, leading to data discrepancies. Thus, a need to transform the data arises.
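A minimal sketch of such a discrepancy, assuming text stored as UTF-8 but displayed as if it were Latin-1 (the sample word is hypothetical, and this is not how phpMyAdmin itself performs the conversion):

```python
# Text as stored: encoded with the UTF-8 character set.
stored = "café".encode("utf-8")

# Displayed with the wrong character set: every byte is read as one
# Latin-1 character, so the two-byte "é" turns into two wrong symbols.
wrong_display = stored.decode("latin-1")

# Displayed with the matching character set: the text survives intact.
right_display = stored.decode("utf-8")

print(wrong_display)   # garbled
print(right_display)   # café
```

Decoding with the character set that was used for storage, or converting between the two explicitly, is the kind of transformation this situation calls for.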
Language Files and UTF-8
"Unicode is an industry standard designed to allow text and symbols […]
to be consistently represented and manipulated by computers". See
http://en.wikipedia.org/wiki/Unicode and also http://www.unicode.org.
Unicode currently supports more than 600 languages, which is its main advantage
over the ISO and Windows character sets. This is especially important
for a multi-language product like phpMyAdmin.
To represent or encode these Unicode characters, several Unicode Transformation
Formats (UTF) exist. A popular transformation format is UTF-8, which uses one to
four octets (8-bit bytes) per character. For more details, visit
http://en.wikipedia.org/wiki/UTF-8.
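The variable length of UTF-8 can be checked with a short Python sketch. The four sample characters are chosen here only for illustration, one for each possible length:

```python
# One sample character for each UTF-8 length (illustrative choices).
samples = ["A", "é", "€", "😀"]

# Map each character to the number of octets UTF-8 uses for it.
octet_counts = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, count in octet_counts.items():
    print(f"U+{ord(ch):04X} takes {count} octet(s) in UTF-8")
```

Plain ASCII characters stay at one octet, which keeps UTF-8 text compatible with ASCII, while characters further along in the Unicode range need more.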
Note that the browser must support UTF-8 (as most current browsers do). The
phpMyAdmin distribution kit includes a UTF-8 version of every language file in the
lang subdirectory, and some of them are only available in UTF-8 encoding.