Character Sets and Collations - Mastering phpMyAdmin 2.11 for Effective MySQL Management

Character Sets and Collations

This chapter explains how phpMyAdmin stores and fetches data, and how it deals with the character set and collation features available under MySQL. The program's behavior is highly dependent on the MySQL version used.

A character set describes how symbols for a specific language or dialect are encoded. A collation contains rules to compare and sort the characters of a character set. (See the MySQL 4.1.x and Later section in this chapter.)
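
The idea that a collation is a set of comparison and sorting rules can be sketched in Python. This is only an analogy, not how MySQL implements collations: the variable names are hypothetical, code point ordering stands in for a binary-style collation, and case folding stands in for a case-insensitive one.

```python
# Illustrative analogy only: Python comparisons standing in for two
# collation styles. None of these names come from MySQL or phpMyAdmin.
words = ["banana", "Apple", "cherry", "apple", "Banana"]

# Binary-style rule: compare by code point, so every uppercase ASCII
# letter sorts before every lowercase one.
binary_order = sorted(words)

# Case-insensitive-style rule: compare case-folded forms. sorted() is
# stable, so words that compare equal keep their original relative order.
ci_order = sorted(words, key=str.casefold)

print(binary_order)  # ['Apple', 'Banana', 'apple', 'banana', 'cherry']
print(ci_order)      # ['Apple', 'apple', 'banana', 'Banana', 'cherry']
```

The same five strings come out in two different orders because only the comparison rule changed, which is the role a collation plays for a character set.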

The character set used to store our data may be different from the one used to display it, which leads to discrepancies in the displayed data. When that happens, the data must be transformed from one character set to the other.
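
A minimal Python sketch of such a discrepancy, assuming text stored as UTF-8 but interpreted as Latin-1 when displayed. The sample word and variable names are hypothetical, and the choice of these two character sets is an assumption made for illustration.

```python
# Assumption for illustration: data is stored as UTF-8 but the display
# side believes it is Latin-1 (ISO-8859-1).
text = "café"

stored = text.encode("utf-8")        # bytes as stored: b'caf\xc3\xa9'

# Wrong interpretation: each UTF-8 octet is read as one Latin-1 character.
garbled = stored.decode("latin-1")   # 'cafÃ©'

# Transformation back: re-encode with the wrongly assumed character set,
# then decode with the one actually used for storage.
repaired = garbled.encode("latin-1").decode("utf-8")

print(garbled)   # cafÃ©
print(repaired)  # café
```

The stored bytes never change here; only the character set used to interpret them does, which is why a transformation step is needed between storage and display.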

Language Files and UTF-8

"Unicode is an industry standard designed to allow text and symbols […] to be consistently represented and manipulated by computers". See http://en.wikipedia.org/wiki/Unicode and also http://www.unicode.org.

Unicode currently supports more than 600 languages, which is its main advantage over the other character sets defined by ISO or Windows. This is especially important with a multi-language product like phpMyAdmin.

To represent or encode these Unicode characters, many Unicode Transformation Formats (UTF) exist. A popular transformation format is UTF-8, which uses one to four octets (8-bit bytes) per character. For more details, visit http://en.wikipedia.org/wiki/UTF-8.
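
The variable length of UTF-8 can be checked with a short Python sketch. The helper name `utf8_octet_count` and the sample characters are hypothetical choices for illustration; `str.encode` is the standard Python call.

```python
def utf8_octet_count(ch: str) -> int:
    """Return how many octets UTF-8 needs to encode a single character."""
    return len(ch.encode("utf-8"))

# One sample character for each possible encoded length.
samples = ["A", "é", "€", "😀"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # Show the code point, the octet count, and the octets in hexadecimal.
    print(f"U+{ord(ch):04X}: {utf8_octet_count(ch)} octet(s), bytes {encoded.hex(' ')}")
```

ASCII characters such as "A" take a single octet, accented Latin letters two, most other characters of the Basic Multilingual Plane (such as "€") three, and characters beyond it (such as the emoji) four.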

Note that the browser must support UTF-8 (as most current browsers do). The phpMyAdmin distribution kit includes a UTF-8 version of every language file in the lang subdirectory, and some of them are available only in UTF-8 encoding.
