Character and Binary String Types
The character datatypes in Oracle are CHAR, VARCHAR2, and their "N" variants. The CHAR and NCHAR can store up to 2,000 bytes of text. The VARCHAR2 and NVARCHAR2 can store up to 4,000 bytes of information.
Note
Starting with Oracle 12c, VARCHAR2, NVARCHAR2, and RAW datatypes can be configured to store up to 32,767 bytes of information. Extended datatypes are not enabled by default; therefore, unless explicitly configured, the maximum size is still 4,000 bytes for VARCHAR2 and NVARCHAR2 datatypes and 2,000 bytes for RAW. See the "Extended Datatypes" section later in this chapter for more details.
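As a quick sketch of those default limits, consider the following hypothetical table (it assumes extended datatypes are not enabled and that the national character set is the default AL16UTF16, where each character takes 2 bytes):

create table size_demo
( c   char(2000),        -- CHAR and NCHAR max out at 2,000 bytes
  nc  nchar(1000),       -- NCHAR length is in characters; 1,000 AL16UTF16
                         -- characters consume the full 2,000 bytes
  vc  varchar2(4000),    -- VARCHAR2 and NVARCHAR2 max out at 4,000 bytes
  nv  nvarchar2(2000)    -- 2,000 AL16UTF16 characters = 4,000 bytes
);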
This text is converted between various character sets as needed by the database. A character set is a binary
representation of individual characters in bits and bytes. Many different character sets are available, and each is
capable of representing different characters, for example:
The US7ASCII character set is the ASCII standard representation of 128 characters. It uses the low 7 bits of a byte to represent these 128 characters.

The WE8MSWIN1252 character set is a Western European character set capable of representing the 128 ASCII characters as well as 128 extended characters, using all 8 bits of a byte.
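If you are curious which character sets your own database uses, they are recorded in the data dictionary; for example, the NLS_DATABASE_PARAMETERS view reports both:

select parameter, value
  from nls_database_parameters
 where parameter in ( 'NLS_CHARACTERSET',         -- the database character set
                      'NLS_NCHAR_CHARACTERSET' ); -- the national character set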
Before we get into the details of CHAR, VARCHAR2, and their "N" variants, it would benefit us to get a cursory understanding of what these different character sets mean to us.
NLS Overview
As stated earlier, NLS stands for National Language Support. NLS is a very powerful feature of the database, but one
that is often not as well understood as it should be. NLS controls many aspects of our data. For example, it controls
how data is sorted, and whether we see commas and a single period in a number (e.g., 1,000,000.01) or many periods
and a single comma (e.g., 1.000.000,01). But most important, it controls the following:
Encoding of the textual data as stored persistently on disk
Transparent conversion of data from character set to character set
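The number-format behavior mentioned a moment ago (1,000,000.01 versus 1.000.000,01) is easy to demonstrate for yourself: the session-level NLS_NUMERIC_CHARACTERS setting supplies the decimal and group separators used by the D and G format elements. A small sketch, with the output shown in comments:

alter session set nls_numeric_characters = '.,';
select to_char( 1000000.01, '9G999G999D99' ) from dual;   -- 1,000,000.01

alter session set nls_numeric_characters = ',.';
select to_char( 1000000.01, '9G999G999D99' ) from dual;   -- 1.000.000,01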
It is this transparent conversion that confuses people the most—it is so transparent, you cannot even really see it happening. Let's look at a small example.
Suppose you are storing 8-bit data in a WE8MSWIN1252 character set in your database, but you have some clients
that connect using a 7-bit character set such as US7ASCII . These clients are not expecting 8-bit data and need to
have the data from the database converted into something they can use. While this sounds wonderful, if you are not
aware of it taking place, then you might well find that your data loses characters over time as the characters that are
not available in US7ASCII are translated into some character that is. This is due to the character set translation taking
place. In short, if you retrieve data from the database in character set 1, convert it to character set 2, and then insert
it back (reversing the process), there is a very good chance that you have materially modified the data. Character set
conversion is typically a process that will change the data, and you are usually mapping a large set of characters
(in this example, the set of 8-bit characters) into a smaller set (that of the 7-bit characters). This is a lossy conversion; the characters get modified because it is quite simply not possible to represent every character. But this conversion
must take place. If the database is storing data in a single-byte character set but the client (say, a Java application,
since the Java language uses Unicode) expects it in a multibyte representation, then it must be converted simply so
the client application can work with it.
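You can observe this loss directly with the built-in CONVERT function, which translates a string between two named character sets; any character with no representation in the destination set is replaced with a substitute character. A sketch (the exact replacement you see depends on the conversion tables in use):

-- 'é' exists in WE8MSWIN1252 but has no US7ASCII equivalent, so it
-- is replaced during conversion; a round trip cannot restore it
select convert( 'résumé', 'US7ASCII', 'WE8MSWIN1252' ) from dual;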
 
 