Character and Binary String Types
The character datatypes in Oracle are CHAR, VARCHAR2, and their “N” variants, NCHAR and NVARCHAR2. The CHAR and NCHAR types can store up to 2,000 bytes of text. The VARCHAR2 and NVARCHAR2 types can store up to 4,000 bytes of information.
■ Note Starting with Oracle 12c, the VARCHAR2, NVARCHAR2, and RAW datatypes can be configured to store up to 32,767 bytes of information. Extended datatypes are not enabled by default; therefore, unless explicitly configured, the maximum size is still 4,000 bytes for the VARCHAR2 and NVARCHAR2 datatypes and 2,000 bytes for RAW. See the “Extended Datatypes” section later in this chapter for more details.
This text is converted between various character sets as needed by the database. A character set is a binary representation of individual characters in bits and bytes. Many different character sets are available, and each is capable of representing different characters, for example:

•	The US7ASCII character set is the ASCII standard representation of 128 characters. It uses the low 7 bits of a byte to represent these 128 characters.

•	The WE8MSWIN1252 character set is a Western European character set capable of representing the 128 ASCII characters as well as 128 extended characters, using all 8 bits of a byte.

Before we get into the details of CHAR, VARCHAR2, and their “N” variants, it would benefit us to get a cursory understanding of what these different character sets mean to us.
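The difference between these two character sets can be sketched in Python, using its cp1252 codec to stand in for WE8MSWIN1252 and ascii for US7ASCII (the Oracle character set names are not Python codec names):

```python
# US7ASCII covers only the 128 ASCII characters (the low 7 bits of a byte).
print("A".encode("ascii"))  # b'A' -- code point 65 fits in 7 bits

# WE8MSWIN1252 adds 128 extended characters, using all 8 bits of a byte.
print("€".encode("cp1252"))  # b'\x80' -- a single byte above 127

# The same character simply has no representation in 7-bit ASCII.
try:
    "€".encode("ascii")
except UnicodeEncodeError as e:
    print("not representable in ASCII:", e.reason)
```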
NLS Overview
As stated earlier, NLS stands for National Language Support. NLS is a very powerful feature of the database, but one that is often not as well understood as it should be. NLS controls many aspects of our data. For example, it controls how data is sorted, and whether we see commas and a single period in a number (e.g., 1,000,000.01) or many periods and a single comma (e.g., 1.000.000,01). But most important, it controls the following:

•	Encoding of the textual data as stored persistently on disk

•	Transparent conversion of data from character set to character set

It is this transparent part that confuses people the most: it is so transparent, you cannot even really see it happening. Let's look at a small example.
Suppose you are storing 8-bit data in a WE8MSWIN1252 character set in your database, but you have some clients that connect using a 7-bit character set such as US7ASCII. These clients are not expecting 8-bit data and need to have the data from the database converted into something they can use. While this sounds wonderful, if you are not aware of it taking place, then you might well find that your data loses characters over time as the characters that are not available in US7ASCII are translated into some character that is. This is due to the character set translation taking place. In short, if you retrieve data from the database in character set 1, convert it to character set 2, and then insert it back (reversing the process), there is a very good chance that you have materially modified the data. Character set conversion is typically a process that will change the data, and you are usually mapping a large set of characters (in this example, the set of 8-bit characters) into a smaller set (that of the 7-bit characters). This is a lossy conversion: the characters get modified because it is quite simply not possible to represent every character. But this conversion must take place. If the database is storing data in a single-byte character set but the client (say, a Java application, since the Java language uses Unicode) expects it in a multibyte representation, then it must be converted simply so the client application can work with it.
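The round trip just described can be sketched in Python, again with cp1252 and ascii standing in for the Oracle character sets; the "replace" error handler mimics a conversion that substitutes a placeholder for unrepresentable characters:

```python
# Start with 8-bit data: 'ü' exists in cp1252 but not in 7-bit ASCII.
original = "Müller"
stored = original.encode("cp1252")  # what the database holds

# A 7-bit client receives the data converted to ASCII; 'ü' has no
# ASCII representation, so it is mapped to a replacement character.
client_view = stored.decode("cp1252").encode("ascii", errors="replace")
print(client_view)  # b'M?ller'

# If the client writes that value back, the original data is gone.
round_tripped = client_view.decode("ascii")
print(round_tripped == original)  # False -- a lossy conversion
```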