ELEMENTS USED IN LOGICAL DATA MODELS - A Developer's Guide to Data Modeling for SQL Server

Alphanumeric

All data models contain alphanumeric data: any data in a string format,
whether it consists of alphabetic characters or numbers (as long as the
numbers do not participate in mathematical operations). For example, names,
addresses, and phone numbers are all string, or alphanumeric, types of data.
The actual data types used for alphanumeric information are char, nchar,
varchar, and nvarchar. As you can probably tell from the names, all these
char data types store character data, such as letters, numbers, and special
symbols.

For all these data types, you specify a length. Generally, the length is
the total number of characters that the specified attribute can contain. If
you are creating an attribute to contain abbreviations of U.S. state names,
for example, you might choose to specify that the attribute is a char(2).
This defines the attribute as an alphanumeric field that contains exactly
two characters; char data types always occupy exactly as many characters as
they are defined to hold, no more and no less, regardless of how short the
inserted value is.

You probably noticed that there are four kinds of char data types: two
with a prefix of var, and two with an n prefix (one of which contains both
prefixes). The var prefix means that a variable-length field is being
specified. A variable-length field holds no more than the number of
characters specified in the length designation and stores only the
characters actually supplied. To contrast char with varchar, specifying
char(10) results in a field that always contains ten characters, even if a
specific instance of an entity has only six characters in that attribute.
The remaining four positions are padded with trailing spaces. If the
attribute is defined as a varchar(10), only the six actual characters are
stored.
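
The contrast between char and varchar can be sketched outside the database.
The following Python snippet is only a model of the behavior described
above, not SQL Server code. The helper names store_as_char and
store_as_varchar are hypothetical, and rejecting over-length values with an
error is an assumption made for the illustration.

```python
def store_as_char(value: str, length: int) -> str:
    """Model a fixed-length char(length) field: always exactly `length` characters."""
    if len(value) > length:
        # Assumption for this sketch: an over-length value is rejected.
        raise ValueError(f"value does not fit in char({length})")
    # Shorter values are padded on the right with spaces.
    return value.ljust(length)


def store_as_varchar(value: str, length: int) -> str:
    """Model a variable-length varchar(length) field: at most `length` characters."""
    if len(value) > length:
        # Same assumption as above: over-length values are rejected.
        raise ValueError(f"value does not fit in varchar({length})")
    # Only the characters actually supplied are kept.
    return value


fixed = store_as_char("Denver", 10)
variable = store_as_varchar("Denver", 10)
print(repr(fixed), len(fixed))        # 'Denver    ' 10
print(repr(variable), len(variable))  # 'Denver' 6
```

In this model the six-character value occupies ten positions as a char(10)
and six as a varchar(10), which matches the padding behavior described above.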

The n prefix specifies that the data is being stored in a Unicode format.
Unicode is an international, platform-agnostic specification for the storage
of character data. Using Unicode allows systems that work with characters
from multiple languages to have a common storage format that can be read
by any other system using the Unicode specification. As a rule of thumb, if
you need to store anything beyond basic ASCII text, plan on a Unicode data
type; non-Unicode types are traditionally limited to the characters of a
single code page.

The primary difference between Unicode and non-Unicode systems is that the
Unicode data types (nchar and nvarchar) generally require two bytes of
physical storage for every character stored; non-Unicode systems generally
use only one byte (variable-length data types also add a couple of bytes of
overhead per value to record the length). The problem with using only one
byte for character storage is that one byte can represent only 256 distinct
values, far too few for character sets such as Japanese Kanji or Korean
Hangul. Obviously, there are storage and performance trade-offs involved
here, and they are covered in more depth in Chapter 3.
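
To make the storage trade-off concrete, the sketch below counts bytes for
the same text in a one-byte-per-character encoding and in a two-byte Unicode
encoding. It is Python, not SQL Server, and the choice of Latin-1 and UTF-16
(little-endian) as stand-ins for a non-Unicode code page and for the Unicode
data types is an assumption made for the illustration. The function names
are hypothetical.

```python
def single_byte_size(text: str) -> int:
    """Bytes needed in a one-byte-per-character code page (Latin-1 as a stand-in)."""
    return len(text.encode("latin-1"))


def unicode_size(text: str) -> int:
    """Bytes needed in a two-byte Unicode encoding (UTF-16 LE as a stand-in)."""
    return len(text.encode("utf-16-le"))


name = "Denver"
print(single_byte_size(name))  # 6
print(unicode_size(name))      # 12

# U+6771 U+4EAC is "Tokyo" written in Kanji.
kanji = "\u6771\u4eac"
print(unicode_size(kanji))     # 4
try:
    single_byte_size(kanji)
except UnicodeEncodeError:
    # The one-byte code page has no representation for these characters.
    print("cannot be stored in a one-byte encoding")
```

The same six-letter name doubles in size under the two-byte encoding, while
the Kanji text cannot be represented in the one-byte encoding at all.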
