Database Concepts - Data Warehouse Systems: Design and Implementation

Database Reference

In-Depth Information

answer the query are not in the buffer, they are read from the disk, probably

replacing existing ones in the buffer (if it is full, which is normally the case)

using well-known algorithms, for example, replacing the least recently used

pages with the new ones. In this way, the buffer acts as a cache that the

DBMS can access to avoid going to disk, enhancing query performance.

File organization is the physical arrangement of data in a file into

records and blocks on secondary storage. There are three main types of file

organization. In a heap (or unordered ) file organization, records are placed

in the file in the order in which they are inserted. This makes insertion very

ecient. However, retrieval is relatively slow, since the various pages of the

file must be read in sequence until the required record is found. Sequential

(or ordered ) files have their records sorted on the values of one or more

fields, called ordering fields . Ordered files allow fast retrieving of records,

provided that the search condition is based on the sorting attribute. However,

inserting and deleting records in a sequential file are problematic, since the

order must be maintained. Finally, hash files use a hash function that

calculates the address of the block (or bucket ) in which a record is to be

stored, based on the value of one or more attributes. Within a bucket, records

are placed in order of arrival. A collision occurs when a bucket is filled to

its capacity and a new record must be inserted into that bucket. Hashing

provides the fastest possible access for retrieving an arbitrary record given

the value of its hash field. However, collision management degrades the overall

performance.

Independently of the particular file organization, additional access struc-

tures called indexes are used to speed up the retrieval of records in response

to search conditions. Indexes provide ecient ways to access the records based

on the indexing fields that are used to construct the index. Any field(s) of

the file can be used to create an index, and multiple indexes on different fields

can be constructed in the same file.

There are many different types of indexes. We describe below some

categories of indexes according to various criteria:

One categorization of indexes distinguishes between clustered and non-

clustered indexes , also called primary and secondary indexes .Ina

clustered index, the records in the data file are physically ordered according

to the field(s) on which the index is defined. This is not the case for a

nonclustered index. A file can have at most one clustered index and in

addition can have several nonclustered indexes.

Indexes can be single-column or multiple-column , depending on the

number of indexing fields on which they are based. When a multiple-

column index is created, the order of columns in the index has an impact

on data retrieval. Generally, the most restrictive value should be placed

first for optimum performance.

Another categorization of indexes is according to whether they are unique

or nonunique : unique indexes do not allow duplicate values, while this is

not the case for nonunique indexes.

Data Warehouse Systems: Design and Implementation

Search WWH ::

Custom Search

Home