Distributed Transactions - Transaction Processing

13.1 Distributed Databases

In a centralized database environment, all data in the database are stored on the

disks controlled by a single database server, where several server processes and

their threads, usually applying the transaction-shipping (query-shipping) paradigm,

service requests coming from client application processes. These client processes

may run on machines different from the server machine and communicate with the

server-process threads over a network, but in this setting there is always only

a single database server and a single database, which is shared by all the applications,

and every transaction operates on the data items of that database only.

A distributed database system is a collection of database systems within a single

organization in which applications can run transactions spanning several databases,

so that a transaction can contain read and update actions on data items stored in

different databases. Thus we assume that there are database servers $s_0, s_1, \ldots, s_k$

that can communicate with each other in such a way that it is possible to execute

a transaction that, say, reads from a relation $r_1$ stored at server $s_1$ and updates a

relation $r_2$ stored at server $s_2$.
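
As a rough illustration of such a transaction, the sketch below lets two in-memory SQLite databases stand in for the servers s1 and s2. The table names r1 and r2 follow the text; the columns, the sample values, and the helper `run_spanning_transaction` are assumptions made only for this example. The two commits at the end are issued one after the other, so the sketch does not by itself make the transaction atomic across both servers.

```python
import sqlite3

# Two independent in-memory databases stand in for servers s1 and s2 (assumption).
s1 = sqlite3.connect(":memory:")
s2 = sqlite3.connect(":memory:")

s1.execute("CREATE TABLE r1 (id INTEGER PRIMARY KEY, amount INTEGER)")
s1.execute("INSERT INTO r1 VALUES (1, 40)")
s1.commit()

s2.execute("CREATE TABLE r2 (id INTEGER PRIMARY KEY, total INTEGER)")
s2.execute("INSERT INTO r2 VALUES (1, 2)")
s2.commit()


def run_spanning_transaction(reader, writer, key):
    """Read a tuple of r1 at one server and update r2 at another (hypothetical helper)."""
    try:
        (amount,) = reader.execute(
            "SELECT amount FROM r1 WHERE id = ?", (key,)
        ).fetchone()
        writer.execute(
            "UPDATE r2 SET total = total + ? WHERE id = ?", (amount, key)
        )
        # Sequential commits: not an atomic commit across both servers.
        reader.commit()
        writer.commit()
    except Exception:
        # Undo any pending work at both servers before re-raising.
        reader.rollback()
        writer.rollback()
        raise


run_spanning_transaction(s1, s2, 1)
new_total = s2.execute("SELECT total FROM r2 WHERE id = 1").fetchone()[0]
```

The point of the sketch is only that a single unit of work touches data held by two different servers; coordinating the commit so that both servers agree on the outcome is a separate problem.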

The database servers of a distributed database system usually run on different

machines, some of which may be located near each other and connected via

a fast interconnection network, while some may be located at distant sites on the

Internet. The database at a machine can be accessed only via the server running at

that machine or, in the case of a shared-disks system (to be discussed in Chap. 14),

only via the servers belonging to the same cluster. In what follows we assume that

each single database in the distributed database system is managed by a single server

and that each server has the functionality of a transaction server, as explained in

Sect. 1.1.

A fully integrated distributed database system runs the same database management

software on every server and is capable of optimizing and executing SQL

queries and update operations that span data items stored at different servers. The

distribution of data across the servers may be transparent to the application, so that

the application programmer need not know exactly at which server a particular data

item resides. Obviously, the system catalog or data dictionary at each server must

then store the schema of the entire distributed database, that is, descriptions of all the

relations at all servers, together with information about the placement of different

relations (or their parts) across different servers.
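
A minimal sketch of such a catalog lookup, assuming a simple in-memory mapping. The relation names, server names, and the functions `servers_for` and `servers_for_query` are hypothetical and not taken from any particular system:

```python
# Hypothetical global catalog: every server would hold a copy of this mapping.
CATALOG = {
    "customer": {"schema": ("id", "name"), "placement": ["s1"]},
    # A relation may be split into parts stored at different servers.
    "orders": {"schema": ("id", "customer_id", "total"), "placement": ["s1", "s2"]},
}


def servers_for(relation):
    """Return the servers that store the given relation or parts of it."""
    try:
        return list(CATALOG[relation]["placement"])
    except KeyError:
        raise LookupError(f"relation {relation!r} is not in the catalog") from None


def servers_for_query(relations):
    """Return the sorted set of servers a query over these relations must contact."""
    return sorted({s for r in relations for s in servers_for(r)})
```

Because the application names only relations, the placement can change in the catalog without any change to application code, which is what distribution transparency amounts to in this sketch.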

The distribution of the data of the database does not need to follow the

decomposition of the data into logical units such as relations and tuples. A logical

relation may be partitioned into fragments, either horizontally or vertically.

In horizontal partitioning, or sharding, a logical relation $r$ is partitioned into

horizontal fragments or shards, that is, disjoint subrelations $r_1, \ldots, r_k$ with

$$r = r_1 \cup \cdots \cup r_k$$

and each fragment $r_i$ having the same relation schema as $r$.
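
The definition of horizontal fragments can be checked on a toy relation. The sketch below models a relation as a set of tuples and assigns each tuple to a shard by taking its first attribute modulo k; this placement rule, the sample tuples, and the function name `shard` are assumptions for illustration only.

```python
def shard(relation, k, key=lambda t: t[0]):
    """Split a relation (a set of tuples) into k disjoint horizontal fragments."""
    fragments = [set() for _ in range(k)]
    for t in relation:
        # Assumed placement rule: partitioning-key value modulo k.
        fragments[key(t) % k].add(t)
    return fragments


r = {(1, "ann"), (2, "bob"), (3, "cy"), (4, "dee"), (5, "eve")}
fragments = shard(r, 3)

# The union of the fragments gives back the whole relation ...
union = set().union(*fragments)
# ... and the fragments are pairwise disjoint exactly when no tuple is counted twice.
pairwise_disjoint = sum(len(f) for f in fragments) == len(union)
```

Every fragment holds whole tuples of r, so each has the same schema as r; only the set of tuples differs from shard to shard.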
