Information Technology Reference
In-Depth Information
13.1
Distributed Databases
In a centralized database environment all data in the database are stored on the
disks controlled by a single database server, where several server processes and
their threads, usually applying the transaction-shipping (query-shipping) paradigm,
service requests coming from client application processes. These client processes
may run on machines different from the server machine and communicate with the
server-process threads over a network, but, in any case, in this setting, we have only
a single database server and a single database which is shared by all the applications,
and every transaction operates on the data items of that database only.
A distributed database system is a collection of database systems within a single
organization in which applications can run transactions spanning several databases,
so that a transaction can contain read and update actions on data items stored in
different databases. Thus we assume that there are database servers s 0 ;s 1 ;:::;s k
that can communicate between each other in such a way that it is possible to execute
a transaction that, say, reads from a relation r 1 stored at server s 1 and updates a
relation r 2 stored at server s 2 .
The database servers of a distributed database system usually run on different
machines, some of which may be located near to each other and connected via
a fast interconnection network, while some may be located at distant sites on the
Internet. The database at a machine can be accessed only via the server running at
that machine or, in the case of a shared-disks system (to be discussed in Chap. 14 ),
only via the servers belonging to the same cluster. In what follows we assume that
each single database in the distributed database system is managed by a single server
and that each server has the functionality of a transaction server, as explained in
Sect. 1.1 .
A fully integrated distributed database system runs the same database manage-
ment software on every server and is capable of optimizing and executing SQL
queries and update operations that span data items stored at different servers. The
distribution of data across the servers may be transparent to the application, so that
the application programmer need not know exactly at which server a particular data
item resides. Obviously, the system catalog or data dictionary at each server must
then store the schema of the entire distributed database, that is, descriptions of all the
relations at all servers, together with information about the placement of different
relations (or their parts) across different servers.
The distribution of the data of the database does not need to follow the
decomposition of the data into logical units such as relations and tuples. A logical
relation may be partitioned into fragments , either horizontally or vertically.
In horizontal partitioning ,or sharding , a logical relation r is partitioned into
horizontal fragments or shards , that is, disjoint subrelations r 1 ;:::;r k with
r D r 1 [ ::: [ r k
and each fragment r i having the same relation schema as r .
Search WWH ::




Custom Search