Big Data, Data Warehouses, and Business Intelligence Systems - Database Processing: Fundamentals, Design, and Implementation

If multiple computers can make updates to a replicated database, then difficult problems

arise. Specifically, if two computers are allowed to process the same row at the same time, they

can cause three types of error: They can make inconsistent changes, one computer can delete

a row that another computer is updating, or the two computers can make changes that violate

uniqueness constraints.
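
The three types of error can be illustrated with a minimal sketch. It assumes two in-memory dictionaries standing in for replicas of a CUSTOMER table; the function names, the sample rows, and the rule that email must be unique are hypothetical and chosen only for illustration.

```python
import copy


def make_replicas():
    """Return two independent copies of the same hypothetical CUSTOMER table."""
    table = {1: {"name": "Ann", "email": "ann@example.com"}}
    return copy.deepcopy(table), copy.deepcopy(table)


def inconsistent_changes():
    """Both computers change the same row in different ways."""
    a, b = make_replicas()
    a[1]["name"] = "Ann Smith"  # computer A renames the customer
    b[1]["name"] = "Anne"       # computer B renames the same customer differently
    return a[1]["name"], b[1]["name"]


def delete_during_update():
    """One computer deletes a row while the other is updating it."""
    a, b = make_replicas()
    del a[1]                            # computer A deletes the row
    b[1]["email"] = "new@example.com"   # computer B updates its copy of that row
    return 1 in a, 1 in b


def uniqueness_violation():
    """Each insert is unique locally, but the combined data is not."""
    a, b = make_replicas()
    a[2] = {"name": "Bob", "email": "shared@example.com"}
    b[3] = {"name": "Cy", "email": "shared@example.com"}
    merged = {**a, **b}  # naive merge of the two replicas
    emails = [row["email"] for row in merged.values()]
    return len(emails) != len(set(emails))
```

In each case neither computer sees anything wrong with its own copy; the problem appears only when the replicas are compared or combined.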

To prevent these problems, some type of record locking is required. Because multiple

computers are involved, standard record locking does not work. Instead, a far more

complicated locking scheme, called distributed two-phase locking, must be used. The specifics of

the scheme are beyond the scope of this discussion; for now, just know that implementing this

algorithm is difficult and expensive. If multiple computers can update multiple replicas of

a distributed database, then significant problems must be solved.

If the database is partitioned but not replicated [Figure 12-27(b)], then problems will

occur if any transaction updates data that span two or more distributed partitions. For

example, suppose the CUSTOMER and SALESPERSON tables are placed on a partition on one

computer and that INVOICE, LINE_ITEM, and PART tables are placed on a second computer.

Further suppose that when recording a sale all five tables are updated in an atomic

transaction. In this case, a transaction must be started on both computers, and it can be allowed to

commit on one computer only if it can be allowed to commit on both computers. Here, too,

distributed two-phase locking must be used.
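
The commit rule in this example (commit on one computer only if both can commit) can be sketched as follows. This is a simplified illustration in the style of a two-phase commit, not an implementation of distributed two-phase locking: it takes no locks and ignores network and crash failures. The Partition class, its methods, and the record_sale function are hypothetical names introduced for this sketch.

```python
class Partition:
    """Hypothetical stand-in for the database partition on one computer."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates whether this computer can commit
        self.committed = {}
        self.pending = {}

    def begin(self, changes):
        # Start a local transaction holding the uncommitted changes.
        self.pending = dict(changes)

    def prepare(self):
        # Report whether the pending work could be committed.
        return self.can_commit

    def commit(self):
        self.committed.update(self.pending)
        self.pending = {}

    def rollback(self):
        self.pending = {}


def record_sale(work):
    """Apply changes to every partition, or to none of them.

    `work` is a list of (partition, changes) pairs.
    """
    for partition, changes in work:
        partition.begin(changes)
    # Commit anywhere only if every partition reports that it can commit.
    if all(partition.prepare() for partition, _ in work):
        for partition, _ in work:
            partition.commit()
        return True
    for partition, _ in work:
        partition.rollback()
    return False
```

If the computer holding INVOICE, LINE_ITEM, and PART cannot commit, the changes to CUSTOMER and SALESPERSON on the other computer are rolled back as well, so the sale is recorded on both computers or on neither.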

If the data are partitioned in such a way that no transaction requires data from both

partitions, then regular locking will work. However, in this case, the databases are actually

two separate databases, and some would argue that they should not be considered a

distributed database.

If the data are partitioned in such a way that no transaction updates data from both

partitions but that one or more transactions read data from one partition and update data

on a second partition, then problems might or might not result with regular locking. If dirty

reads are possible, then some form of distributed locking is required; otherwise, regular

locking should work.

If a database is partitioned and at least one of those partitions is replicated, then locking

requirements are a combination of those just described. If the replicated portion is updated, if

transactions span the partitions, or if dirty reads are possible, then distributed two-phase

locking is required; otherwise, regular locking might suffice.

Distributed processing is complicated and can create substantial problems. Except in the

case of replicated, read-only databases, only experienced teams with a substantial budget and

significant time to invest should attempt distributed databases. Such databases also require

data communications expertise. Distributed databases are not for the faint of heart.

Object-Relational Databases

Object-oriented programming (OOP) is a technique for designing and writing computer

programs. Today, most new program development is done using OOP techniques. Java, C++,

C#, and Visual Basic .NET are object-oriented programming languages.

Objects are data structures that have both methods, which are computer programs that

perform some task, and properties, which are data items particular to an object. All objects

of a given class have the same methods, but each has its own set of data items. When using an

OOP language, the properties of the object are created and stored in main memory. Storing the values of

properties of an object is called object persistence. Many different techniques have been used

for object persistence. One of them is to use some variation of database technology.
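
As a minimal sketch of these terms, the following hypothetical Customer class has two properties and one method, and persists an object by writing its property values to a JSON string. The string stands in for a file or other permanent storage, and JSON is only one of the many possible techniques; the class and its member names are assumptions made for illustration.

```python
import json


class Customer:
    """Hypothetical class: every Customer shares the same methods."""

    def __init__(self, name, balance):
        # Properties: data items particular to each object, held in main memory.
        self.name = name
        self.balance = balance

    def charge(self, amount):
        # Method: a program that performs a task on this object's data.
        self.balance += amount

    def to_json(self):
        # Persistence: save the property values outside the running program.
        return json.dumps({"name": self.name, "balance": self.balance})

    @classmethod
    def from_json(cls, text):
        # Rebuild an object from previously saved property values.
        data = json.loads(text)
        return cls(data["name"], data["balance"])
```

Only the property values need to be saved; the methods belong to the class and are available again as soon as the object is rebuilt.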

Although relational databases can be used for object persistence, using this method

requires substantial work on the part of the programmer. The problem is that, in general,

object data structures are more complicated than the row of a table. Typically, several, or

even many, rows of several different tables are required to store object data. This means the

OOP programmer must design a mini-database just to store objects. Usually, many objects are

involved in an information system, so many different mini-databases need to be designed and

processed. This method is so undesirable that it is seldom used.
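
The extra work can be seen in a small sketch that uses Python's standard sqlite3 module. The Invoice class, the two-table schema, and the save and load functions are hypothetical; they show how even one simple object with a list of line items needs rows in more than one table, plus hand-written code to take it apart and put it back together.

```python
import sqlite3


class Invoice:
    """Hypothetical object: one invoice with a list of (part, quantity) items."""

    def __init__(self, number, customer, items):
        self.number = number
        self.customer = customer
        self.items = items


def create_schema(conn):
    # The "mini-database" the programmer must design for this one class.
    conn.execute(
        "CREATE TABLE invoice (number INTEGER PRIMARY KEY, customer TEXT)"
    )
    conn.execute(
        "CREATE TABLE line_item ("
        " invoice_number INTEGER REFERENCES invoice(number),"
        " line INTEGER, part TEXT, quantity INTEGER,"
        " PRIMARY KEY (invoice_number, line))"
    )


def save(conn, inv):
    # One object becomes one invoice row plus one line_item row per item.
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO invoice VALUES (?, ?)", (inv.number, inv.customer)
        )
        conn.executemany(
            "INSERT INTO line_item VALUES (?, ?, ?, ?)",
            [
                (inv.number, line, part, qty)
                for line, (part, qty) in enumerate(inv.items, start=1)
            ],
        )


def load(conn, number):
    # Reassemble the object from rows in both tables.
    row = conn.execute(
        "SELECT customer FROM invoice WHERE number = ?", (number,)
    ).fetchone()
    items = conn.execute(
        "SELECT part, quantity FROM line_item"
        " WHERE invoice_number = ? ORDER BY line",
        (number,),
    ).fetchall()
    return Invoice(number, row[0], items)
```

Every additional class in the information system would need its own tables and its own save and load code along these lines.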
