An Overview of the NoSQL World - Large Scale and Big Data: Processing and Management

Database Reference

In-Depth Information

across multiple tables, and supports aggregation and full-text queries. It also sup-

ports Transact-SQL (T-SQL), native ODBC, and ADO.NET data access.* In par-

ticular, the SQL Azure service can be seen as running an instance of SQL server

in a cloud hosted server, which is automatically managed by Microsoft instead of

running on-premise managed server.

In SQL Azure, a logical database is called a table group , which can be keyless or

keyed. A keyless table group is an ordinary SQL server database where there are no

restrictions on the choices of keys for the tables. On the other hand, if a table group is

keyed, then all of its tables must have a common column called the partitioning key ,

which does not need not to be a unique key for each relation. A row group is a set of

all rows in a table group that have the same partition key value. SQL Azure requires

that each transaction executes on one table group. If the table group is keyed, then

the transaction can read and write rows of only one row group. Based on these prin-

ciples, there are two options for building transaction application that can scale out

using SQL Azure. The first option is to store the data in multiple groups where each

table group can fit comfortably on a single machine. In this scenario, the application

takes the responsibility for scaling out by partitioning the data into separate table

groups. The second option is to design the database as keyed table group so that the

SQL Azure can perform the scale out process automatically.

In SQL Azure, the consistency unit of an object is the set of data that can be

read and written by an ACID transaction. Therefore, the consistency unit of a

keyed table group is the row group while the consistency unit of a keyless table

group is the whole table group. Each replica of a consistency unit is always fully

contained in a single instance of SQL server running one machine. Hence, using

the two-phase commit protocol is never required. A query can execute on multiple

partitions of a keyed table group with an isolation level of read-committed. Thus,

data that the query read from different partitions may reflect the execution of dif-

ferent transactions. Transactionally consistent reads beyond a consistency unit are

not supported.

At the physical level, a keyed table group is split into partitions based on ranges

of its partitioning key. The ranges must cover all values of the partitioning key and

must not overlap. This ensures that each row group resides in exactly one partition

and hence that each row of a table has a well-defined home partition. Partitions

are replicated for high availability. Therefore, a partition is considered to be the

failover unit. Each replica is stored on one server. Each row group is wholly con-

tained in one replica of each partition that is scattered across servers such that no

two copies reside in the same failure domain . The transaction commitment protocol

requires that only a quorum of the replicas be up. A Paxos-like consensus algorithm

is used to maintain a set of replicas to deal with replica failures and recoveries.

Dynamic quorums are used to improve availability in the face of multiple failures.

In particular, for each partition, at each point in time one replica is designated to be

the primary. A transaction executes using the primary replica of the partition that

contains its row group and thus is nondistributed. The primary replica processes all

queries, updates, and data definition language operations. The primary replica is also

* http://msdn.microsoft.com/en-us/library/h43ks021(VS.71).aspx.

Large Scale and Big Data: Processing and Management

Search WWH ::

Custom Search

Home