Data Warehouses and Hadoop Integration - Microsoft Big Data Solutions

features added to it that make it more suitable to this kind of workload.

Nevertheless, it is a general-purpose optimizer and engine with data

warehouse extensions.

Sharing Resources

What do we mean by sharing resources? In computing terms, we mean

the key resources of CPU, I/O, and memory. Sharing resources is a mixed

blessing. SMP data warehouses, for example, often benefit from these

shared resources. Think of a join, for instance. We don't think twice about

this, but the fact that we have a single memory address space means that

we can join data in our SMP data warehouse freely because all the data is

in the same place. However, when we think about scaling data warehouses,

sharing can often lead to one thing: bottlenecks. Imagine having an Xbox

One with only one controller. Only one person gets to have a go while

everyone else has to wait.
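The point about joining freely in a single address space can be illustrated with a minimal sketch. It assumes two small in-memory row sets; the names hash_join, customers, and orders are hypothetical and chosen only for illustration. This is not how SQL Server implements joins. It only shows that when both inputs live in the same memory, the join needs no data movement between machines.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Join two lists of dicts on `key` using an in-memory hash table."""
    # Build phase: index every row of the left input by its join key.
    index = defaultdict(list)
    for row in left:
        index[row[key]].append(row)

    # Probe phase: look up each right-hand row in the same address space.
    joined = []
    for row in right:
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data; both inputs sit in the same process memory.
customers = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Bob"},
]
orders = [
    {"customer_id": 1, "amount": 30},
    {"customer_id": 1, "amount": 12},
    {"customer_id": 3, "amount": 5},
]

result = hash_join(customers, orders, "customer_id")
```

If the two inputs lived on different machines, one side would first have to be moved or redistributed before the probe phase could run.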

When a resource is shared, it means that it potentially has many customers.

When all those customers want to do is read from the resource, this is a

great option. However, if those customers want to write, we hit problems.

Resource access suddenly has to be synchronized to maintain the integrity

of the write.

In a database engine such as SQL Server, this synchronization is managed

by the engine itself. Locks and latches are two examples. Locks protect the

logical consistency of data, such as rows and pages, while it is updated.

Latches are short-lived and guarantee the physical integrity of writes to

in-memory structures such as buffer pages. The important thing

to understand here is that these writes do not happen in isolation. Not

only do they have to be synchronized, but they also have to be serialized.

Everyone has to take his or her turn. This serialization is required because

the resource is shared. So, for example, having a single buffer pool, latch,

and lock manager constrains one's ability to scale.
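The serialization described above can be sketched with ordinary threads. This is an analogy only, assuming a single shared counter guarded by one lock; SharedCounter, worker, and run_writers are hypothetical names, and the sketch does not reflect SQL Server's actual lock or latch implementation.

```python
import threading


class SharedCounter:
    """A single shared resource with many writers (hypothetical example)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()  # one lock guards the one resource

    def increment(self):
        # Only one thread may hold the lock, so writers take turns here.
        with self._lock:
            self.value += 1


def worker(counter, increments):
    """Write to the shared counter `increments` times."""
    for _ in range(increments):
        counter.increment()


def run_writers(writers=4, increments=10_000):
    """Start several writer threads against one counter and return the total."""
    counter = SharedCounter()
    threads = [
        threading.Thread(target=worker, args=(counter, increments))
        for _ in range(writers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


total = run_writers()
```

The lock keeps the final count correct, but every write passes through the same critical section, so adding more writers adds more waiting rather than more throughput.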

The hardware faces the same challenges. Most SQL Server platforms

deployed in production today suffer from shared storage, for example. When

multiple systems all access the same storage pools via a SAN, the shared

storage introduces multiple bottlenecks. Data warehouses are particularly

sensitive to this type of resource constraint because they need to issue large

sequential scans of data and, therefore, need a very capable I/O subsystem

that can guarantee a consistently good level of performance. A server also
