Database Reference
features added to it that make it more suitable to this kind of workload.
Nevertheless, it is a general-purpose optimizer and engine with data
warehouse extensions.
Sharing Resources
What do we mean by sharing resources? In computing terms we talk about
the key resources of CPU, I/O, and memory. Sharing resources is a mixed
blessing. SMP data warehouses, for example, often benefit from these
shared resources. Think of a join, for instance. We don't think twice about
this, but the fact that we have a single memory address space means that
we can join data in our SMP data warehouse freely because all the data is
in the same place. However, when we think about scaling data warehouses,
sharing can often lead to one thing: bottlenecks. Imagine having an Xbox
One with only one controller. Only one person gets to have a go while
everyone else has to wait.
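To make the join point concrete, here is a minimal sketch of an in-memory hash join in Python. It is an illustration only, not how SQL Server implements joins; the function name hash_join and the row layout (a list of dictionaries) are assumptions made for the example. The point is simply that both inputs can be indexed and probed directly because they live in the same address space.

```python
def hash_join(left, right, key):
    """Join two in-memory row sets on `key` (hypothetical helper).

    Both inputs are plain lists of dicts held in one process, so the
    join can index and probe them without moving any data around.
    """
    # Build phase: index the right-hand rows by their join key.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    # Probe phase: look up each left-hand row and merge the matches.
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row[key], [])
    ]
```

If the two row sets lived on different machines, one of them would first have to be shipped across the network before the build phase could begin.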
When a resource is shared, it means that it potentially has many customers.
When all those customers want to do is read from the resource, this is a
great option. However, if those customers want to write, we hit problems.
Resource access suddenly has to be synchronized to maintain the integrity
of the write.
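The need to synchronize writes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not database code: SharedCounter and run_writers are hypothetical names, and a single integer stands in for the shared resource. Reads need no coordination among themselves, but each write is a read-modify-write sequence that must be protected.

```python
import threading


class SharedCounter:
    """One resource with many customers (hypothetical example)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def read(self):
        # Readers do not change the value, so they do not conflict
        # with one another.
        return self.value

    def increment(self):
        # A write is read-modify-write. The lock ensures that only one
        # writer performs that sequence at a time.
        with self._lock:
            self.value += 1


def run_writers(counter, n_threads=4, n_increments=10_000):
    """Start several writer threads and return the final value."""

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.read()
```

With the lock in place, the final value always equals n_threads * n_increments. The cost is that the writers queue up behind one another at the lock.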
In database engines such as SQL Server, this synchronization is managed by
the engine itself. Locks and latches exemplify this concept. Locks protect
the logical, transactional consistency of data, such as the rows on a data
page. Latches protect the physical integrity of in-memory structures, such
as pages in the buffer pool, while they are being read or written. The
important thing to understand here is that these writes do not happen in
isolation. Not only do they have to be synchronized, but they also have to
be serialized.
Everyone has to take his or her turn. This serialization is required because
the resource is shared. So, for example, having a single buffer pool, latch,
and lock manager constrains one's ability to scale.
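The effect of a single synchronization point can be sketched as follows. This is a toy model, not SQL Server's lock manager: PartitionedStore and longest_queue are hypothetical names, keys are assumed to be integers, and the number of times each lock is taken is used as a rough stand-in for how many writers must take turns on it.

```python
import threading


class PartitionedStore:
    """Key-value store whose writes are guarded by one lock per partition.

    With n_partitions=1 it behaves like a single shared lock manager.
    """

    def __init__(self, n_partitions):
        self._locks = [threading.Lock() for _ in range(n_partitions)]
        self._data = [{} for _ in range(n_partitions)]
        # Number of writes that had to pass through each lock.
        self.acquisitions = [0] * n_partitions

    def _partition(self, key):
        # Integer keys are assumed; route each key to one partition.
        return key % len(self._locks)

    def write(self, key, value):
        p = self._partition(key)
        with self._locks[p]:
            self._data[p][key] = value
            self.acquisitions[p] += 1

    def read(self, key):
        p = self._partition(key)
        with self._locks[p]:
            return self._data[p].get(key)


def longest_queue(n_partitions, keys):
    """Return the largest number of writes serialized behind one lock."""
    store = PartitionedStore(n_partitions)
    for key in keys:
        store.write(key, "row")
    return max(store.acquisitions)
```

For eight writes to keys 0 through 7, a store with one partition sends all eight through the same lock, whereas a store with four partitions sends only two through each. The work is the same, but far less of it has to wait in a single line.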
The hardware faces the same challenges. Most SQL Server platforms
deployed in production today rely on shared storage, for example. When
multiple systems all access the same storage pools via a SAN, they contend
for the same I/O paths, which introduces multiple bottlenecks. Data warehouses are particularly
sensitive to this type of resource constraint as they need to issue large
sequential scans of data and, therefore, need a very capable I/O subsystem
that can guarantee a consistently good level of performance. A server also