Database Reference
The first generation of data warehouse technologies was modeled after
OLTP-based databases running on large symmetric multiprocessing (SMP)
machines. These machines had inherent architectural limitations that prevented
them from becoming a viable platform for analytics. Subsequent iterations
tried to incorporate parallel processing techniques and distributed storage
subsystems into the architecture. The resulting operational overhead (various
kinds of indexes, indexes on indexes, optimization hints, and the like)
made these systems even harder to operate and more expensive to maintain.
Achieving consistent performance against increasing data volumes and
diverse workloads without a significant increase in total cost of ownership
(TCO) has always been the biggest challenge in data warehousing technologies.
Invariably, the biggest bottleneck across all data warehouse operations
was the speed at which the database engine could read data from and write data
to disk, known as the disk I/O bottleneck.
When warehouses were assembled from piece-parts, various providers delivered a
number of I/O innovations in an attempt to address this bottleneck; however,
these innovations were brought to market independently and exhibited little
synergy across warehouse tiers: the relational database management system
(RDBMS), storage subsystem, and server technologies. For example, caching
in the storage subsystem, faster network fabric on the servers, and
software optimizations such as partitioning and indexing were all
brought to market to minimize disk I/O. Although each addressed the
problem locally, which helped a bit, they weren't collectively
optimized to significantly improve disk I/O. In addition, these optimization
techniques relied on the data warehouse designers to second-guess query
patterns and retrieval needs up-front so that they could tune the system for
performance. This not only impacted business agility in meeting new reporting
and analytics requirements, but it also required significant manual effort
to maintain, tune, and configure the data warehouse tiers. As a result,
these systems collectively became expensive to manage and brutal to maintain.
The IBM PureData System for Analytics appliance—formerly known as the
IBM Netezza Data Warehouse Appliance, often just referred to as Netezza—
was developed to overcome these specific challenges. (Since this chapter talks
about the history of the Netezza technology that is the genesis for this IBM
PureData System appliance, we'll refer to both the appliance and technology
simply as Netezza for the remainder of this chapter.) In fact, it's fair to say
that Netezza started what has now become the appliance revolution in data