A Big Data Platform for High-Performance Deep Analytics: IBM PureData Systems - Harness the Power of Big Data

The first generation of data warehouse technologies was modeled after OLTP-based databases running on large symmetric multiprocessing (SMP) machines. These machines had inherent architectural limitations that prevented them from becoming a viable platform for analytics. Subsequent iterations tried to incorporate parallel processing techniques and distributed storage subsystems into the architecture. The resulting operational complexity (various kinds of indexes, indexes on indexes, optimization hints, and the like) made these systems even harder to operate and more expensive to maintain.

Achieving consistent performance against increasing data volumes and diverse workloads without a significant increase in total cost of ownership (TCO) has always been the biggest challenge in data warehousing technologies. Invariably, the biggest bottleneck across all data warehouse operations was the speed at which the database engine could read data from and write data to disk, known as the disk I/O bottleneck.

When warehouses were piece-parts, various providers delivered a number of I/O innovations in an attempt to address this bottleneck; however, these innovations were brought to market independently and exhibited little synergy across the warehouse tiers: the relational database management system (RDBMS), the storage subsystem, and the server technologies. For example, caching in the storage subsystem, faster network fabric on the servers, and software optimizations such as partitioning and indexing were all brought to market to minimize disk I/O. Although each of them addressed the issue locally and helped a bit, they weren't optimized to work together to significantly improve disk I/O performance. In addition, these optimization techniques relied on data warehouse designers to anticipate query patterns and retrieval needs up front so that they could tune the system for performance. This not only hurt business agility in meeting new reporting and analytics requirements, but also required significant manual effort to maintain, tune, and configure the data warehouse tiers. As a result, these systems collectively became expensive to manage and brutal to maintain.

The IBM PureData System for Analytics appliance, formerly known as the IBM Netezza Data Warehouse Appliance and often referred to simply as Netezza, was developed to overcome these specific challenges. (Because this chapter covers the history of the Netezza technology that gave rise to this IBM PureData System appliance, we'll refer to both the appliance and the technology simply as Netezza for the remainder of this chapter.) In fact, it's fair to say that Netezza started what has now become the appliance revolution in data
