Architecture Components - Big Data Analytics: Disruptive Technologies for Changing the Game

• Requires no hand coding of programs to enable more processors

• Supports SMP, clustered, grid, and MPP platforms

In InfoSphere Streams, continuous applications are composed of individual operators, which interconnect and operate on one or more data streams. Data streams normally come from outside the system, but they can also be produced internally as part of an application. Operators can filter, classify, transform, correlate, and/or fuse the data to make decisions using business rules. Depending on the need, the streams can be subdivided and processed by a large number of nodes, thereby reducing latency and increasing processing volumes.
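
The operator model described above can be sketched with plain Python generators. This is only an illustration of chaining filter, transform, and classify steps over a stream. It does not use the InfoSphere Streams language or API, and the operator names, the record layout, and the threshold are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, float]  # hypothetical tuple layout: {"id": ..., "value": ...}


def filter_op(stream: Iterable[Record],
              predicate: Callable[[Record], bool]) -> Iterator[Record]:
    """Pass along only the records that satisfy the predicate."""
    for record in stream:
        if predicate(record):
            yield record


def transform_op(stream: Iterable[Record],
                 fn: Callable[[Record], Record]) -> Iterator[Record]:
    """Apply a function to every record in the stream."""
    for record in stream:
        yield fn(record)


def classify_op(stream: Iterable[Record], threshold: float) -> Iterator[dict]:
    """Attach a label based on a simple business rule (assumed threshold)."""
    for record in stream:
        label = "alert" if record["value"] >= threshold else "normal"
        yield {**record, "label": label}


def run_pipeline(readings: Iterable[Record]) -> List[dict]:
    """Interconnect the operators: filter -> transform -> classify."""
    valid = filter_op(readings, lambda r: r["value"] >= 0)
    scaled = transform_op(valid, lambda r: {**r, "value": r["value"] * 10})
    return list(classify_op(scaled, threshold=50))


readings = [
    {"id": 1, "value": 2.0},
    {"id": 2, "value": -1.0},  # dropped by the filter operator
    {"id": 3, "value": 7.0},
]
results = run_pipeline(readings)
```

Because each operator consumes and yields one record at a time, nothing is buffered until the final `list(...)` call. In a real streaming system the operators would run continuously and could be placed on different nodes.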

The Netezza Performance Server (NPS®) architecture is a two-tiered system designed to handle very large queries from multiple users. The first tier is a high-performance Linux® symmetric multiprocessing host. The host compiles queries received from Business Intelligence applications and generates query execution plans. It then divides a query into a sequence of subtasks, or snippets, which can be executed in parallel, and it distributes the snippets to the second tier for execution. The host returns the final results to the requesting application, thus providing the programming advantages of appearing to be a traditional database server. The second tier consists of anywhere from dozens to thousands of Snippet Processing Units (SPUs) operating in parallel. Each SPU is an intelligent query processing and storage node and consists of a powerful commodity processor, dedicated memory, a disk drive, and a field-programmable disk controller with hard-wired logic to manage data flows and process queries at the disk level.
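
The two-tier flow described above (a host that hands a sequence of snippets to many SPUs and gathers their results) can be sketched as follows. This is a toy model under stated assumptions, not Netezza code: the `Spu` class, the `host_run_query` function, the snippet functions, and the row layout are all hypothetical, and a thread pool stands in for the parallel hardware.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, int]
Snippet = Callable[[List[Row]], List[Row]]


class Spu:
    """Hypothetical stand-in for one SPU: owns a slice of the data."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def execute(self, snippets: List[Snippet]) -> List[Row]:
        """Run the snippet sequence against this SPU's local slice only."""
        rows = self.rows
        for snippet in snippets:
            rows = snippet(rows)
        return rows


def scan_filter(rows: List[Row]) -> List[Row]:
    """Snippet 1: keep rows with amount above an assumed cutoff of 100."""
    return [r for r in rows if r["amount"] > 100]


def project_id(rows: List[Row]) -> List[Row]:
    """Snippet 2: keep only the id column."""
    return [{"id": r["id"]} for r in rows]


def host_run_query(spus: List[Spu], snippets: List[Snippet]) -> List[Row]:
    """Host tier: send the same snippets to every SPU, then gather results."""
    with ThreadPoolExecutor(max_workers=len(spus)) as pool:
        partials = list(pool.map(lambda spu: spu.execute(snippets), spus))
    return [row for part in partials for row in part]


spus = [
    Spu([{"id": 1, "amount": 50}, {"id": 2, "amount": 150}]),
    Spu([{"id": 3, "amount": 300}, {"id": 4, "amount": 20}]),
]
result = host_run_query(spus, [scan_filter, project_id])
```

The caller sees a single function that returns rows, much as an application sees a single database server, while the filtering and projection happen on each data slice independently.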

The massively parallel, shared-nothing SPU blades provide the performance advantages of massively parallel processors. Nearly all query processing is done at the SPU level, with each SPU operating on its portion of the database. All operations that easily lend themselves to parallel processing (including record operations, parsing, filtering, projecting, interlocking, and logging) are performed by the SPU nodes, which significantly reduces the amount of data moved within the system. Operations on sets of intermediate results, such as sorts, joins, and aggregates, are executed primarily on the SPUs but can also be done on the host, depending on the processing cost of that operation.
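
One way to see why pushing work to the SPUs reduces data movement is a partial aggregation: each node filters and aggregates its own slice and ships only a small summary, and the host merges the summaries. The sketch below computes an average this way. The function names, the row layout, and the cutoff value are assumptions for illustration, and the source does not state that averages are computed in exactly this manner.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]


def spu_partial_aggregate(rows: List[Row], min_amount: float) -> Tuple[float, int]:
    """On the node: filter locally and return only (sum, count)."""
    kept = [r["amount"] for r in rows if r["amount"] >= min_amount]
    return sum(kept), len(kept)


def host_merge(partials: List[Tuple[float, int]]) -> float:
    """On the host: combine the per-node summaries into a global average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else 0.0


# Each inner list is the slice of the table held by one hypothetical node.
slices = [
    [{"amount": 10.0}, {"amount": 200.0}],
    [{"amount": 400.0}, {"amount": 5.0}, {"amount": 300.0}],
]
partials = [spu_partial_aggregate(s, min_amount=100.0) for s in slices]
average = host_merge(partials)
```

Here five rows are stored, but only two `(sum, count)` pairs leave the nodes. The same pattern applies to counts, sums, minima, and maxima, whereas operations such as global sorts need more coordination between the tiers.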

A recent development in database scalability is IBM's pureScale offering. Designed for organizations that run online transaction processing (OLTP) applications on distributed systems, IBM® DB2® pureScale® offers clustering technology that helps deliver high availability and exceptional
