IBM’s Enterprise Hadoop: InfoSphere BigInsights - Harness the Power of Big Data


Running these BigInsights functions gives you an easy way to integrate with Hadoop from your traditional application framework. With these functions, database applications (which are otherwise Hadoop-unaware) can access data in a BigInsights cluster using the same SQL interface they use to get relational data out of a database. Such applications can now leverage the parallelism and scale of a BigInsights cluster without requiring extra configuration or other overhead. Although this approach incurs additional performance overhead as compared to a conventional Hadoop application, it is a very useful way to integrate Big Data processing into your existing IT application infrastructure.

The IBM PureData System for Analytics Adapter

BigInsights includes a connector that enables data exchange between a BigInsights cluster and IBM PureData System for Analytics (or its earlier incarnation, the Netezza appliance). This adapter supports splitting tables, a concept similar to splitting files: the table is partitioned, and each portion is assigned to a specific mapper. This way, your SQL statements can be processed in parallel.
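
As a rough illustration of table splitting, the sketch below divides a numeric key range into contiguous portions and builds one SQL statement per mapper. It is not the adapter's actual implementation: the names `TableSplit`, `plan_splits`, and `split_query`, and the assumption that the table has an integer key column, are hypothetical and chosen only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TableSplit:
    """One portion of a table, assigned to a single mapper (hypothetical type)."""
    mapper_id: int
    low: int   # inclusive lower bound of the key range
    high: int  # exclusive upper bound of the key range


def plan_splits(min_key: int, max_key: int, num_mappers: int) -> list[TableSplit]:
    """Divide the inclusive key range [min_key, max_key] into contiguous splits."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_key < min_key:
        raise ValueError("max_key must not be smaller than min_key")

    total = max_key - min_key + 1
    count = min(num_mappers, total)      # never create an empty split
    base, extra = divmod(total, count)   # the first `extra` splits get one more key

    splits = []
    low = min_key
    for mapper_id in range(count):
        size = base + (1 if mapper_id < extra else 0)
        splits.append(TableSplit(mapper_id, low, low + size))
        low += size
    return splits


def split_query(table: str, key_column: str, split: TableSplit) -> str:
    """Build the SELECT statement that a mapper would issue for its split."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {key_column} >= {split.low} AND {key_column} < {split.high}"
    )


if __name__ == "__main__":
    for s in plan_splits(1, 10, 3):
        print(s.mapper_id, split_query("orders", "order_id", s))
```

Because the ranges are contiguous and do not overlap, every row falls into exactly one split, so the mappers can run their statements independently.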

The adapter leverages the Netezza technology's external table feature, which you can think of as a materialized external UNIX pipe. External tables use JDBC. In this scenario, each mapper acts as a database client: it connects to the database and starts reading from a UNIX file that's created by the IBM PureData System's infrastructure.

JDBC Module

The Jaql JDBC module enables you to read data from, and write data to, any relational database that has a standard JDBC driver. This means you can easily exchange data and issue SQL statements with every major database and data warehouse product on the market today.

With Jaql's MapReduce integration, each map task can access a specific part of a table, enabling SQL statements to be processed in parallel for partitioned databases.
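
The idea of each task reading its own part of a table can be sketched in a few lines. The example below is not Jaql and does not use the BigInsights JDBC module: it is a stand-in that uses Python's built-in `sqlite3` module in place of a JDBC source and a thread pool in place of map tasks. The `sales` table, the modulo-based partitioning, and all function names are assumptions made only for illustration.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def build_demo_table(db_path, row_count):
    """Create a small demo table (a stand-in for a real relational source)."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits the transaction on success
        conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount INTEGER)")
        conn.executemany(
            "INSERT INTO sales (id, amount) VALUES (?, ?)",
            [(i, i * 10) for i in range(row_count)],
        )
    conn.close()


def read_partition(db_path, task_id, num_tasks):
    """Act as one database client: open a connection and read only this task's rows."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT id, amount FROM sales WHERE id % ? = ?",
            (num_tasks, task_id),
        ).fetchall()
    finally:
        conn.close()


def parallel_read(db_path, num_tasks):
    """Run one partition query per task concurrently and combine the results."""
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        parts = pool.map(
            lambda task_id: read_partition(db_path, task_id, num_tasks),
            range(num_tasks),
        )
        return [row for part in parts for row in part]


def run_demo(row_count=100, num_tasks=4):
    """Build the demo table in a temporary file and read it back in parallel."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        build_demo_table(db_path, row_count)
        return parallel_read(db_path, num_tasks)


if __name__ == "__main__":
    rows = run_demo()
    print(len(rows), "rows read across all tasks")
```

Each task opens its own connection, mirroring the way a map task acts as an independent database client, and the partition predicates together cover every row exactly once.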
