developing quality rules. Blueprint Director may also be used for mapping
out Big Data integration architectures, and can help you actively manage your
Big Data integration architecture.
IBM InfoSphere Information Server
There are three styles of data integration: bulk (or batch movement),
real-time, and federation. Specific projects, such as Big Data analytics, often
require a combination of all of these styles to satisfy varying requirements.
Information integration is a key requirement of a Big Data platform,
because it enables you to leverage economies of scale from existing
investments, yet discover new economies of scale as you expand the analytics
paradigm. For example, consider the case where you have a heavy SQL
investment in a next best offer (NBO) application. If this application were
SQL warehouse-based, it could be enhanced with the ability to call a Hadoop
job that would look at the trending sentiment associated with a feature item
stock-out condition. This could help to determine the acceptability of such an
offer before it's made. Having the ability to leverage familiar SQL to call a
function that spawns a MapReduce job in a Hadoop cluster to perform this
sentiment analysis is not only a powerful concept; it's crucial from an
investment enhancement perspective.
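The idea of calling an analytics job from familiar SQL can be sketched on a small scale. The example below is only an illustration under stated assumptions: SQLite stands in for the SQL warehouse, and the function `trending_sentiment`, the `offers` table, the sample posts, and the word lists are all hypothetical. In a real deployment the function body would submit a MapReduce job to a Hadoop cluster and wait for its result; here it is a local stub.

```python
import sqlite3

# Hypothetical sample of social media posts that a Hadoop job would scan.
POSTS = [
    ("widget", "love the widget, great value"),
    ("widget", "widget out of stock again, terrible"),
    ("widget", "great widget, love it"),
    ("gadget", "terrible gadget, awful support"),
]
POSITIVE = {"love", "great"}      # assumed sentiment vocabulary
NEGATIVE = {"terrible", "awful"}


def trending_sentiment(item):
    """Return (positive words - negative words) across posts about an item.

    Stand-in for a function that would spawn a MapReduce job; the scoring
    rule is a deliberately naive assumption for illustration.
    """
    score = 0
    for post_item, text in POSTS:
        if post_item != item:
            continue
        for word in text.replace(",", " ").split():
            if word in POSITIVE:
                score += 1
            elif word in NEGATIVE:
                score -= 1
    return score


def build_demo():
    """Create an in-memory 'warehouse' with the function registered in SQL."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python function to SQL under the same name, taking 1 argument.
    conn.create_function("trending_sentiment", 1, trending_sentiment)
    conn.execute("CREATE TABLE offers (customer TEXT, item TEXT)")
    conn.executemany(
        "INSERT INTO offers VALUES (?, ?)",
        [("alice", "widget"), ("bob", "gadget")],
    )
    return conn


def acceptable_offers(conn):
    """Keep only offers whose item currently has positive sentiment."""
    return conn.execute(
        "SELECT customer, item FROM offers "
        "WHERE trending_sentiment(item) > 0 ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    print(acceptable_offers(build_demo()))
```

The NBO query itself stays in plain SQL; only the function behind `trending_sentiment` would need to change to reach a real cluster.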
Perhaps it's the case that you have a machine data analysis job running on
a Hadoop cluster and want to draw customer information from a system that
manages rebates, hoping to find a strong correlation between certain log
events and eventual credits. This goes back to the baseball analogy we talked
about in Chapter 1. The Big Data era is characterized by various
fit-for-purpose engines, and it's the coordination of these engines (like the
baseball player who is better at throwing with one hand and catching with the
other) that's key: information integration makes this all happen.
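As a rough sketch of the rebate scenario, the snippet below joins per-customer log event counts with credit amounts and computes a Pearson correlation. Everything in it is assumed for illustration: the customer identifiers, the event types, the credit values, and the choice of Pearson correlation as the measure. In practice the event counts would come from the Hadoop job and the credits from the rebate system.

```python
from collections import Counter

# Hypothetical output of a machine data analysis job: (customer_id, event_type).
LOG_EVENTS = [
    ("c1", "error"), ("c1", "error"), ("c1", "error"),
    ("c2", "error"), ("c2", "login"),
    ("c3", "login"),
]

# Hypothetical records from the rebate system: customer_id -> credit issued.
CREDITS = {"c1": 30.0, "c2": 10.0, "c3": 0.0}


def join_events_with_credits(log_events, credits, event_type="error"):
    """Return sorted (customer_id, event_count, credit) rows for one event type."""
    counts = Counter(cid for cid, ev in log_events if ev == event_type)
    return sorted((cid, counts.get(cid, 0), amount) for cid, amount in credits.items())


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    rows = join_events_with_credits(LOG_EVENTS, CREDITS)
    r = pearson([count for _, count, _ in rows], [credit for _, _, credit in rows])
    print(rows, r)
```

The join is the integration step: neither source alone can show whether log events and credits move together.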
Information is often structured or semistructured, and achieving
high-volume throughput requires a powerful processing engine. Your Big Data
integration platform should provide balanced optimization for different
integration and transformation needs, ranging from ETL (extract, transform, and
load), to ELT (extract, load, and transform: leveraging the target system to
process the transformations while providing the transformation logic), to TELT
(transform, extract, load, and transform).
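The difference between ETL and ELT is mainly where the transformation runs. The sketch below is a minimal illustration, assuming an in-memory SQLite database as the target system; the table names, the sample rows, and the cleaning rule (trim and lowercase the name, convert the amount to a number) are hypothetical. In the ETL path the integration code transforms rows before loading them; in the ELT path raw rows are loaded into a staging table and the target engine runs the supplied transformation logic.

```python
import sqlite3

# Hypothetical raw extract: untidy customer names and amounts stored as text.
RAW_ROWS = [("  Alice ", "10.50"), ("BOB", "3"), ("carol", "7.25")]


def clean(name, amount):
    """Transformation applied by the integration engine in the ETL path."""
    return name.strip().lower(), float(amount)


def run_etl(rows):
    """Extract, transform outside the target, then load the finished rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)", [clean(n, a) for n, a in rows]
    )
    return conn


def run_elt(rows):
    """Extract, load raw rows, then let the target run the transformation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)
    # The transformation logic is supplied as SQL and executed by the target.
    conn.execute(
        "CREATE TABLE sales AS "
        "SELECT lower(trim(customer)) AS customer, "
        "CAST(amount AS REAL) AS amount FROM staging"
    )
    return conn


def contents(conn):
    """Return the final sales table in a stable order for comparison."""
    return conn.execute(
        "SELECT customer, amount FROM sales ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    print(contents(run_etl(RAW_ROWS)))
    print(contents(run_elt(RAW_ROWS)))
```

Both paths end with the same table. A TELT flow would combine them: a light transformation before the load, followed by further transformation inside the target.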