Some developers believe that new technologies, such as Hadoop, can be
used for a multitude of tasks. From a batch and integration perspective, the
Big Data world is characterized by a wide range of approaches and disciplines
built around Big Data technologies. This leads to a “build mentality,” which assumes
that everything can be built around the new technology. If you think back
to when the data warehouse industry was in its infancy, many IT professionals
attempted to build in-house integration capabilities. Few would do that
today, because mature information integration technologies exist. The same
pattern is playing out in Hadoop, with some believing that it should be the
sole component for integration or transformation workloads.
For example, some folks propose using Hadoop alone to prepare data for a
data warehouse; this is generally referred to as extract, transform, and load
(ETL). But there's a huge gap between a general-purpose tool and a
purpose-built one, and integration involves many aspects beyond the
transformation of data, such as extraction, discovery, profiling, metadata,
data quality, and delivery. Organizations shouldn't use Hadoop solely for
integration; rather, they should leverage mature data integration technologies to help
speed their deployments of Big Data. New technologies such as Hadoop
will be adopted into data integration; for example, in an ELT-style integration
(where the T may be performed by stored procedures in a data warehouse),
organizations may look to use Hadoop for the transformation processing, as
sketched below. We think you'll find the need to use Hadoop engines as part of
an ETL/ELT strategy, but you will also greatly benefit from the flexibility of
a fit-for-purpose transformation engine, a massively parallel integration
engine that supports multiple transformation and load requirements, integration
into common run-time environments, and a common design palette, all
provided by a product such as InfoSphere Information Server (IIS). In
fact, this product's parallel processing engine and end-to-end integration
and quality capabilities yield a significant total cost of ownership advantage
over alternative approaches.
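To make the ELT pattern concrete, the following is a minimal Python sketch; it assumes data has already been extracted and loaded into a Hive table on a Hadoop cluster, and that the pyhive package is available. The host name and the raw_orders/orders_clean table and column names are hypothetical illustrations, not details from IIS or this text.

# ELT sketch: the data is already Loaded into Hadoop (Hive), and the
# Transform step is pushed to the cluster as a query instead of being
# executed row by row in an external ETL engine.
# Assumes the pyhive package; host and table/column names are hypothetical.
from pyhive import hive

conn = hive.connect(host="hadoop-gateway.example.com", port=10000)
cursor = conn.cursor()

# The "T" of ELT runs inside Hadoop: cleanse and aggregate in place,
# producing a warehouse-ready table rather than streaming raw rows out.
cursor.execute("""
    CREATE TABLE IF NOT EXISTS orders_clean AS
    SELECT customer_id,
           SUM(amount)   AS total_amount,
           MAX(order_ts) AS last_order_ts
    FROM raw_orders
    WHERE amount IS NOT NULL
    GROUP BY customer_id
""")

The point of the sketch is that only the query travels to the cluster; the heavy lifting happens where the data lives, which is exactly the division of labor the ELT pattern exploits.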
For example, if the transformation is full of SQL operations, IIS can push
those operations down into an IBM PureData System for Analytics appliance
(formerly known as Netezza). Your integration platform should be able
not just to automatically generate jobs to run on a Hadoop infrastructure or
an ETL parallel engine as required, but also to manage them with a common job
sequencer. IIS includes connectors into Hadoop and a Big Data file stage.