Database Reference
In-Depth Information
(BDFS) container for data persistence and retrieval. A single data integration
platform, such as the one provided by IIS, gives you both capability and
flexibility. IIS includes a multitude of prebuilt transformation objects and
hundreds of functions, all atop a parallel execution environment that gives
you the flexibility to use a myriad of technologies (including Hadoop) that
are best suited for the task at hand. IIS integrates with HDFS as both a source
and a target system for data delivery. IIS can also model certain integration
tasks within an integration stage and specify the process to be performed on
Hadoop, which would take advantage of Hadoop's MapReduce processing
and low-cost infrastructure. This approach is often used in ELT-style integration,
where instead of the T being performed by data warehouse stored procedures,
transformations are performed by a Hadoop system. IIS also integrates
with InfoSphere Streams (Streams), and it can accumulate insights or
data filtered by Streams into a staged data file, which is then loaded into a
target system (say, a data warehouse) for further analysis.
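To make the ELT pattern described above more concrete, here is a minimal sketch of a transformation expressed as a map phase and a reduce phase, the style of processing a Hadoop system would run over data that has already been extracted and loaded. It is plain Python run in a single process, not IIS or Hadoop code; the record layout, the sample lines, and every function name are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical raw records as they might sit in HDFS after the E and L steps.
# Layout assumed for this sketch: customer_id,date,amount
RAW_LINES = [
    "c1,2024-01-03,20.00",
    "c2,2024-01-03,5.50",
    "c1,2024-01-04,12.25",
    "bad line",
]


def map_record(line):
    """Map phase: parse one raw line and emit (customer_id, amount)."""
    parts = line.split(",")
    if len(parts) != 3:
        return  # skip malformed input instead of failing the whole job
    customer_id, _date, amount = parts
    try:
        yield customer_id, float(amount)
    except ValueError:
        return  # amount was not numeric; skip the record


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_totals(key, values):
    """Reduce phase: collapse all amounts for one customer into a total."""
    return key, round(sum(values), 2)


def run_transform(lines):
    """Run the T of ELT: map every line, group by key, then reduce."""
    pairs = (pair for line in lines for pair in map_record(line))
    grouped = shuffle(pairs)
    return dict(reduce_totals(k, v) for k, v in sorted(grouped.items()))


totals = run_transform(RAW_LINES)
```

In a real deployment the map and reduce functions would run in parallel across the cluster, and only the much smaller reduced result would be delivered to the target system.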
High-speed integration into data warehouses is going to be key, and IIS
delivers this capability as well. An example of an IIS Big Data transformation
flow is shown in Figure 10-3. You can see that this job analyzes high-fidelity
emails (stored in Hadoop) for customer sentiment, and the results of that
analysis are used to update the warehouse (for example, the Customer
dimension); this is an example of risk classification based on email analytics.
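The flow just described (score emails for sentiment, classify risk, update the Customer dimension) can be sketched in a few lines. This is only an illustration under stated assumptions: the keyword lists, the risk thresholds, the `customer_dim` table, and the sample emails are all hypothetical, and an in-memory SQLite database stands in for the warehouse. It does not represent how IIS performs its analysis.

```python
import sqlite3

# Hypothetical keyword lexicons; real sentiment analysis is far richer.
NEGATIVE = {"cancel", "angry", "refund", "terrible"}
POSITIVE = {"thanks", "great", "happy", "love"}

# Hypothetical (customer_id, email body) pairs, standing in for emails
# stored in Hadoop.
EMAILS = [
    (1, "I am angry and want a refund"),
    (1, "This is terrible, I will cancel"),
    (2, "Thanks, great service"),
]


def sentiment_score(text):
    """Count positive words minus negative words in one email."""
    words = text.lower().replace(",", " ").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def classify_risk(score):
    """Map an aggregate sentiment score to a risk class (assumed thresholds)."""
    if score <= -2:
        return "HIGH"
    if score < 0:
        return "MEDIUM"
    return "LOW"


def update_customer_dimension(conn, emails):
    """Aggregate scores per customer and write the risk class to the dimension."""
    totals = {}
    for customer_id, body in emails:
        totals[customer_id] = totals.get(customer_id, 0) + sentiment_score(body)
    for customer_id, score in totals.items():
        conn.execute(
            "UPDATE customer_dim SET risk_class = ? WHERE customer_id = ?",
            (classify_risk(score), customer_id),
        )
    conn.commit()


# In-memory SQLite database used as a stand-in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_dim ("
    "customer_id INTEGER PRIMARY KEY, name TEXT, risk_class TEXT)"
)
conn.executemany(
    "INSERT INTO customer_dim VALUES (?, ?, ?)",
    [(1, "Ann", "LOW"), (2, "Bob", "LOW"), (3, "Cy", "LOW")],
)

update_customer_dimension(conn, EMAILS)
risk = dict(conn.execute("SELECT customer_id, risk_class FROM customer_dim"))
```

Customers with no analyzed emails keep their existing risk class, because only rows with a computed score are updated.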
Other integration technologies that are commonplace in today's IT environ-
ments (and remain key in a Big Data world) include real-time replication and
federation. Real-time replication, utilizing a product such as IBM InfoSphere
Data Replication (Data Replication), involves monitoring a source system and
triggering a replication or change to the target system. This is often used for
Figure 10-3 A data-flow job that utilizes a combination of Big Data assets, including
source data in Hadoop joined with DB2 relational data, and various transformations to
classify risk