Database Reference
Increasingly we're seeing Hadoop being leveraged as a dynamic ETL engine,
especially for unstructured data. ETL is not just about the transformation,
though. Orchestration, debugging, lifecycle management, and data lineage are
but a few important considerations. DataStage and the rest of the InfoSphere
Information Server platform provide the means to deal with all these items and
more. Plans are in place for even tighter connections between Information Server
and BigInsights, such as the ability to choreograph BigInsights jobs from
DataStage, making powerful and flexible data transformation scenarios
possible. We talk about Hadoop optimizations, including DataStage's Big Data File
Stage (BDFS), in Chapter 10 and Chapter 11.
Operational Excellence
As more organizations come to depend on Hadoop for their business analytics,
the demand for additional governance and administrative capabilities
increases. People have grown accustomed to the rich data management
features in enterprise relational databases and want to see them in their Hadoop
clusters as well. At the same time, Hadoop is entering an era in which the
understanding of data governance is highly evolved. Relational databases
had the advantage of “growing up” with a lot of the thinking about data. As
a result, many experienced IT people look at Hadoop with high expectations.
The trouble is that open source software focuses more on core capability than
on rounding out administrative features. At IBM, there are hundreds
of researchers and developers who are industry-leading experts on
governance, workload management, and performance optimization. Since the
Apache Hadoop project took off, many of these experts have been developing
Hadoop solutions that have been incorporated into BigInsights.
Securing the Cluster
Security is an important concern for enterprise software, and in the case of
open source Hadoop, there are limitations to consider before moving to
production. The good news is that BigInsights addresses these issues by reducing
the security surface area: it secures access to the administrative interfaces
and key Hadoop services, locks down open ports, provides role-based security,
integrates with InfoSphere Guardium Database Security (Guardium), and more.
 