specialized functions that enable better access to data in Hadoop's distributed
file system (HDFS), facilitate workflow and the coordination of jobs, support
data movement between Hadoop and other systems, implement scalable machine
learning and data mining algorithms, and so on. These technologies are
all part of the Apache Software Foundation (ASF) and are distributed under a
commercially friendly licensing model.
Apache Hadoop is still in the early stages of its evolution. While it does
provide a scalable and reliable solution for Big Data, most enterprises may find
that it has missing features, lacks specific capabilities, or requires specialized
skills to adopt for their needs.
Hence, technology solution providers, including IBM, are making efforts
to bridge the gap and make Apache Hadoop easier for enterprise adoption.
These technology solution providers can take one of two different approaches
to achieving this goal. The first approach is to take the Apache Hadoop open
source code base as a starting point, and then modify it appropriately to
address gaps and limitations. In software development parlance, this process
is known as forking. Vendors adopting this approach effectively create a
vendor-specific proprietary Hadoop distribution that's somewhat closed and insulated
from the innovations and improvements that are being applied to the open
source components by the community. This makes interoperability with other
complementary technologies much more difficult.
The second approach is to retain the open source Apache Hadoop components
as is, without modifying the code base, while adding other layers and
optional components that augment and enrich the open source distribution.
IBM has taken this second approach with its InfoSphere BigInsights (BigInsights)
product, which treats the open source components of Apache Hadoop
as a “kernel” layer, and builds value-added components around it. This enables
IBM to quickly adopt any innovations or changes to the core open source
projects in its distribution. It also makes it easy for IBM to certify third-party
technologies that integrate with open source Apache Hadoop.
This modular strategy of incorporating other open source-based Hadoop
distributions into its own offering enables IBM both to maintain the integrity
of the open source components and to address their limitations. BigInsights
is an IBM-certified version of Apache Hadoop. Moreover, many of the value
components that BigInsights offers (such as BigSheets and the Advanced