NOTE
I liken the relationship between Hadoop and derivative works to the
world of Xbox games development. Many Xbox games use graphics
engines provided by a third party. The Unreal Engine is just such an
example.
What Is a Distribution?
Now that you know what a derivative work is, we can look at distributions.
A distribution is the packaging of Apache Hadoop projects and subprojects,
plus any additional proprietary components, into a single managed
package. For example, Hortonworks provides a distribution of Hadoop
called “Hortonworks Data Platform,” or HDP for short. This is the
distribution Microsoft uses for its product, HDInsight.
You may be asking yourself what is so special about that. You could
certainly do this yourself, but it would be a significant undertaking.
First, you'd need to download the projects you want, resolve any
dependencies, and then compile all the source code. If you
decide to go down this route, all the testing and integration of the various
components is also on you to manage and maintain. Bear in mind that the
creators of distributions also employ the committers of the actual source and
can therefore offer support as well.
As you might expect, distributions may lag slightly behind the Apache
projects in terms of releases. How quickly a distribution picks up new
releases is one of the factors to weigh when choosing one, given how
quickly the Hadoop ecosystem evolves.
If you look at the Hortonworks Data Platform (HDP), you can see that it
contains a number of projects at different stages of development. The
distribution brings these projects together and tests them for
interoperability and stability. Once satisfied that the projects all work
together, the distributor (in this case, Hortonworks) creates the
versioned release of the integrated software (the distribution as an
installable package).
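As a rough illustration of the idea, a distribution can be pictured as a manifest that pins one version of each component and is only released once the bundle is consistent. The sketch below is a minimal model in Python, not how Hortonworks actually builds HDP. The names `Component`, `missing_dependencies`, and `build_release` are hypothetical, the version strings are placeholders, and the dependency check stands in for the far more extensive interoperability and stability testing a real distributor performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One project in the bundle, pinned to a single version."""
    name: str
    version: str
    requires: tuple = ()  # names of other components this one depends on


def missing_dependencies(components):
    """Return {component name: [dependencies absent from the bundle]}."""
    names = {c.name for c in components}
    missing = {}
    for c in components:
        absent = [dep for dep in c.requires if dep not in names]
        if absent:
            missing[c.name] = absent
    return missing


def build_release(dist_name, dist_version, components):
    """Produce a versioned manifest only if the bundle is self-consistent."""
    problems = missing_dependencies(components)
    if problems:
        raise ValueError(f"unresolved dependencies: {problems}")
    return {
        "distribution": dist_name,
        "version": dist_version,
        "components": {c.name: c.version for c in components},
    }


# Placeholder versions, chosen only for illustration (not real release numbers).
bundle = [
    Component("hadoop", "x.1"),
    Component("hive", "y.2", requires=("hadoop",)),
    Component("pig", "z.3", requires=("hadoop",)),
]
release = build_release("example-distribution", "1.0", bundle)
```

The point of the model is that the release carries its own version number, separate from the versions of the projects inside it, which is why a distribution can lag behind the individual Apache releases.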