Database Reference
In-Depth Information
only ones. LinkedIn, Facebook, and of course Yahoo! have all contributed to
the big data mind share.
There are similarities here to the SIGMOD papers published by various
parties in the relational database world, but ultimately it isn't the same.
Let's look at an example. Twitter has open-sourced Storm—their complex
event processing engine—which has recently been accepted into the Apache
incubator program. For relational database vendors, this level of open
sharing is really quite unheard of. For more details about Storm, head over to
Apache: http://incubator.apache.org/projects/storm.html.
Nutch
Nutch was an open source crawler-based search engine built by a handful
of part-time developers, including Doug Cutting. As previously mentioned,
Cutting was inspired by the Google publications and changed Nutch to
take advantage of the enhanced scalability of the architecture promoted by
Google. However, it wasn't too long after this that Cutting joined Yahoo! and
Hadoop was born.
Nutch joined the Apache foundation in January 2005, and its first release
(0.7) was in August 2005. However, it was not until 0.8 was released in July
2006 that Nutch began the transition to a Hadoop-based architecture.
Nutch is still very much alive and is an actively contributed-to project.
However, Nutch has now been split into two codebases. Version 1 is the
legacy codebase and is where Hadoop originated. Version 2 represents something
of a re-architecture of the original implementation while still holding true to
the original goals of the project.
What Is Hadoop?
Apache Hadoop is a top-level open source project and is governed by the
Apache Software Foundation (ASF). Hadoop is not any one entity or thing.
It is best thought of as a platform or an ecosystem that describes a method of
distributed data processing at scale using commodity hardware configured
to run as a cluster of computing power. This architecture enables Hadoop to
address and analyze vast quantities of data at significantly lower cost than
traditional methods, for example the relational database systems commonly
used in data warehousing.