central computer for processing (bringing data to the function), it's the
application that gets moved to all of the locations in which the data is stored
(bringing function to the data). This programming model is known as Hadoop
MapReduce.
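The "function to the data" idea can be sketched in a few lines. The following is a minimal, single-process illustration of the MapReduce model using a word count, the canonical example; the function names here are illustrative, not Hadoop's actual Java API, and a real cluster would run the map and reduce phases in parallel across the nodes holding the data.

```python
from collections import defaultdict

def map_phase(record):
    # Map: emit an intermediate (word, 1) pair for every word in a line.
    # On a cluster, this runs on the node where the input block is stored.
    for word in record.split():
        yield word, 1

def reduce_phase(key, values):
    # Reduce: aggregate all intermediate values emitted for one key.
    return key, sum(values)

def run_mapreduce(records):
    # Shuffle: group intermediate values by key between the two phases.
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            grouped[key].append(value)
    # Each group is reduced independently, so reduces can run in parallel.
    return dict(reduce_phase(k, v) for k, v in grouped.items())

print(run_mapreduce(["the quick brown fox", "the lazy dog"]))
```

Because each map call sees only its own record and each reduce call sees only its own key, neither phase needs the whole data set in one place, which is what lets Hadoop ship the computation to wherever the data blocks live.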
Where Elephants Come From: The History of Hadoop
Now that you've got a high-level sense of what Hadoop is, we can get into
why it's revolutionizing IT. First, let's look at Hadoop's origins.
Hadoop was inspired by Google's work on its distributed Google File System
(GFS) and the MapReduce programming paradigm. IBM has long been
involved with MapReduce, as it teamed up with Google in October 2007 to do
some joint university research on MapReduce and GFS for large-scale Internet
problems. Shortly after Google published papers describing GFS and
MapReduce (in 2003 and 2004, respectively), people in the open source
community (led by Doug Cutting) applied these tools to the open source Nutch
search engine. It quickly became apparent that the distributed file system
and MapReduce modules together had applications beyond just the search
world. It was in early 2006 that these software components became their own
research project, called Hadoop. Hadoop is quite the odd name (and you'll find
a lot of odd names in the Hadoop world). Read any book on Hadoop today and
it pretty much starts with the name that serves as this project's mascot, so let's
start there too. Hadoop is actually the name that creator Doug Cutting's son
gave to his stuffed toy elephant. In thinking up a name for his project, Cutting
was apparently looking for something that was easy to say and stood for nothing
in particular, so the name of his son's toy seemed to make perfect sense.
Much of the work in building Hadoop was done by Yahoo!, and it's no
coincidence that the bulk of the inspiration for Hadoop has come out of the
search engine industry. Hadoop was a necessary innovation for both Yahoo!
and Google, as Internet-scale data processing became increasingly impractical
on large centralized servers. The only alternative was to scale out and
distribute storage and processing on a cluster of thousands of nodes. Yahoo!
reportedly has over 40,000 nodes spanning its Hadoop clusters, which store
over 40PB of data.
The Hadoop open source team took these concepts from Google and made
them applicable to a much wider set of use cases. Unlike transactional systems,
Hadoop is designed to scan through large data sets and to produce its results