central computer for processing (bringing data to the function), it's the
application that gets moved to all of the locations in which the data is stored
(bringing function to the data). This programming model is known as Hadoop
MapReduce.
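The "function to the data" idea can be sketched in a few lines. The following is a minimal, single-process illustration of the MapReduce model using a word count, the canonical example; the function names here are illustrative, not Hadoop's actual Java API, and a real cluster would run the map and reduce phases in parallel across the nodes holding the data.

```python
from collections import defaultdict

def map_phase(record):
    # Map: emit an intermediate (word, 1) pair for every word in a line.
    # On a cluster, this runs on the node where the input block is stored.
    for word in record.split():
        yield word, 1

def reduce_phase(key, values):
    # Reduce: aggregate all intermediate values emitted for one key.
    return key, sum(values)

def run_mapreduce(records):
    # Shuffle: group intermediate values by key between the two phases.
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            grouped[key].append(value)
    # Each group is reduced independently, so reduces can run in parallel.
    return dict(reduce_phase(k, v) for k, v in grouped.items())

print(run_mapreduce(["the quick brown fox", "the lazy dog"]))
```

Because each map call sees only its own record and each reduce call sees only its own key, neither phase needs the whole data set in one place, which is what lets Hadoop ship the computation to wherever the data blocks live.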
Where Elephants Come From: The History of Hadoop
Now that you've got a high-level sense of what Hadoop is, we can get into
why it's revolutionizing IT. First, let's look at Hadoop's origins.
Hadoop was inspired by Google's work on its distributed Google File System
(GFS) and the MapReduce programming paradigm. IBM has long been
involved with MapReduce, as it teamed up with Google in October 2007 to do
some joint university research on MapReduce and GFS for large-scale Internet
problems. Shortly after Google published papers describing GFS and
MapReduce (in 2003 and 2004, respectively), people in the open source
community (led by Doug Cutting) applied these tools to the open source Nutch
search engine. It quickly became apparent that the distributed file system
and MapReduce modules together had applications beyond just the search
world. It was in early 2006 that these software components became their own
research project, called Hadoop. Hadoop is quite the odd name (and you'll find
a lot of odd names in the Hadoop world). Read any book on Hadoop today and
it pretty much starts with the name that serves as this project's mascot, so let's
start there too. Hadoop is actually the name that creator Doug Cutting's son
gave to his stuffed toy elephant. In thinking up a name for his project, Cutting
was apparently looking for something that was easy to say and stood for nothing
in particular, so the name of his son's toy seemed to make perfect sense.
Much of the work in building Hadoop was done by Yahoo!, and it's no
coincidence that the bulk of the inspiration for Hadoop has come out of the
search engine industry. Hadoop was a necessary innovation for both Yahoo!
and Google, as Internet-scale data processing became increasingly impractical
on large centralized servers. The only alternative was to scale out and
distribute storage and processing on a cluster of thousands of nodes. Yahoo!
reportedly has over 40,000 nodes spanning its Hadoop clusters, which store
over 40PB of data.
The Hadoop open source team took these concepts from Google and made
them applicable to a much wider set of use cases. Unlike transactional systems,
Hadoop is designed to scan through large data sets and to produce its results