In January 2008, Hadoop was made its own top-level project at Apache, confirming its
success and its diverse, active community. By this time, Hadoop was being used by many
other companies besides Yahoo!, such as Last.fm, Facebook, and the New York Times.
In one well-publicized feat, the New York Times used Amazon's EC2 compute cloud to
crunch through 4 terabytes of scanned archives from the paper, converting them to PDFs
for the Web. [15] The processing took less than 24 hours to run using 100 machines, and the
project probably wouldn't have been embarked upon without the combination of
Amazon's pay-by-the-hour model (which allowed the NYT to access a large number of
machines for a short period) and Hadoop's easy-to-use parallel programming model.
In April 2008, Hadoop broke a world record to become the fastest system to sort an entire
terabyte of data. Running on a 910-node cluster, Hadoop sorted 1 terabyte in 209 seconds
(just under 3.5 minutes), beating the previous year's winning time of 297 seconds. [16] In
November of the same year, Google reported that its MapReduce implementation sorted
1 terabyte in 68 seconds. [17] Then, in April 2009, it was announced that a team at Yahoo!
had used Hadoop to sort 1 terabyte in 62 seconds. [18]
The trend since then has been to sort even larger volumes of data at ever faster rates. In
the 2014 competition, a team from Databricks were joint winners of the Gray Sort
benchmark. They used a 207-node Spark cluster to sort 100 terabytes of data in 1,406
seconds, a rate of 4.27 terabytes per minute. [19]
Today, Hadoop is widely used in mainstream enterprises. Hadoop's role as a
general-purpose storage and analysis platform for big data has been recognized by the
industry, and this fact is reflected in the number of products that use or incorporate
Hadoop in some way. Commercial Hadoop support is available from large, established
enterprise vendors, including EMC, IBM, Microsoft, and Oracle, as well as from specialist
Hadoop companies such as Cloudera, Hortonworks, and MapR.