Meet Hadoop - Hadoop: The Definitive Guide

Figure 1-1. Structure of the book: there are various pathways through the content

[ 3 ] These statistics were reported in a study entitled “The Digital Universe of Opportunities: Rich Data and

[ 4 ] All figures are from 2013 or 2014. For more information, see Tom Groenfeldt, “At NYSE, The Data De- […] orage”; Ancestry.com's “Company Facts”; Archive.org's “Petabox”; and the Worldwide LHC Computing Grid project's welcome page.

[ 5 ] The quote is from Anand Rajaraman's blog post “More data usually beats better algorithms,” in which he writes about the Netflix Challenge. Alon Halevy, Peter Norvig, and Fernando Pereira make the same point in “The Unreasonable Effectiveness of Data,” IEEE Intelligent Systems, March/April 2009.

[ 6 ] These specifications are for the Seagate ST-41600n.

[ 7 ] In January 2007, David J. DeWitt and Michael Stonebraker caused a stir by publishing “MapReduce: A major step backwards,” in which they criticized MapReduce for being a poor substitute for relational databases. Many commentators argued that it was a false comparison (see, for example, Mark C. Chu-Carroll's “Databases are hammers; MapReduce is a screwdriver”), and DeWitt and Stonebraker followed up with “MapReduce II,” where they addressed the main topics brought up by others.

[ 8 ] Jim Gray was an early advocate of putting the computation near the data. See “Distributed Computing Economics,” March 2003.

[ 9 ] In January 2008, SETI@home was reported to be processing 300 gigabytes a day, using 320,000 computers (most of which are not dedicated to SETI@home; they are used for other things, too).

[ 10 ] In this book, we use the lowercase form, “namenode,” to denote the entity when it's being referred to generally, and the CamelCase form NameNode to denote the Java class that implements it.

[ 11 ] See Mike Cafarella and Doug Cutting, “Building Nutch: Open Source Search,” ACM Queue, April 2004.

[ 12 ] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, “The Google File System,” October 2003.

[ 13 ] Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” December 2004.

[ 14 ] “Yahoo! Launches World's Largest Hadoop Production Application,” February 19, 2008.

[ 15 ] Derek Gottfrid, “Self-Service, Prorated Super Computing Fun!” November 1, 2007.

[ 16 ] Owen O'Malley, “TeraByte Sort on Apache Hadoop,” May 2008.

[ 17 ] Grzegorz Czajkowski, “Sorting 1PB with MapReduce,” November 21, 2008.
