21, 2010; idem.; Ben Zimmer, “The Future Tense,” New York Times, February 25, 2011. Anna
North, “New Google Graphs Reveal Centuries of Dicks, Pimps and Hos,” December 17, 2010,
http://jezebel.com/5714665/word-graphs-reveal-centuries-of-dicks-pimps-and-hos (accessed
May 10, 2012); Alexis Madrigal, “Vampire vs. Zombie: Comparing Word Usage through Time,”
December 17, 2010,
http://www.theatlantic.com/technology/archive/2010/12/vampire-vs-zombie-comparing-word-usage-through-time/68203
(accessed May 10, 2012); Dan Klein, “A Short History of Words: New Google Tool Reveals
Relative Popularity of 'Shmuck,' 'Zionist,' Other Terms,” December 17, 2010,
http://www.tabletmag.com/scroll/53877/a-short-history-of-words/?utm_source=Tablet+Magazine+List&utm_campaign=fcdbed4176-12_20_2010&utm_medium=email
(accessed May 10, 2012).
18. Michel et al., “Quantitative Analysis,” 181-182.
19. Ibid., 178-179.
20. Geoffrey Nunberg, “Counting on Google Books,” Chronicle of Higher Education, December
16, 2010, http://chronicle.com/article/Counting-on-Google-Books/125735 (accessed May
10, 2012); Anthony Grafton, “From the President,” AHA Perspective, March 2010,
http://www.historians.org/Perspectives/issues/2011/1103/1103pre1.cfm (accessed May 10, 2012).
21. Beyond our general critique of the notion of “raw data,” the specifics in this case show
that the data is not raw at all. In order to correct for anomalies in the larger Google Books
corpus, the Ngram Viewer, in fact, operates only on a subset of the larger data, roughly five
million of the fifteen million books digitized by Google by the end of 2010. This, of course, is
not a small number. Though it is only one third of the works in Google Books in 2010, Michel
and Aiden estimate that it represents about 4 percent of all books ever published. Michel et al.,
“Quantitative Analysis,” 176.
22. Advertisement for Eighteenth Century Collections Online, published in Choice 40, no. 9
(May 2003): 1525; Library and Information Update 2, no. 9 (September 2003): 10.
23. Happily, users have discovered the same thing and have begun successfully pressing Gale to
allow them to have limited access to scanned text and even to correct it. In April 2011, Gale
authorized the Text Creation Partnership at the University of Michigan to manually key and
release 2,231 texts from ECCO: http://www.lib.umich.edu/tcp/ecco/description.html
(accessed May 10, 2012). Related efforts at rescanning and crowdsourced keying and correction
are being organized by 18th Connect: see http://www.18thconnect.org.
24. To control for the variation in the size of the ECCO corpus from decade to decade in the
eighteenth century, I divided the number of hits for “data” by the number of hits for a common
control word. Experiments with several different control words suggested that using “the” as
control produced a stable result.
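The normalization described in note 24 can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the author's actual procedure or tooling: the function name `relative_frequency` is hypothetical, and the hit counts are invented placeholders, not figures from ECCO.

```python
from typing import Dict


def relative_frequency(
    target_hits: Dict[int, int], control_hits: Dict[int, int]
) -> Dict[int, float]:
    """Divide per-decade hits for a target word by hits for a control word.

    Decades with no control hits are skipped, since the ratio is undefined.
    """
    ratios: Dict[int, float] = {}
    for decade, hits in target_hits.items():
        control = control_hits.get(decade, 0)
        if control == 0:
            continue  # no control count for this decade, so no ratio
        ratios[decade] = hits / control
    return ratios


# Placeholder counts, invented purely for illustration (NOT ECCO data).
data_hits = {1700: 10, 1710: 30}
the_hits = {1700: 1000, 1710: 2000}

ratios = relative_frequency(data_hits, the_hits)
for decade in sorted(ratios):
    print(decade, ratios[decade])
```

Dividing by a very common word such as “the” treats that word's count as a rough proxy for the amount of text searched in each decade, so the ratios can be compared across decades even though the corpus grows or shrinks.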
25. The challenge of interpreting numbers such as these is highlighted by the very different
results produced by a simple word search in Google compared to a parallel search in Google