Again, I come from a very old field. Physics is a field where the undergraduate
curriculum was basically canonized by 1926. Years ago I picked up a book at
the Book Scientific bookstore called Compendium of Theoretical Physics.7 It had
four chapters: classical mechanics, statistical mechanics, quantum mechanics,
and E&M (electricity and magnetism). Those are the four pillars on which
all of physics stands. Physics has a pretty rich intellectual tradition, with
some strong, clear wins behind it, but it's really built on those four pillars. You
can see that it has a strong canon. Most fields don't enjoy that. I think you
really need a mature field to be able to say, “Here are
the four classes that you really need to take as an undergraduate.”
Gutierrez: What does the academic canon at the Institute for Data Sciences
and Engineering at Columbia cover?
Wiggins: I'm on the education committee for the Data Science Institute at
Columbia, so we've created a canon of four classes: Probability and Statistics,
Algorithms for Data Science, Machine Learning for Data Science, and EDAV,
which is short for Exploratory Data Analysis and Visualization. The three
letters, EDA, are taken directly from John Tukey.
Tukey had a book in the 1970s called Exploratory Data Analysis, which was
basically a description of what Tukey did without a computer, probably on the
train between Princeton and Bell Labs, whenever somebody gave him a new
data set.8 The book is basically a description of all the ways he would plot out
the data: histograms, Tukey boxplots, Tukey stem-and-leaf plots, all these
things that he would do with data. If you read the book now, it looks like,
“man, this guy was kooky. He should have just opened up R. He should have
just opened up matplotlib.”
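To make Tukey's paper-and-pencil displays concrete, here is a minimal Python sketch (an illustration added for this reference, not something from the interview) of two of them: a stem-and-leaf display, and the hinges and 1.5-spread fences that a Tukey boxplot uses to flag outliers. It assumes a non-empty list of non-negative integers with the tens digit as the stem; the function names and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import median


def stem_and_leaf(values):
    """Stem-and-leaf display: stem = tens digit(s), leaf = units digit.

    Assumes a non-empty list of non-negative integers.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems so gaps in the data stay visible, as on paper.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}".rstrip())
    return lines


def tukey_hinges(values):
    """Return (lower hinge, median, upper hinge) using Tukey's hinges.

    Each hinge is the median of one half of the sorted data; for odd n
    the median itself is counted in both halves.
    """
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2
    return median(xs[:half]), median(xs), median(xs[n - half:])


def outliers(values, k=1.5):
    """Values beyond the inner fences: hinge -/+ k * (upper - lower hinge)."""
    lo, _, hi = tukey_hinges(values)
    spread = hi - lo
    low_fence, high_fence = lo - k * spread, hi + k * spread
    return [v for v in sorted(values) if v < low_fence or v > high_fence]


if __name__ == "__main__":
    sample = [12, 15, 21, 23, 23, 27, 34, 38, 41, 95]  # made-up data
    print("\n".join(stem_and_leaf(sample)))
    print("hinges/median:", tukey_hinges(sample))
    print("outliers:", outliers(sample))
```

On the made-up sample, the empty stems 5 through 8 make the gap before 95 obvious at a glance, and the fences flag 95 as the lone outlier. Tukey's hinges may differ slightly from the quartiles that some tools compute by default, so boxplots of small samples drawn by different software may not agree exactly.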
Around the same time, he was co-teaching a class at Princeton with Edward
Tufte. If you pick up the book The Visual Display of Quantitative Information, look at
whom it's dedicated to.9 It's dedicated to Tukey. Again, there's a very old
academic tradition on which many data science ideas rest. People in academia
have been thinking for a long time about what the visual display of quantitative
information is. How do we meaningfully “do” data visualization? What do
we do when someone hands us data and we just have no distribution? The
world doesn't hand you distributions. It hands you observations.
7 www.wachter-hoeber.com/Books.html?bid=002
8 John W. Tukey, Exploratory Data Analysis (Pearson, 1977).
9 Edward R. Tufte, The Visual Display of Quantitative Information, 2nd ed.
(Graphics Press, 1983).