Much of what we do in physics or mathematical statistics organizes our
worldview around what the appropriate model is. Is this the time when I
should treat it as statistical mechanics and, if so, what terms do I put in my
Hamiltonian? Is it the case that this is a quantum mechanical problem? If so,
what terms do I put in my Hamiltonian? Is this a classical mechanics problem?
If so, what terms should I put in my Hamiltonian?
The world's like that. The world doesn't hand you models. It doesn't come to
you with a model and say, “Diagonalize this Hamiltonian.” 10 It comes to you
with observations and a question, usually being asked by the person who
gathered those data. So that's the tradition that I thought was important enough
that we make one of our four pillars of data science at Columbia. We want
students to think about how we explore data before we decide that we're
going to model it using some particular distribution or some particular
graphical model. How do you explore a data set that you've been handed?
Gutierrez: What are the most exciting things in data science for you?
Wiggins: The things that are most exciting to me are not new things. The
most exciting thing to me is realizing that something everybody thinks is new
is actually really damn old. That's why I like Tukey so much. There's a lot of
excitement about this new thing called “data science.” I think it's really fun
to go see really old papers in statistics that are even older than Tukey. For
instance, Sewall Wright was using graphical models for genetics in the 1920s. 11
The things that really capture my excitement are not the newfangled things.
It's particularly around the ideas, not so much things, because, again—people,
ideas, and things in that order. The things change. It's fun when we think we
have a new idea, but usually we then realize the idea is actually very old. When
you have an understanding of that, it's a really frickin good idea.
Stochastic optimization and stochastic gradient descent, for example, have been
a huge, huge hit in the last five years, but they descend from a paper written
by Robbins and Monro in 1951. 12 It is a good idea, but the fact that I think it's a
good idea means somebody really thought through it very carefully with
pencil on paper a long time back. Trying to understand the world through data
and your computer is a very good idea. That's why Tukey was writing about
it in 1962 when he was ordering everybody to reorient statistics as a
professional discipline and a funding line for the NSF organized around computation
and data and data analysis. He wrote an article in 1962 called “The Future of
Data Analysis.” 13 And he wasn't the last, right?
10 http://vserver1.cscs.lsa.umich.edu/~crshalizi/reviews/fragile-objects/
11 Sewall Wright, “Correlation and Causation”: Journal of Agricultural Research, Volume 20, Number 7 (1921), 557-585.
12 Herbert Robbins and Sutton Monro, “A Stochastic Approximation Method”: Ann. Math. Statist., Volume 22, Number 3 (1951), 400-407.
13 John W. Tukey, “The Future of Data Analysis”: Ann. Math. Statist., Volume 33, Number 1 (1962), 1-67.
 