Gutierrez: Whose work is currently inspiring you?
Tunkelang: Perhaps not what you had in mind, but I'm inspired by Bill Gates.
Specifically, I'm very inspired by his data-driven approach to philanthropy.
Of course, it's humbling to see one of the world's richest people donating almost
his entire net worth to make the world a better place. What is truly inspiring is the
way he's doing it. He's focusing on measurable improvements and optimizing his
philanthropy according to where it does the most measurable good. That makes
him more than just a great human being—it makes him a great data scientist.
Gutierrez: What advice would you give to someone starting out?
Tunkelang: It depends where they are coming from. To someone coming
from math or the physical sciences, I'd suggest investing in learning software
skills—especially Hadoop and R, which are the most widely used tools.
Someone coming from software engineering should take a class in machine
learning and work on a project with real data, lots of which is available for
free. As many people have said, the best way to become a data scientist is to
do data science. The data is out there and the science isn't that hard to learn,
especially for someone trained in math, science, or engineering.
Gutierrez: What is something someone starting out should strive to understand
deeply?
Tunkelang: Read “The Unreasonable Effectiveness of Data”—a classic essay
by Google researchers Alon Halevy, Peter Norvig, and Fernando Pereira.³ The
essay is usually summarized as “more data beats better algorithms.” It is worth
reading the whole essay, as it gives a survey of recent successes in using web-
scale data to improve speech recognition and machine translation. Then for
good measure, listen to what Monica Rogati has to say about how better data
beats more data.⁴ Understand and internalize these two insights, and you're
well on your way to becoming a data scientist.
Gutierrez: In your opinion, what are the necessary critical thinking and analytic
skills that educational institutions should be teaching?
Tunkelang: No one should graduate from high school without a solid grounding
in the scientific method: basic concepts of hypothesis testing and falsifiability.
The same should be said for basic knowledge of probability and statistics.
In a world where we're bombarded with data and analyses of data, we should
be informed consumers. And, of course, everyone should learn the basics of
computation—at least enough to demystify the computers that surround us.
3 Alon Halevy, Peter Norvig, and Fernando Pereira, “The Unreasonable Effectiveness of
Data” (IEEE Intelligent Systems, March/April 2009, www.computer.org:
www.cs.columbia.edu/igert/courses/E6898/Norvig.pdf).
4 Monica Rogati, “The Model and the Train Wreck: A Training Data How-To” (O'Reilly
Strata 2012: www.youtube.com/watch?v=F7iopLnhDik).
 