Gutierrez: How are you learning about and keeping up-to-date with the data
science industry?
Foreman: Though I love reading blogs and enjoy engaging online to keep
up to date, for stuff like learning, there is no replacement for books. In the
data science world, there are lots of great books coming out. There are just
so many great ones that come with real-world examples and accompanying
data sets. Obviously, there's the Hastie, Tibshirani, Friedman book, Elements of
Statistical Learning. 1 That's kind of the data science bible. Just recently, Kuhn and
Johnson's new book on predictive modeling, Applied Predictive Modeling, came
out and I've been reading it. 2 It's excellent. So books are one place I go for all
my learning.
The interesting thing about a lot of data science books is that they are written
with a specific tool already picked, whether it's R or Python or something else.
If you're going to learn how to really do this stuff, you've got to do it with the
available tools, so the books make tool decisions. Learning to code examples
out of these books can help you get your feet wet. But it can also become a
distraction when you're trying to truly understand a technique.
This is, in part, why I wrote my book, Data Smart. 3 I felt like a lot of the examples
in these books were simply, “Let's load the support vector machine package,
train our model, and then look at the results.” It was like wait, wait, wait, wait,
and wait. You need to explain in detail how that support vector machine just
got built. You can't just build it. That's cheating, which is totally what you do in
a real job—you trust that the packages work—but for learning purposes, you
don't. So I wrote my book to break down all these things in detail and not use
really complex formulas, maybe the way Hastie's book would.
Gutierrez: Where else do you engage with the community?
Foreman: The great thing about data science right now is that there's a very
active, engaged community both in the physical world—at conferences like
O'Reilly's Strata Conference—and online at websites like Cross Validated and
Twitter. In person and online are both great places to have conversations with
other practitioners. Amazingly enough, Twitter is probably the best place to
start conversations about data science, although I find myself often turning to
email to finish them. You can find the experts who know this stuff and then
further that conversation in a longer form.
1 Springer, 2nd ed., 2009.
2 Springer, 2013.
3 Wiley, 2013.
 