Chris Wiggins - Data Scientists at Work

Here at the data science team at The New York Times, I'm building a group,

and I assure you that I spend as much time thinking hard about the place and

people as I do on things and ideas. Similarly, hackNY is all about mentoring. The

whole point of hackNY is to create a network of very talented young people

who believe in themselves and believe in each other and bring out the best in

themselves and bring out the best in each other. And certainly at Columbia, the

reason I'm still in academia is that I really value the teaching and mentoring and

the quest to better yourself and better your community that you get from an

in-person brick-and-mortar university as opposed to a MOOC.

Gutierrez: What does a typical day at work look like for you?

Wiggins: There are very few typical days right now, though I look forward to

having one in the future. I try to make my days at The New York Times typical

because this is a company. What I mean by that is that it is a place of interdependent

people, and so people rely on you. So I try throughout the day to make

sure I meet with everyone in my group in the morning, meet with everyone in

my group in the afternoon, and meet with stakeholders who have either data

issues or who I think have data issues but don't know it yet. Really, at this point, I

would say that at none of my three jobs is there such a thing as a “typical day.”

Gutierrez: Where do you get ideas for things to study or analyze?

Wiggins: Over the past 20 years, I would say the main driver of my ideas has

been seeing people doing it “wrong.” That is, I see people I respect working on

problems that I think are important, and I think they're not answering those

questions the right way. This is particularly true in my early career in machine

learning applied to biology, where I was looking at papers written by statistical

physicists who I respected greatly, but I didn't think that they were using, or

let's say stealing, the appropriate tools for answering the questions they had.

And to me, in the same way that Einstein stole Riemannian geometry from

Riemann and showed that it was the right tool for differential geometry, there

are many problems of interest to theoretical physicists where the right tools

are coming from applied computational statistics, and so they should use those

tools. So a lot of my ideas come from paying attention to communities that

I value, and not being able to brush it off when I see people whom I respect

who I think are not answering a question the right way.

Gutierrez: What specific tools or techniques do you use?

Wiggins: My group here at The New York Times uses only open source statistical

software, so everything is either in R or Python, leaning heavily on

scikit-learn and occasionally IPython notebooks. We rely heavily on Git as

version control. I mostly tend to favor methods of supervised learning rather

than unsupervised learning, because usually when I do an act of clustering,

which is generically what one does as unsupervised learning, I never know if

I've done it the best way. I always worry that there is some other clustering that I

could do, and I won't even know which of the two clusterings is the better.
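
The worry about clustering can be made concrete with a small sketch. The example below is an illustration only, written in plain Python rather than scikit-learn so that it has no dependencies. The data, the helper names `kmeans_1d` and `inertia`, and the starting centers are all assumptions made for this sketch and do not come from the interview. It runs a basic k-means (Lloyd's algorithm) twice on the same unlabeled points from two different starting positions. The two runs settle on different groupings with the same within-cluster sum of squares, so the objective alone gives no reason to prefer one over the other.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Basic Lloyd's algorithm on 1-D data; returns (labels, centers)."""
    centers = [float(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return labels, centers


def inertia(points, labels, centers):
    """Within-cluster sum of squared distances (the k-means objective)."""
    return sum((p - centers[lab]) ** 2 for p, lab in zip(points, labels))


# Hypothetical unlabeled data: three tight groups, but we ask for two clusters.
points = [0, 1, 2, 10, 11, 12, 20, 21, 22]

# Same data, same algorithm, two different starting guesses.
labels_a, centers_a = kmeans_1d(points, [0, 10])
labels_b, centers_b = kmeans_1d(points, [12, 22])

print("run A:", labels_a, "inertia:", inertia(points, labels_a, centers_a))
print("run B:", labels_b, "inertia:", inertia(points, labels_b, centers_b))
```

Run A merges the middle group with the right-hand group, while run B merges it with the left-hand group. With a supervised problem, held-out labels would give a score for choosing between two candidate models; here there is no such reference.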
