Gutierrez: Do you code as well as lead the team?
Tunkelang: It's been a while since I've written production code, but I still
look at code and system logs when we're trying to diagnose system behavior.
That said, I do miss writing code. When I find the opportunity to code, like
most recently hacking up some old-school games for my daughter, I remember
how fun it can be to have the instant gratification of making things run. I'm
excited that she'll grow up experiencing that same magic with much better
tools than I had at her age.
However, I've found that, while I'm okay at software engineering, I'm actually
much better at leading software engineers. I've had the opportunity to hire
people who are much better developers than I could ever be, and I'm honored
that I can help them accomplish great things.
Gutierrez: What does your work process look like and how do you view
and measure success?
Tunkelang: Our work proceeds in three stages. The first is hypothesis
generation. We come up with hypotheses either reactively by looking at logs or
proactively by exercising our intuitions. The second is offline analysis. We use
historical data, human judgments, or some other proxy to efficiently test our
hypotheses. Our expectation is that most hypotheses won't survive offline
testing—that's what the null hypothesis is for. It's important that offline testing
be quick and cheap, as that way we only invest in online testing for our most
promising hypotheses. The third stage is online testing, where we implement
product changes to test our hypotheses and bucket-test those changes against
live traffic.
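To make the offline stage concrete, here is a minimal sketch, in Python, of the
kind of cheap check described above: testing whether a proposed ranking change
beats the baseline in historical click data. The function name, the counts, and
the 0.05 significance cutoff are illustrative assumptions, not LinkedIn's actual
tooling.

    # Illustrative sketch only: a two-proportion z-test on historical click
    # data, standing in for the kind of offline check described above.
    from math import sqrt
    from statistics import NormalDist

    def survives_offline_test(clicks_a, views_a, clicks_b, views_b, alpha=0.05):
        """Return True if variant B's click-through rate beats A's at level alpha.

        The null hypothesis is that the two rates are equal; most candidate
        hypotheses are expected to fail this cheap check and never reach an
        online bucket test.
        """
        p_a, p_b = clicks_a / views_a, clicks_b / views_b
        p_pool = (clicks_a + clicks_b) / (views_a + views_b)
        se = sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
        z = (p_b - p_a) / se
        p_value = 1 - NormalDist().cdf(z)  # one-sided: is B better than A?
        return p_value < alpha

    # Hypothetical historical counts: baseline ranking vs. a proposed tweak.
    print(survives_offline_test(clicks_a=4_800, views_a=100_000,
                                clicks_b=5_150, views_b=100_000))

Only a hypothesis that clears a check like this would graduate to the third
stage and earn live traffic.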
All three of these stages happen in parallel. At any given time, we're engaged
in a mix of hypothesis generation, offline testing, and online testing. Think of it
as managing a portfolio strategy for data-driven innovation.
Gutierrez: How do you manage the portfolio?
Tunkelang: For portfolio management, we try to be scientific about it, but fall
back on intuition when necessary. For example, we'll spend an hour deciding
whether something is worth spending a couple of days investigating. Or a
person will spend a week on offline analysis deciding whether something is worth
a couple of months of engineering effort before we can test it online. The basic
principle is fast failure and an exponential increase in effort as we mitigate risk.
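As a rough sketch of that principle, with made-up stage names and effort
figures, each gate a hypothesis survives earns it roughly an order of magnitude
more investment:

    # Illustrative sketch of stage-gated investment: each stage costs roughly
    # an order of magnitude more than the last, and a hypothesis only earns
    # the next stage by surviving the current one. All figures are invented.
    STAGES = [
        ("triage discussion", 1),     # ~an hour to decide if it's worth days
        ("offline analysis", 40),     # ~a week against historical data
        ("online bucket test", 400),  # ~months of engineering plus live traffic
    ]

    def invest(hypothesis, survives_stage):
        """Walk a hypothesis through the stages, stopping at the first failure."""
        spent = 0
        for name, hours in STAGES:
            spent += hours
            if not survives_stage(hypothesis, name):
                return f"{hypothesis}: killed at {name} after ~{spent} hours"
        return f"{hypothesis}: shipped after ~{spent} hours"

    # Example: a hypothesis that looks promising in triage but dies offline.
    print(invest("boost recent activity in ranking",
                 lambda h, stage: stage == "triage discussion"))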
It's hard to be completely data-driven about the process, since different
hypotheses apply to different problems. But we adapt. If most of our efforts
are failures, then we're not being sensitive to risk. If all of them are successes,
then we're probably being too risk-averse and leaving big opportunities on the
table. There's no easy way to count hypotheses, since we explore them at many
different levels of granularity, from whether we should change a relevance-tuning
parameter to whether users will be interested in a new product feature.
 