Gutierrez: Do you code as well as lead the team?
Tunkelang: It's been a while since I've written production code, but I still
look at code and system logs when we're trying to diagnose system behavior.
That said, I do miss writing code. When I find the opportunity to code, like
most recently hacking up some old-school games for my daughter, I remember
how fun it can be to have the instant gratification of making things run. I'm
excited that she'll grow up experiencing that same magic with much better
tools than I had at her age.
However, I've found that, while I'm okay at software engineering, I'm actually
much better at leading software engineers. I've had the opportunity to hire
people who are much better developers than I could ever be, and I'm honored
that I can help them accomplish great things.
Gutierrez: What does your work process look like and how do you view
and measure success?
Tunkelang: Our work proceeds in three stages. The first is hypothesis
generation. We come up with hypotheses either reactively by looking at logs or
proactively by exercising our intuitions. The second is offline analysis. We use
historical data, human judgments, or some other proxy to efficiently test our
hypotheses. Our expectation is that most hypotheses won't survive offline
testing—that's what the null hypothesis is for. It's important that offline testing
be quick and cheap, as that way we only invest in online testing for our most
promising hypotheses. The third stage is online testing, where we implement
product changes to test our hypotheses and bucket-test those changes against
live traffic.
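To make the offline stage concrete, here is a minimal sketch, in Python, of the
kind of cheap check described above: testing whether a proposed ranking change
beats the baseline in historical click data. The function name, the counts, and
the 0.05 significance cutoff are illustrative assumptions, not LinkedIn's actual
tooling.

    # Illustrative sketch only: a two-proportion z-test on historical click
    # data, standing in for the kind of offline check described above.
    from math import sqrt
    from statistics import NormalDist

    def survives_offline_test(clicks_a, views_a, clicks_b, views_b, alpha=0.05):
        """Return True if variant B's click-through rate beats A's at level alpha.

        The null hypothesis is that the two rates are equal; most candidate
        hypotheses are expected to fail this cheap check and never reach an
        online bucket test.
        """
        p_a, p_b = clicks_a / views_a, clicks_b / views_b
        p_pool = (clicks_a + clicks_b) / (views_a + views_b)
        se = sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
        z = (p_b - p_a) / se
        p_value = 1 - NormalDist().cdf(z)  # one-sided: is B better than A?
        return p_value < alpha

    # Hypothetical historical counts: baseline ranking vs. a proposed tweak.
    print(survives_offline_test(clicks_a=4_800, views_a=100_000,
                                clicks_b=5_150, views_b=100_000))

Only a hypothesis that clears a check like this would graduate to the third
stage and earn live traffic.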
All three of these stages happen in parallel. At any given time, we're engaged
in a mix of hypothesis generation, offline testing, and online testing. Think of it
as managing a portfolio strategy for data-driven innovation.
Gutierrez: How do you manage the portfolio?
Tunkelang: For portfolio management, we try to be scientific about it, but fall
back on intuition when necessary. For example, we'll spend an hour deciding
whether something is worth spending a couple of days investigating. Or a
person will spend a week on offline analysis deciding whether something is worth
a couple of months of engineering effort before we can test it online. The basic
principle is fast failure and an exponential increase in effort as we mitigate risk.
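As a rough sketch of that principle, with made-up stage names and effort
figures, each gate a hypothesis survives earns it roughly an order of magnitude
more investment:

    # Illustrative sketch of stage-gated investment: each stage costs roughly
    # an order of magnitude more than the last, and a hypothesis only earns
    # the next stage by surviving the current one. All figures are invented.
    STAGES = [
        ("triage discussion", 1),     # ~an hour to decide if it's worth days
        ("offline analysis", 40),     # ~a week against historical data
        ("online bucket test", 400),  # ~months of engineering plus live traffic
    ]

    def invest(hypothesis, survives_stage):
        """Walk a hypothesis through the stages, stopping at the first failure."""
        spent = 0
        for name, hours in STAGES:
            spent += hours
            if not survives_stage(hypothesis, name):
                return f"{hypothesis}: killed at {name} after ~{spent} hours"
        return f"{hypothesis}: shipped after ~{spent} hours"

    # Example: a hypothesis that looks promising in triage but dies offline.
    print(invest("boost recent activity in ranking",
                 lambda h, stage: stage == "triage discussion"))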
It's hard to be completely data-driven about the process, since different
hypotheses apply to different problems. But we adapt. If most of our efforts
are failures, then we're not being sensitive to risk. If all of them are successes,
then we're probably being too risk-averse and leaving big opportunities on the
table. There's no easy way to count hypotheses, since we explore them at many
different levels of granularity, from whether we should change a relevance-tuning
parameter to whether users will be interested in a new product feature.
 