Daniel Tunkelang - Data Scientists at Work

of open source tools to support our logging needs. I highly recommend a

piece that Jay Kreps of LinkedIn wrote on the subject, entitled “The Log: What

Every Software Engineer Should Know About Real-Time Data's Unifying

Abstraction.” He published it as a blog post, but it's more like the definitive

book on the subject.

Gutierrez: How does your work evolve through a project's life cycle?

Tunkelang: Early on, our goal is to fail fast. Most crazy ideas are just that:

crazy. So, in the earliest stages, it's important to have efficient ways to reject

bad ideas based on data—for example, to put an upper bound on the impact

of a change by analyzing our logs. But as a hypothesis shows promise through

offline testing, we double down on it. Our focus shifts from trying to kill it

to trying to make it succeed. We optimize parameter settings and then look for edge

cases and related techniques to improve and understand a model. Because

this shift in focus is dramatic, it's important that we only make it for ideas that

survive a harsh validation filter.
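
As an illustration of putting an upper bound on the impact of a change from logs, here is a minimal sketch. It is not taken from the interview: the `LogRecord` fields and the `max_possible_lift` helper are hypothetical, and it assumes that a change can at best turn every affected, currently failing session into a success while leaving all other sessions untouched.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    # Hypothetical fields, invented for this illustration.
    affected: bool   # would the proposed change alter this session at all?
    succeeded: bool  # did the session already succeed without the change?


def max_possible_lift(records):
    """Upper bound on the absolute gain in overall success rate.

    Assumes the best case: every affected session that failed becomes
    a success, and no other session changes.
    """
    if not records:
        return 0.0
    fixable = sum(1 for r in records if r.affected and not r.succeeded)
    return fixable / len(records)


logs = [
    LogRecord(affected=True, succeeded=False),
    LogRecord(affected=True, succeeded=True),
    LogRecord(affected=False, succeeded=False),
    LogRecord(affected=False, succeeded=True),
]

bound = max_possible_lift(logs)
print(f"Success rate can rise by at most {bound:.0%}")
```

If the bound is already too small to matter, the idea can be rejected without building anything, which is the fail-fast step described above.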

Gutierrez: How do you differentiate between crazy and novel ideas?

Tunkelang: It's tough. If someone believes in an idea, we always give that

person the opportunity to try to back it up with data. An important question in

this process is how many attempts we allow them to show the model is worth

studying before we kill the idea. At some point, we rely on our judgment to

decide that we've exhausted the space of possibilities. Or we just lose patience.

And sometimes we revive ideas from the morgue when we have new insights.

Gutierrez: How do you keep track of all the ideas in the morgue?

Tunkelang: Frankly, we rely on associative memory. Some of us have ideas that

we never really give up on, so it doesn't take much to trigger them again. And

if new information comes in that offers the ingredients for a compelling case,

it's easy for the original advocate of the idea to justify giving that idea another

chance. We may be data-driven, but our ideas come from a place of passion.

Gutierrez: Where do you get ideas for things to study and analyze?

Tunkelang: To a large extent, I draw on my own intuition and experience.

I encourage my colleagues to do the same. Though we take a rigorous data-driven

approach to experiments, we often rely on our own creativity to figure out which

hypotheses to explore. Of course, sometimes our users make it easier for us by

giving us feedback or by displaying anomalous behavior in our logs.

Gutierrez: How did you go about developing your own intuition?

Tunkelang: My intuition mostly comes from exposure to lots of different

problems. Over time you learn to recognize patterns. Intuition is really a

well-trained association network.
