Gutierrez: What were you most proud of for this project?
Tunkelang: I'm most proud of the fact that although there were only three
of us doing most of the work for this project, the changes we made improved
the quality of a huge fraction of web search queries. It may sound cliché,
but it was great to work on something that benefits my mom's day-to-day
experience online.
Gutierrez: How did the three of you come together to work on this
project—was it one of Google's 20% projects?
Tunkelang: This was our day job and not a 20% project. We were assigned to
work as a team, and so we took complete ownership of the project. The previous developers were available for us to consult, but mostly they were happy
to work on new projects while we improved on theirs. I truly have the utmost
respect for developers who understand that their products outlive them.
Gutierrez: How was the model for improving local business search built?
Tunkelang: Fortunately, we had a framework in place to compare the performance of different machine learning approaches. We tried a bunch of them, evaluating their accuracy against a golden set, as well as their efficiency, stability, and interpretability. Ultimately, we opted to use decision trees.
We'd expected that switching from regression to decision trees would trade
off accuracy for interpretability and stability. But, to our pleasant surprise, we
were able to improve all three. And it was a lot easier to work on new model
features once we had a decision tree model in place.
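The comparison framework Tunkelang describes is internal to Google, but the workflow generalizes: train several model families on the same features and score each against a held-out, human-rated golden set. Here is a minimal sketch using scikit-learn and synthetic data; the actual features, labels, and model settings are not public, so everything below is invented for illustration:

```python
# Sketch of comparing a regression model against a decision tree on a
# held-out "golden set" (illustrative only; not the Google framework).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in for query/business features with human-rated labels.
X, y = make_classification(n_samples=5000, n_features=10, random_state=0)
X_train, X_golden, y_train, y_golden = train_test_split(
    X, y, test_size=0.2, random_state=0)  # hold out a "golden set"

models = {
    "regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "golden-set accuracy:", model.score(X_golden, y_golden))

# Interpretability: the tree can be dumped as human-readable if/else rules.
print(export_text(models["decision tree"]))
```

Golden-set accuracy stands in for the team's quality metric here; efficiency and stability would need their own measurements, for example latency benchmarks and the variance of accuracy across retrainings.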
Gutierrez: How are decisions made about replacing models that are already
in production?
Tunkelang: At both Google and LinkedIn, we make decisions based on
metrics. If a change affects only one metric—or if it affects several metrics
but all in the same direction—then the decision process is clear: we ship it.
The more interesting case is where some metrics go up and others go down.
In theory, we use a single utility measure to assess overall impact. In practice,
we negotiate whether the tradeoff is net positive to the business. The decisions usually happen at the level of the teams that own the various metrics,
but in exceptional cases the tradeoffs get escalated to someone who can
arbitrate between competing business goals.
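The interview doesn't describe a concrete utility function, but the idea of collapsing mixed metric movements into a single number can be sketched as a weighted sum of per-metric deltas, where the weights (invented below) encode how the business values each metric:

```python
# Toy sketch of a "single utility measure" over A/B-test metric deltas.
# The metrics, deltas, and weights are all hypothetical.
metric_deltas = {"clicks": +0.8, "latency_ms": +3.0, "revenue": -0.1}
weights = {"clicks": 1.0, "latency_ms": -0.05, "revenue": 5.0}

utility = sum(weights[m] * d for m, d in metric_deltas.items())
print("net utility:", utility, "-> ship" if utility > 0 else "-> don't ship")
```

In practice, as Tunkelang notes, the weights themselves are what gets negotiated, which is why some tradeoffs have to be escalated.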
Gutierrez: What lessons did you learn from this project?
Tunkelang: I've always valued interpretability, but this project showed me how
crucial it could be in the context of machine learning. I also learned a lot about
the challenges of working with unrepresentative training data. While we had
large volumes of training data, we also had systematic biases that could trick our machine learning models into overfitting to those biases. We had to learn to
compensate for those biases and to distrust anything that looked too clever.
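The interview doesn't say which technique the team used to compensate, but one standard approach is importance weighting: down-weight over-represented slices of the training data so the training distribution matches the traffic you actually care about. A hypothetical sketch, with made-up group labels and proportions:

```python
# Reweight training examples so over-represented groups are down-weighted
# and under-represented groups are up-weighted (illustrative technique).
from collections import Counter

def importance_weights(train_groups, target_proportions):
    """Weight each example by target_proportion / train_proportion
    for its group."""
    counts = Counter(train_groups)
    n = len(train_groups)
    return [target_proportions[g] / (counts[g] / n) for g in train_groups]

# Toy example: 90% of logged queries are "head" queries, but we want a
# model that also serves the 50% of real traffic that is "tail".
groups = ["head"] * 90 + ["tail"] * 10
weights = importance_weights(groups, {"head": 0.5, "tail": 0.5})
print(weights[0], weights[-1])  # head examples ~0.56, tail examples 5.0
```

Most scikit-learn estimators accept such weights through the sample_weight argument of fit, so the same training pipeline can be reused with the bias correction applied.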
 