The results were useful in two ways. First, we could provide a link to the home
page on the local business page. Second, we could improve the association
as a signal to web search relevance to better determine when the intent of
the searcher was to find a local business. When web search determines this
intent, it typically shows a map and other information relevant to this class
of search queries. This was a fun machine learning problem, and our accuracy
not only improved the quality of the local search pages but also helped
Google figure out when web searchers were looking for a local business so
that it could respond with maps and other appropriate content.
When I arrived at Google, there was already a system in place to map businesses
to home pages. It was a machine learning system; specifically, it used
logistic regression to assign scores to candidate home pages for businesses.
I can't disclose numbers, but there was lots of room to improve its precision
and coverage. Moreover, the model was unstable and difficult to interpret,
which made it hard to build on with incremental improvements. So
we decided to explore other approaches that would not only improve our
system's accuracy, but also facilitate ongoing work to improve it.
I can't say too much about our results—the numbers are confidential under
my NDA. But what I can say is that we significantly improved accuracy through
a series of changes that included switching from a logistic regression model to
a decision tree approach. That was surprising, since decision trees are hardly
cutting-edge machine learning models. However, they are very interpretable and
that interpretability made it much easier for us to gain insight and iterate.
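The interpretability contrast Tunkelang describes can be sketched with a toy example: a shallow decision tree's learned rules can be printed as plain if/else conditions that a human can read and debug, which is much harder with the weighted sum inside a logistic regression. The features and data below are hypothetical stand-ins (using scikit-learn), not anything from the actual Google system:

```python
# Hypothetical sketch: train a small decision tree on synthetic
# "candidate home page" features and print its learned rules.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-ins for features such as "business name appears
# in the page title" -- these names are illustrative only.
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=1, random_state=0)
feature_names = ["name_in_title", "address_match", "phone_match", "domain_len"]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# export_text renders the tree as nested threshold rules,
# e.g. "|--- name_in_title <= 0.52" -- directly inspectable.
print(export_text(tree, feature_names=feature_names))
```

Capping `max_depth` keeps the printed rule set short enough to read in full, which is exactly the property that makes iteration on features and training data easier.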
Gutierrez: Do you find that non-cutting-edge models sometimes work better
than newer models as they are applied to new domains?
Tunkelang: I'm not saying that non-cutting-edge models work better—indeed,
I'd like to think that progress in machine learning ensures the opposite! Rather,
it pays to keep things simple when you're trying to understand your data and
iteratively develop models for it. In those cases, it's better to optimize for
interpretability rather than accuracy. Once you've learned as much as you can, you
can go back to more complex models. When you go back to them, you'll
hopefully now have the right training data, objective function, and features to
take advantage of the latest and greatest machine learning has to offer.
Gutierrez: How important is it to continue working on models that have
already been built?
Tunkelang: There's no preference for replacing versus improving models.
We put most of our efforts into collecting better training data and coming
up with new features. Those usually require us to train new models. There's
some bias towards reusing our existing infrastructure, because that's usually
less work and helps us avoid introducing new bugs. But we do our best to
evaluate models on their own merits, even if that means doing more work to
take advantage of a new approach.
 