Model monitoring and feedback
It is critically important to monitor the performance of our machine learning system in production. Once we deploy our optimal trained model, we wish to understand how it is doing in the "wild". Is it performing as we expect on new, unseen data? Is its accuracy good enough? The reality is that regardless of how much model selection and tuning we do in the earlier phases, the only way to measure true performance is to observe what happens in our production system.
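As a concrete illustration, the following minimal Python sketch (not from the text; the names log_outcome and rolling_accuracy are hypothetical) tracks accuracy over a sliding window of recent predictions once their true outcomes become known:

```python
# A minimal sketch of production accuracy monitoring. We assume predictions
# can later be joined with their observed outcomes; all names are illustrative.
from collections import deque

WINDOW = 1000                       # number of recent predictions to track
recent = deque(maxlen=WINDOW)       # (predicted, actual) pairs

def log_outcome(predicted, actual):
    """Record a prediction once its true outcome has been observed."""
    recent.append((predicted, actual))

def rolling_accuracy():
    """Accuracy over the most recent WINDOW labelled predictions."""
    if not recent:
        return None
    return sum(p == a for p, a in recent) / len(recent)

# In the serving path, call log_outcome(model_prediction, observed_label);
# a periodic job can then alert when rolling_accuracy() drops below a threshold.
```

A dashboard or alerting job built on such a window makes degradation on new, unseen data visible soon after it starts, rather than at the next offline evaluation.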
Also, bear in mind that model accuracy and predictive performance are only one aspect of a real-world system. Usually, we are also concerned with metrics related to business performance (for example, revenue and profitability) or user experience (such as the time spent on our site and how active our users are overall). In most cases, we cannot easily map model-predictive performance to these business metrics. The accuracy of a recommendation or targeting system might be important, but it relates only indirectly to the true metrics we care about, namely whether we are improving user experience, activity, and, ultimately, revenue.
So, in real-world systems, we should monitor both model-accuracy metrics and business metrics. If possible, we should be able to experiment with different models running in production, allowing us to optimize against these business metrics by making changes to the models. This is often done using live split tests. However, doing this correctly is not an easy task, and live testing and experimentation are expensive, in the sense that mistakes, poor performance, and even the baseline models themselves (which provide a control against which we test our production models) can negatively impact user experience and revenue.
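A simple sketch of the mechanics of a live split test follows (an illustrative assumption, not a prescribed design): users are deterministically hashed into buckets so that each user consistently sees the same model variant, and business metrics are accumulated per variant.

```python
# Illustrative split-test plumbing: stable user-to-variant assignment plus
# per-variant business-metric accumulation. Variant names are assumptions.
import hashlib
from collections import defaultdict

VARIANTS = ["control", "candidate"]

def assign_variant(user_id: str) -> str:
    """Deterministically hash a user into a bucket so they always
    see the same model variant for the duration of the test."""
    digest = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return VARIANTS[digest % len(VARIANTS)]

# Accumulate a business metric (here, revenue) per variant as events arrive.
revenue_by_variant = defaultdict(float)

def record_purchase(user_id: str, amount: float):
    revenue_by_variant[assign_variant(user_id)] += amount
```

The stable hash is the important design choice: if a user were reassigned on every request, the variants' effects on user behavior would be mixed together and the comparison of business metrics would be meaningless.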
Another important aspect of this phase is model feedback. This is the process by which the predictions of our model feed through into user behavior, which, in turn, feeds back into our model. In a real-world system, our models are essentially influencing their own future training data by impacting decision-making and potential user behavior.
For example, if we have deployed a recommendation system, then, by making recommendations, we might be influencing user behavior, because we are only presenting users with a limited selection of choices. We hope that this selection is relevant due to our model; however, this feedback loop can, in turn, influence our model's training data, which then feeds back into real-world performance. It is possible to get into an ever-narrowing feedback loop; ultimately, this can negatively affect both model accuracy and our important business metrics.
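The narrowing effect can be seen in a toy simulation (purely illustrative, not from the text): a greedy recommender that only learns from clicks on the items it chose to show will concentrate impressions on a few early favourites, even when every item has the same true click probability.

```python
# Toy simulation of the narrowing feedback loop: the model's own choices
# shape its training data. All parameters here are arbitrary assumptions.
import random

N_ITEMS, TOP_K, ROUNDS = 20, 3, 1000
clicks = [1] * N_ITEMS   # click counts, smoothed to avoid division by zero
shows = [2] * N_ITEMS    # impression counts, smoothed likewise

for _ in range(ROUNDS):
    # Greedily recommend the items with the highest estimated click rate.
    recommended = sorted(range(N_ITEMS),
                         key=lambda i: clicks[i] / shows[i],
                         reverse=True)[:TOP_K]
    # Users can only click on what is shown, regardless of their true tastes.
    for i in recommended:
        shows[i] += 1
        if random.random() < 0.1:  # assume a flat 10% true click probability
            clicks[i] += 1

# Despite identical true click probabilities, impressions concentrate on a
# handful of items whose early clicks happened to be lucky.
print(sorted(shows, reverse=True)[:5], min(shows))
```

Running this shows a large gap between the most-shown and least-shown items, which is exactly the ever-narrowing loop described above; it is one reason production systems often mix some exploration into their recommendations.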