Model monitoring and feedback
It is critically important to monitor the performance of our machine learning system in production. Once we deploy our optimal trained model, we wish to understand how it is doing in the "wild". Is it performing as we expect on new, unseen data? Is its accuracy good enough? The reality is that regardless of how much model selection and tuning we do in the earlier phases, the only way to measure true performance is to observe what happens in our production system.
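As a concrete illustration, the following minimal Python sketch (not from the text; the names log_outcome and rolling_accuracy are hypothetical) tracks accuracy over a sliding window of recent predictions once their true outcomes become known:

```python
# A minimal sketch of production accuracy monitoring. We assume predictions
# can later be joined with their observed outcomes; all names are illustrative.
from collections import deque

WINDOW = 1000                       # number of recent predictions to track
recent = deque(maxlen=WINDOW)       # (predicted, actual) pairs

def log_outcome(predicted, actual):
    """Record a prediction once its true outcome has been observed."""
    recent.append((predicted, actual))

def rolling_accuracy():
    """Accuracy over the most recent WINDOW labelled predictions."""
    if not recent:
        return None
    return sum(p == a for p, a in recent) / len(recent)

# In the serving path, call log_outcome(model_prediction, observed_label);
# a periodic job can then alert when rolling_accuracy() drops below a threshold.
```

A dashboard or alerting job built on such a window makes degradation on new, unseen data visible soon after it starts, rather than at the next offline evaluation.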
Also, bear in mind that model accuracy and predictive performance are only one aspect of a real-world system. Usually, we are also concerned with metrics related to business performance (for example, revenue and profitability) or user experience (such as the time spent on our site and how active our users are overall). In most cases, we cannot easily map model-predictive performance to these business metrics. The accuracy of a recommendation or targeting system might be important, but it relates only indirectly to the true metrics we care about, namely whether we are improving user experience, activity, and, ultimately, revenue.
So, in real-world systems, we should monitor both model-accuracy metrics and business metrics. If possible, we should be able to experiment with different models running in production, allowing us to optimize against these business metrics by making changes to the models. This is often done using live split tests. However, doing this correctly is not an easy task, and live testing and experimentation are expensive, in the sense that mistakes, poor performance, and even the baseline models themselves (which provide a control against which we test our production models) can negatively impact user experience and revenue.
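A simple sketch of the mechanics of a live split test follows (an illustrative assumption, not a prescribed design): users are deterministically hashed into buckets so that each user consistently sees the same model variant, and business metrics are accumulated per variant.

```python
# Illustrative split-test plumbing: stable user-to-variant assignment plus
# per-variant business-metric accumulation. Variant names are assumptions.
import hashlib
from collections import defaultdict

VARIANTS = ["control", "candidate"]

def assign_variant(user_id: str) -> str:
    """Deterministically hash a user into a bucket so they always
    see the same model variant for the duration of the test."""
    digest = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return VARIANTS[digest % len(VARIANTS)]

# Accumulate a business metric (here, revenue) per variant as events arrive.
revenue_by_variant = defaultdict(float)

def record_purchase(user_id: str, amount: float):
    revenue_by_variant[assign_variant(user_id)] += amount
```

The stable hash is the important design choice: if a user were reassigned on every request, the variants' effects on user behavior would be mixed together and the comparison of business metrics would be meaningless.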
Another important aspect of this phase is model feedback. This is the process by which the predictions of our model feed through into user behavior, which, in turn, feeds back into our model. In a real-world system, our models are essentially influencing their own future training data by impacting decision-making and potential user behavior.
For example, if we have deployed a recommendation system, then, by making recommendations, we might be influencing user behavior, because we are only presenting users with a limited selection of choices. We hope that this selection is relevant due to our model; however, this feedback loop can, in turn, influence our model's training data, which then feeds back into real-world performance. It is possible to get into an ever-narrowing feedback loop; ultimately, this can negatively affect both model accuracy and our important business metrics.
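The narrowing effect can be seen in a toy simulation (purely illustrative, not from the text): a greedy recommender that only learns from clicks on the items it chose to show will concentrate impressions on a few early favourites, even when every item has the same true click probability.

```python
# Toy simulation of the narrowing feedback loop: the model's own choices
# shape its training data. All parameters here are arbitrary assumptions.
import random

N_ITEMS, TOP_K, ROUNDS = 20, 3, 1000
clicks = [1] * N_ITEMS   # click counts, smoothed to avoid division by zero
shows = [2] * N_ITEMS    # impression counts, smoothed likewise

for _ in range(ROUNDS):
    # Greedily recommend the items with the highest estimated click rate.
    recommended = sorted(range(N_ITEMS),
                         key=lambda i: clicks[i] / shows[i],
                         reverse=True)[:TOP_K]
    # Users can only click on what is shown, regardless of their true tastes.
    for i in recommended:
        shows[i] += 1
        if random.random() < 0.1:  # assume a flat 10% true click probability
            clicks[i] += 1

# Despite identical true click probabilities, impressions concentrate on a
# handful of items whose early clicks happened to be lucky.
print(sorted(shows, reverse=True)[:5], min(shows))
```

Running this shows a large gap between the most-shown and least-shown items, which is exactly the ever-narrowing loop described above; it is one reason production systems often mix some exploration into their recommendations.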