It turns out that no matter how the ϵs are distributed, the least
squares estimates you already derived are the optimal
estimators for the βs because they have the property of being
unbiased and of being the minimum variance estimators. If you
want to know more about these properties and see a proof
for this, we refer you to any good book on statistical inference
(for example, Statistical Inference by Casella and Berger).
So what can you do with your observed data to estimate the variance
of the errors? Now that you have the estimated line, you can see how
far away the observed data points are from the line itself, and you can
treat these differences, also known as observed errors or residuals, as
observations themselves, or estimates of the actual errors, the ϵs.
Define

    e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)

for i = 1, . . . , n.
Then you estimate the variance (σ^2) of ϵ as:

    \hat{\sigma}^2 = \frac{\sum_i e_i^2}{n - 2}
Why are we dividing by n - 2? A natural question. Dividing
by n - 2, rather than just n, produces an unbiased estimator.
The 2 corresponds to the number of model parameters. Here
again, Casella and Berger's book is an excellent resource for
more background information.
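As a rough sketch of how this looks in R (the data here are simulated, and the names x, y, and fit are made up for illustration rather than taken from the text), you can pull the residuals out of a fitted lm object and divide their sum of squares by n - 2:

    # Simulated example data (hypothetical; not from the text).
    set.seed(42)
    x <- runif(100, 0, 10)
    y <- 3 + 2 * x + rnorm(100, sd = 1.5)

    fit <- lm(y ~ x)

    # Residuals e_i = y_i - yhat_i, taken straight from the fitted model.
    e <- residuals(fit)
    n <- length(e)

    # Estimate of sigma^2: sum of squared residuals divided by n - 2.
    sigma2_hat <- sum(e^2) / (n - 2)

    # For comparison, summary(fit)$sigma is the residual standard error,
    # i.e., the square root of this same quantity.
    sigma2_hat
    summary(fit)$sigma^2

The two printed numbers should agree up to rounding, which is a quick sanity check that the n - 2 divisor is exactly what R uses when it reports the residual standard error.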
This is called the mean squared error and captures how much the
predicted value varies from the observed. Mean squared error is a useful
quantity for any prediction problem. In regression in particular, it's
also an estimator for your variance, but it can't always be used or
interpreted that way. It appears in the evaluation metrics in the following
section.
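To make that distinction concrete, here is a minimal sketch, reusing the hypothetical fit from the previous example, of the two ways the squared residuals get used:

    # As a general prediction metric, MSE is typically the plain average
    # of squared errors, dividing by n:
    mse <- mean((y - predict(fit))^2)

    # As an estimator of the error variance sigma^2 in this regression,
    # divide by n - 2 instead, to account for the two estimated parameters:
    sigma2_hat <- sum(residuals(fit)^2) / (length(y) - 2)

    mse
    sigma2_hat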
Evaluation metrics
We asked earlier how confident you would be in these estimates and
in your model. You have a couple of values in the output of the R function
that help you get at the issue of how confident you can be in the
estimates: p-values and R-squared. Going back to our model in R, if we