Figure 3-5. On the left is the fitted line. We can see that for any fixed
value, say 5, the values for y vary. For people with 5 new friends, we
display their time spent in the plot on the right.
But it's up to you, the data scientist, whether you think you'd actually
want to use this linear model to describe the relationship or predict
new outcomes. If a new x-value of 5 came in, meaning the user had
five new friends, how confident are you in the output value of -32.08
+ 45.92*5 = 197.52 seconds?
In order to get at this question of confidence, you need to extend your
model. You know there's variation among time spent on the site by
people with five new friends, meaning you certainly wouldn't make
the claim that everyone with five new friends is guaranteed to spend
197.52 seconds on the site. So while you've so far modeled the trend,
you haven't yet modeled the variation.
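To make the trend-versus-variation distinction concrete, here is a minimal Python sketch. The intercept and slope are the ones quoted above; everything else is an assumption made for illustration: the helper names predict and simulate_times are hypothetical, the errors are assumed to be normal with mean zero, and the noise standard deviation of 60 seconds is an arbitrary value, not an estimate from the data.

```python
import random
import statistics

# Coefficients of the fitted line quoted in the text.
INTERCEPT = -32.08
SLOPE = 45.92


def predict(new_friends):
    """Point prediction of time spent (seconds) from the fitted line."""
    return INTERCEPT + SLOPE * new_friends


def simulate_times(new_friends, n_users, noise_sd, seed=0):
    """Simulate time spent for n_users who all share the same x value.

    ASSUMPTION: errors are normal with mean 0 and standard deviation
    noise_sd. The text has not committed to an error model at this point.
    """
    rng = random.Random(seed)
    center = predict(new_friends)
    return [center + rng.gauss(0, noise_sd) for _ in range(n_users)]


# The trend: a single deterministic number for x = 5.
point = predict(5)

# The variation: many simulated users with x = 5 (noise_sd is hypothetical).
times = simulate_times(5, n_users=1000, noise_sd=60.0)

print(f"point prediction: {point:.2f} s")
print(f"simulated mean:   {statistics.mean(times):.2f} s")
print(f"simulated spread: {statistics.stdev(times):.2f} s")
```

The point prediction is the same every time, while the simulated users scatter around it. That scatter is the part of the picture the fitted line alone does not describe.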
Extending beyond least squares
Now that you have a simple linear regression model down (one output,
one predictor) using least squares estimation to estimate your βs, you
can build upon that model in three primary ways, described in the
upcoming sections:
1. Adding in modeling assumptions about the errors
2. Adding in more predictors
3. Transforming the predictors
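As a reminder of the starting point that these three extensions build on, the following sketch computes the least squares estimates for one predictor and one output using the standard closed-form formulas. The function name least_squares_fit and the toy data are hypothetical and are not the data behind the figure.

```python
def least_squares_fit(xs, ys):
    """Return (beta0, beta1) minimizing the sum of squared residuals
    for the simple linear model y = beta0 + beta1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Hypothetical toy data: new friends (x) and time spent in seconds (y).
new_friends = [1, 2, 3, 4, 5, 6]
time_spent = [20.0, 55.0, 110.0, 150.0, 190.0, 250.0]

beta0, beta1 = least_squares_fit(new_friends, time_spent)

# Residuals: what the fitted line leaves unexplained for each observation.
residuals = [y - (beta0 + beta1 * x) for x, y in zip(new_friends, time_spent)]

print(f"beta0 = {beta0:.2f}, beta1 = {beta1:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
```

The residuals are the raw material for the first extension in the list: assumptions about the errors are assumptions about how these leftover amounts behave.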
Adding in modeling assumptions about the errors. If you use your model
to predict y for a given value of x, your prediction is deterministic and
doesn't capture the variability in the observed data. See on the
 