Algorithms - Doing Data Science

Figure 3-5. On the left is the fitted line. We can see that for any fixed

value of x, say 5, the values of y vary. For people with 5 new friends, we

display their time spent in the plot on the right.

But it's up to you, the data scientist, whether you think you'd actually

want to use this linear model to describe the relationship or predict

new outcomes. If a new x-value of 5 came in, meaning the user had

five new friends, how confident are you in the output value of -32.08

+ 45.92*5 = 197.52 seconds?

In order to get at this question of confidence, you need to extend your

model. You know there's variation among time spent on the site by

people with five new friends, meaning you certainly wouldn't make

the claim that everyone with five new friends is guaranteed to spend

197.52 seconds on the site. So while you've so far modeled the trend,

you haven't yet modeled the variation.
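To make the distinction between trend and variation concrete, here is a minimal Python sketch. It uses the two coefficients quoted above for the fitted line. Everything else is an assumption made for illustration: the simulated users, the noise level ASSUMED_NOISE_SD, and the sample size are hypothetical and do not come from the dataset behind Figure 3-5.

```python
import random
import statistics

# Coefficients quoted in the text for the fitted line y = b0 + b1 * x.
B0, B1 = -32.08, 45.92


def predict(x):
    """Point prediction from the fitted line, i.e. the modeled trend."""
    return B0 + B1 * x


# Hypothetical noise level in seconds, chosen only for illustration.
ASSUMED_NOISE_SD = 20.0

# Simulated (not real) users who all have 5 new friends. Their time on
# site scatters around the fitted line instead of sitting exactly on it.
rng = random.Random(0)
times_at_5 = [predict(5) + rng.gauss(0.0, ASSUMED_NOISE_SD) for _ in range(200)]

point_prediction = predict(5)
observed_spread = statistics.stdev(times_at_5)

print(f"point prediction at x=5: {point_prediction:.2f} seconds")
print(f"sample sd of simulated times at x=5: {observed_spread:.2f} seconds")
```

The fitted line returns a single number for x = 5, while the simulated users spread out around it. That spread is the part the model so far says nothing about.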

Extending beyond least squares

Now that you have a simple linear regression model down (one output,

one predictor) using least squares estimation to estimate your βs, you

can build upon that model in three primary ways, described in the

upcoming sections:

1. Adding in modeling assumptions about the errors

2. Adding in more predictors

3. Transforming the predictors
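As a rough guide to what each of the three extensions does to the model equation, the forms below are one common way to write them. The normal distribution for the errors, the choice of three predictors, and the cubic polynomial are illustrative assumptions rather than the only options.

```latex
% 1. Modeling assumptions about the errors (normal noise is one common choice)
y = \beta_0 + \beta_1 x + \epsilon, \qquad \epsilon \sim N(0, \sigma^2)

% 2. More predictors (three shown here as an example)
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon

% 3. Transformed predictors (a polynomial in x is one example)
y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \epsilon
```

In each case the model stays linear in the βs, so least squares estimation still applies.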

Adding in modeling assumptions about the errors. If you use your model

to predict y for a given value of x, your prediction is deterministic and

doesn't capture the variability in the observed data. See on the
