What you're facing here, though, is one of the biggest challenges for a
modeler: you never know the truth. It's possible that the true model is
quadratic but you're assuming linearity, or vice versa. You do your best
to evaluate the model as discussed earlier, but you'll never really know
if you're right. Gathering more data can sometimes help in this regard
as well.
Review
Let's review the assumptions we made when we built and fit our model:
• Linearity
• Error terms normally distributed with mean 0
• Error terms independent of each other
• Error terms have constant variance across values of x
• The predictors we're using are the right predictors
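The error-term assumptions in this list can be checked empirically once a model is fit. Here is a minimal sketch in R (the data, variable names, and thresholds are ours, invented for illustration): fit a simple model, then inspect the residuals for mean zero, normality, and constant variance.

```r
# Sketch: checking residual assumptions on simulated data (names are ours)
set.seed(42)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)   # data generated to satisfy the assumptions
fit <- lm(y ~ x)
res <- residuals(fit)

mean(res)                     # with an intercept, this is essentially 0
shapiro.test(res)             # formal normality check on the residuals
plot(fitted(fit), res)        # constant variance: look for no funnel shape
qqnorm(res); qqline(res)      # normality: points should hug the line
```

Independence of the error terms is harder to test from a single plot; with time-ordered data, plotting residuals in order (or a Durbin-Watson test) is the usual check.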
When and why do we perform linear regression? Mostly for two
reasons:
• If we want to predict one variable knowing others
• If we want to explain or understand the relationship between two
or more things
Exercise
To help understand and explore new concepts, you can simulate fake
datasets in R. The advantage of this is that you “play God” because you
actually know the underlying truth, and you get to see how good your
model is at recovering the truth.
Once you've better understood what's going on with your fake dataset,
you can then transfer your understanding to a real one. We'll show
you how to simulate a fake dataset here, then we'll give you some ideas
for how to explore it further:
# Simulating fake data
x_1 <- rnorm(1000, 5, 7)   # simulate 1,000 values from a normal
                           # distribution with mean 5 and
                           # standard deviation 7
hist(x_1, col = "grey")    # plot p(x)
true_error <- rnorm(1000, 0, 2)
true_beta_0 <- 1.1
true_beta_1 <- -8.2
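One natural next step for exploring this further (a sketch; the object names and the seed are ours) is to generate the outcome y from the true model, fit a linear regression with lm(), and see how close the estimated coefficients come to the truth you built in:

```r
# Sketch: recover the "true" coefficients from the fake data (names are ours)
set.seed(1)
x_1 <- rnorm(1000, 5, 7)
true_error <- rnorm(1000, 0, 2)
true_beta_0 <- 1.1
true_beta_1 <- -8.2
y <- true_beta_0 + true_beta_1 * x_1 + true_error  # the true model

model <- lm(y ~ x_1)
coef(model)   # estimates should land close to 1.1 and -8.2
```

Since you know the truth, you can also experiment: shrink the sample size, inflate the error standard deviation, or make the true relationship quadratic, and watch how the fitted estimates and diagnostics degrade.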