What you're facing here, though, is one of the biggest challenges for a
modeler: you never know the truth. It's possible that the true model is
quadratic but you're assuming linearity, or vice versa. You do your best
to evaluate the model as discussed earlier, but you'll never really know
if you're right. Gathering more data can sometimes help in this regard
as well.
Review
Let's review the assumptions we made when we built and fit our model:
• Linearity
• Error terms normally distributed with mean 0
• Error terms independent of each other
• Error terms have constant variance across values of x
• The predictors we're using are the right predictors
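The error-term assumptions in this list can be checked empirically once a model is fit. Here is a minimal sketch in R (the data, variable names, and thresholds are ours, invented for illustration): fit a simple model, then inspect the residuals for mean zero, normality, and constant variance.

```r
# Sketch: checking residual assumptions on simulated data (names are ours)
set.seed(42)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)   # data generated to satisfy the assumptions
fit <- lm(y ~ x)
res <- residuals(fit)

mean(res)                     # with an intercept, this is essentially 0
shapiro.test(res)             # formal normality check on the residuals
plot(fitted(fit), res)        # constant variance: look for no funnel shape
qqnorm(res); qqline(res)      # normality: points should hug the line
```

Independence of the error terms is harder to test from a single plot; with time-ordered data, plotting residuals in order (or a Durbin-Watson test) is the usual check.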
When and why do we perform linear regression? Mostly for two
reasons:
• If we want to predict one variable knowing others
• If we want to explain or understand the relationship between two
or more things
Exercise
To help understand and explore new concepts, you can simulate fake
datasets in R. The advantage of this is that you “play God” because you
actually know the underlying truth, and you get to see how good your
model is at recovering the truth.
Once you've better understood what's going on with your fake dataset,
you can then transfer your understanding to a real one. We'll show
you how to simulate a fake dataset here, then we'll give you some ideas
for how to explore it further:
# Simulating fake data
x_1 <- rnorm(1000, 5, 7)   # simulate 1,000 values from a normal
                           # distribution with mean 5 and
                           # standard deviation 7
hist(x_1, col = "grey")    # plot p(x)
true_error <- rnorm(1000, 0, 2)
true_beta_0 <- 1.1
true_beta_1 <- -8.2
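One natural next step for exploring this further (a sketch; the object names and the seed are ours) is to generate the outcome y from the true model, fit a linear regression with lm(), and see how close the estimated coefficients come to the truth you built in:

```r
# Sketch: recover the "true" coefficients from the fake data (names are ours)
set.seed(1)
x_1 <- rnorm(1000, 5, 7)
true_error <- rnorm(1000, 0, 2)
true_beta_0 <- 1.1
true_beta_1 <- -8.2
y <- true_beta_0 + true_beta_1 * x_1 + true_error  # the true model

model <- lm(y ~ x_1)
coef(model)   # estimates should land close to 1.1 and -8.2
```

Since you know the truth, you can also experiment: shrink the sample size, inflate the error standard deviation, or make the true relationship quadratic, and watch how the fitted estimates and diagnostics degrade.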