Algorithms - Doing Data Science

right-hand side of Figure 3-5 that for a fixed value of $x = 5$, there is variability in the time spent on the site. You want to capture this variability in your model, so you extend your model to:

$$y = \beta_0 + \beta_1 x + \epsilon$$

where the new term $\epsilon$ is referred to as noise, which is the stuff that you haven't accounted for by the relationships you've figured out so far. It's also called the error term: $\epsilon$ represents the actual error, the difference between the observations and the true regression line, which you'll never know and can only estimate with your $\beta$s.

One often makes the modeling assumption that the noise is normally distributed, which is denoted:

$$\epsilon \sim N(0, \sigma^2)$$
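To make the noise term concrete, here is a minimal Python sketch that simulates the model with normally distributed noise. The values of `BETA_0`, `BETA_1`, and `SIGMA`, the range of `x`, and the helper name `simulate_y` are hypothetical choices for illustration; none of them come from the text.

```python
import random
import statistics

# Hypothetical parameter values, chosen only for illustration.
BETA_0 = 10.0  # intercept
BETA_1 = 3.0   # slope
SIGMA = 2.0    # standard deviation of the noise


def simulate_y(x, rng):
    """Draw one y from y = beta_0 + beta_1 * x + eps, with eps ~ N(0, sigma^2)."""
    eps = rng.gauss(0.0, SIGMA)
    return BETA_0 + BETA_1 * x + eps


rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [rng.randint(0, 10) for _ in range(5000)]
ys = [simulate_y(x, rng) for x in xs]

# The noise is what remains after subtracting the true line from each observation.
noise = [y - (BETA_0 + BETA_1 * x) for x, y in zip(xs, ys)]
noise_mean = statistics.fmean(noise)
noise_sd = statistics.stdev(noise)
print(f"noise mean ~ {noise_mean:.2f}, noise sd ~ {noise_sd:.2f}")
```

In this simulation the true line is known, so the noise can be recovered exactly. With real data it cannot, which is why the text says the error can only be estimated.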

Note that this is sometimes not a reasonable assumption. If you are dealing with a known fat-tailed distribution, and if your linear model is picking up only a small part of the value of the variable $y$, then the error terms are likely also fat-tailed. This is the most common situation in financial modeling. That's not to say we don't use linear regression in finance, though. We just don't attach the “noise is normal” assumption to it.
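As a rough illustration of what "fat-tailed" means for the noise, the sketch below compares how often draws land more than four sample standard deviations from the mean. The Student-t distribution with 3 degrees of freedom is an assumed stand-in for a fat-tailed distribution; the text does not name one. The helpers `student_t` and `tail_fraction` are hypothetical.

```python
import math
import random
import statistics


def student_t(df, rng):
    """Draw one Student-t variate as Z / sqrt(chi2 / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)


def tail_fraction(values, k=4.0):
    """Fraction of values more than k sample standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return sum(abs(v - mean) > k * sd for v in values) / len(values)


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 20000
normal_noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
fat_noise = [student_t(3, rng) for _ in range(n)]  # assumed fat-tailed example

normal_tail = tail_fraction(normal_noise)
fat_tail = tail_fraction(fat_noise)
print(f"beyond 4 sd: normal {normal_tail:.5f}, fat-tailed {fat_tail:.5f}")
```

Extreme values are far more frequent under the fat-tailed draw, which is the behavior a "noise is normal" assumption would understate.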

With the preceding assumption on the distribution of noise, this model is saying that, for any given value of $x$, the conditional distribution of $y$ given $x$ is $p(y \mid x) \sim N(\beta_0 + \beta_1 x, \sigma^2)$.

So, for example, among the set of people who had five new friends this week, the amount of time they spent on the website had a normal distribution with a mean of $\beta_0 + \beta_1 \cdot 5$ and a variance of $\sigma^2$, and you're going to estimate your parameters $\beta_0$, $\beta_1$, $\sigma$ from the data.
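The conditional distribution at five new friends can be written down directly once the parameters are fixed. The sketch below uses Python's standard-library `statistics.NormalDist`. The parameter values and the helper `conditional_dist` are hypothetical; in practice the parameters would be estimates from the data.

```python
from statistics import NormalDist

# Hypothetical parameter values, chosen only for illustration.
BETA_0 = 10.0  # intercept
BETA_1 = 3.0   # slope
SIGMA = 2.0    # standard deviation of the noise


def conditional_dist(x):
    """Distribution of y given x under the model: N(beta_0 + beta_1 * x, sigma^2)."""
    return NormalDist(mu=BETA_0 + BETA_1 * x, sigma=SIGMA)


dist_at_5 = conditional_dist(5)
print("mean:", dist_at_5.mean)
print("variance:", dist_at_5.variance)

# Probability that time on site falls within one sigma of the conditional mean.
low = dist_at_5.mean - SIGMA
high = dist_at_5.mean + SIGMA
within_one_sd = dist_at_5.cdf(high) - dist_at_5.cdf(low)
print(f"P({low} < y < {high} | x = 5) = {within_one_sd:.4f}")
```

Changing `x` moves only the mean of the distribution. The variance stays at `SIGMA ** 2` for every `x`, which is part of what the model assumes.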

How do you fit this model? How do you get the parameters $\beta_0$, $\beta_1$, $\sigma$ from the data?
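This excerpt stops at the question. One widely used answer, assumed here rather than taken from the text, is ordinary least squares: pick the line that minimizes the sum of squared residuals, then estimate $\sigma$ from the residuals. The sketch below applies it to simulated data; the "true" values, the sample size, and the helper `fit_line` are hypothetical.

```python
import math
import random
import statistics

# Hypothetical "true" values used only to generate example data.
TRUE_BETA_0, TRUE_BETA_1, TRUE_SIGMA = 10.0, 3.0, 2.0

rng = random.Random(42)  # fixed seed so the run is repeatable
xs = [rng.randint(0, 10) for _ in range(2000)]
ys = [TRUE_BETA_0 + TRUE_BETA_1 * x + rng.gauss(0.0, TRUE_SIGMA) for x in xs]


def fit_line(xs, ys):
    """Least-squares estimates of (beta_0, beta_1, sigma) for y = beta_0 + beta_1 * x + eps."""
    n = len(xs)
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta_1 = sxy / sxx
    beta_0 = y_bar - beta_1 * x_bar
    # Residual sum of squares; dividing by n - 2 accounts for the two fitted parameters.
    rss = sum((y - (beta_0 + beta_1 * x)) ** 2 for x, y in zip(xs, ys))
    sigma = math.sqrt(rss / (n - 2))
    return beta_0, beta_1, sigma


beta_0_hat, beta_1_hat, sigma_hat = fit_line(xs, ys)
print(f"beta_0 ~ {beta_0_hat:.2f}, beta_1 ~ {beta_1_hat:.2f}, sigma ~ {sigma_hat:.2f}")
```

Because the data are simulated from known values, the estimates can be checked against them. With real data there is no such check, only the estimates.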
