Figure 3-4. The line closest to all the points
To find this line, you'll define the “residual sum of squares” (RSS), denoted $RSS(\beta)$, to be:

$$RSS(\beta) = \sum_i (y_i - \beta x_i)^2$$
where $i$ ranges over the various data points. It is the sum of all the squared vertical distances between the observed points and any given line. Note this is a function of $\beta$ and you want to optimize with respect to $\beta$ to find the optimal line.
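As a minimal sketch of the definition above, the RSS can be written as a function of a candidate slope and evaluated for a few values. The function name `rss` and the data points are made up for illustration and do not come from the text; the line is assumed to pass through the origin, matching the formula.

```python
import numpy as np

def rss(beta, x, y):
    """Residual sum of squares for the line y = beta * x (no intercept)."""
    residuals = y - beta * x          # vertical distances from points to the line
    return float(np.sum(residuals ** 2))

# Illustrative data only (hypothetical, chosen so that y = 2x exactly).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

# RSS depends on beta: it is smallest at the slope that best fits the points.
for beta in (1.0, 2.0, 3.0):
    print(beta, rss(beta, x, y))
```

Because the data here lie exactly on $y = 2x$, the RSS is zero at $\beta = 2$ and grows as $\beta$ moves away from it in either direction.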
To minimize $RSS(\beta) = (y - \beta x)^t (y - \beta x)$, differentiate it with respect to $\beta$ and set it equal to zero, then solve for $\beta$.
This results in:

$$\beta = (x^t x)^{-1} x^t y$$
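The closed-form solution can be checked numerically. The sketch below is one possible way to compute it with NumPy, under some assumptions: `fit_beta` is a hypothetical helper name, the data are invented for illustration, and the linear system $(x^t x)\beta = x^t y$ is solved directly instead of forming the inverse, which is mathematically equivalent and usually more stable numerically.

```python
import numpy as np

def fit_beta(x, y):
    """Least-squares slope(s): solves (x^t x) beta = x^t y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x.reshape(-1, 1)          # treat a single predictor as a column
    # Equivalent to (x^t x)^{-1} x^t y, without computing the inverse explicitly.
    return np.linalg.solve(x.T @ x, x.T @ y)

# Illustrative data only (hypothetical), roughly following y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

beta = fit_beta(x, y)
print(beta)
```

With a single predictor the formula reduces to $\beta = \sum_i x_i y_i / \sum_i x_i^2$, so for these points the result is $59.7 / 30 = 1.99$ up to floating-point rounding.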