[Figure 14.14 plot; x-axis: Number of features]
FIGURE 14.14 Prediction accuracy: number of comparison methods vs. average error in predicting the fraud score.
is about three times lower than when using a single method. Moreover, each additional
method improves the accuracy of the model, but with diminishing gains.
To validate the goodness of fit of the model in Equation 14.5, the adjusted coefficient of determination,

$$R^2 = 1 - \frac{n-1}{n-p}\,\frac{SS_{err}}{SS_{tot}},$$

where $SS_{err} = \sum_k (\phi_k - \hat{\phi}_k)^2$ is the sum
of squares of residuals, is also computed. Figure 14.15 shows that as more statistical
tests are used, the adjusted coefficient of determination increases. This demonstrates
that additional features increase the explained variance of the model. When all fea-
tures are used, the model in Equation 14.5 captures over 40% of the total variation in
the data. This result is particularly significant in a large data set that includes a wide
range of patterns of click traffic.
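
The adjusted coefficient of determination discussed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the form 1 - (n - 1)/(n - p) * SS_err/SS_tot, with n taken to be the number of observations and p the number of fitted parameters (intercept included). The function name `adjusted_r2` and the sample scores are hypothetical.

```python
def adjusted_r2(observed, predicted, p):
    """Adjusted R^2 = 1 - (n - 1)/(n - p) * SS_err/SS_tot.

    Assumption: p counts every fitted parameter, including the intercept.
    """
    n = len(observed)
    if n != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    if n <= p:
        raise ValueError("need more observations than parameters")
    mean_obs = sum(observed) / n
    # Sum of squared residuals (observed minus predicted).
    ss_err = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    # Total sum of squares around the mean of the observed values.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - (n - 1) / (n - p) * ss_err / ss_tot


# Made-up fraud scores and model predictions, for illustration only.
scores = [0.10, 0.40, 0.35, 0.80, 0.60, 0.20]
fitted = [0.15, 0.35, 0.40, 0.70, 0.65, 0.25]
print(adjusted_r2(scores, fitted, p=2))
```

Because the factor (n - 1)/(n - p) grows with p, adding a feature raises the adjusted value only when it reduces SS_err by more than that penalty, which is why an increasing adjusted coefficient indicates that the extra features explain additional variance.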
[Figure 14.15 plot; x-axis: Number of features]
FIGURE 14.15 Prediction accuracy: number of comparison methods vs. $R^2$. As the number
of features increases, the adjusted coefficient of determination, $R^2$, increases as well, and so
does the explained variance.