Fig. 7.3 Spiral data set, sparse grid with levels 5 (left) and 7 (right)
Table 7.2 Leave-one-out cross-validation results for the spiral data set

Level   λ         Training correctness (%)   Testing correctness (%)
4       0.00001   95.31                      87.63
5       0.001     94.36                      87.11
6       0.00075   100.00                     89.69
7       0.00075   100.00                     88.14
8       0.0005    100.00                     87.63
applications. However, it serves as a hard test case for new data mining algorithms.
It is known that neural networks can have severe problems with this data set, and
some neural networks cannot separate the two spirals at all. In Table 7.2 we give the
correctness rates achieved with the leave-one-out cross-validation method, i.e., a
194-fold cross-validation. For the sparse grids we use the tensor-product basis
functions as described in this chapter.
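A possible way to organise this leave-one-out evaluation is sketched below in Python. This is only a minimal sketch: the sparse grid classifier of this chapter is hidden behind a factory function make_classifier, the k-nearest-neighbour classifier serves merely as a runnable placeholder, and the data are assumed to be available as the 194 spiral points X with class labels y.

import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier  # placeholder classifier only

def loo_correctness(make_classifier, X, y):
    # Leave-one-out (here: 194-fold) cross-validation correctness in percent.
    # make_classifier is a callable returning a fresh, unfitted classifier with
    # fit/predict methods, e.g. a sparse grid classifier for a fixed level and
    # regularization parameter lambda.
    correct = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        clf = make_classifier()
        clf.fit(X[train_idx], y[train_idx])
        correct += int(clf.predict(X[test_idx])[0] == y[test_idx[0]])
    return 100.0 * correct / len(y)

# Hypothetical usage with the placeholder classifier; X is a (194, 2) array of
# spiral coordinates and y the corresponding class labels (loader not shown):
# print(loo_correctness(lambda: KNeighborsClassifier(n_neighbors=1), X, y))

Repeating this loop for each level and each candidate value of λ yields a table of the same form as Table 7.2.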
The best testing correctness was achieved on level 6 with 89.69 % in comparison
to 77.20 % in [Sin98].
In Fig. 7.3 we show the corresponding results obtained with our sparse grid
combination method for levels 5 and 7. At level 7 the two spirals are clearly
detected and resolved. Note that the sparse grid here contains 1,281 grid points.
Example 7.3 The Ripley data set, taken from [Rip94], consists of 250 training points
and 1,000 test points. It is shown in Fig. 6.4a. The data set was generated synthet-
ically and is known to exhibit 8 % error, so no testing correctness better than
92 % can be expected. As before, we use tensor-product basis functions.
Since we now have training and test data, we proceed as follows: first, we use
the training set to determine the best regularization parameter λ. The best test
correctness rate and the corresponding λ are given for different levels n in the
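The parameter study behind such a level-by-level comparison can be organised, for example, as in the following sketch. It rests on the same placeholder assumptions as above: make_classifier(level, lam) is a hypothetical factory for the sparse grid classifier, and the candidate values for λ are purely illustrative. For every level, the λ that performs best on the training data is kept (in practice a cross-validated estimate on the training set is the more robust selection criterion), and only for this λ is the correctness on the 1,000 test points recorded.

import numpy as np

def correctness(clf, X, y):
    # Percentage of correctly classified points.
    return 100.0 * np.mean(clf.predict(X) == y)

def best_lambda_per_level(make_classifier, levels, lambdas,
                          X_train, y_train, X_test, y_test):
    # For each level, choose lambda using the training data only, then
    # report the test correctness for that single choice.
    results = {}
    for level in levels:
        scores = []
        for lam in lambdas:
            clf = make_classifier(level, lam)   # e.g. sparse grid classifier
            clf.fit(X_train, y_train)
            scores.append((correctness(clf, X_train, y_train), lam))
        _, best_lam = max(scores)               # best score on the training set
        clf = make_classifier(level, best_lam)
        clf.fit(X_train, y_train)
        results[level] = (best_lam, correctness(clf, X_test, y_test))
    return results

# Hypothetical call with levels 1..8 and a logarithmic grid of lambda values:
# print(best_lambda_per_level(make_classifier, range(1, 9),
#                             [10.0 ** (-k) for k in range(1, 7)],
#                             X_train, y_train, X_test, y_test))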