Aside: Digit Recognition
Say you want an algorithm to recognize pictures of handwritten digits as shown in Figure 4-2. In this case, k-NN works well.
Figure 4-2. Handwritten digits
To set it up, you take your underlying representation apart pixel by pixel (say, a 16x16 grid of pixels) and measure how bright each pixel is. Unwrap the 16x16 grid and put it into a 256-dimensional space, which has a natural Euclidean metric. That is to say, the distance between two points in this space is the square root of the sum of the squares of the differences between their entries. In other words, it's the length of the vector going from one point to the other. Then you apply the k-NN algorithm.
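The setup above can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's code: the data here is synthetic (two noisy clusters standing in for "dark" and "bright" unrolled 16x16 images), and `knn_predict` is a hypothetical helper name.

```python
import numpy as np

def knn_predict(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest
    training points under the Euclidean metric."""
    # Euclidean distance: sqrt of the sum of squared entry-wise differences
    dists = np.sqrt(((train_X - query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]                # indices of the k closest points
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]               # majority label

# Toy stand-ins for unrolled 16x16 images: two clusters in 256-d space
rng = np.random.default_rng(0)
zeros = rng.normal(0.0, 0.1, size=(20, 256))       # "dark" images, label 0
ones  = rng.normal(1.0, 0.1, size=(20, 256))       # "bright" images, label 1
X = np.vstack([zeros, ones])
y = np.array([0] * 20 + [1] * 20)

print(knn_predict(X, y, np.full(256, 0.9), k=5))   # → 1
```

A query vector near the "bright" cluster gets label 1 because all five of its nearest neighbors carry that label.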
If you vary the number of neighbors, it changes the shape of the
boundary, and you can tune k to prevent overfitting. If you're careful,
you can get 97% accuracy with a sufficiently large dataset.
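One common way to tune k is to hold out a validation set and pick the k with the best held-out accuracy. The sketch below assumes synthetic data and a hypothetical `knn_predict` helper; it only illustrates the tuning loop, not the 97% figure, which depends on a real, sufficiently large dataset.

```python
import numpy as np

def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k nearest training points (Euclidean)."""
    dists = np.sqrt(((train_X - query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Synthetic stand-in for digit vectors: two noisy clusters in 256-d space
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 256)),
               rng.normal(1.0, 0.3, (50, 256))])
y = np.array([0] * 50 + [1] * 50)

# Hold out every other point; pick the k with the best validation accuracy.
# Too-small k can overfit; too-large k oversmooths the decision boundary.
train_X, val_X = X[::2], X[1::2]
train_y, val_y = y[::2], y[1::2]
scores = {}
for k in (1, 3, 5, 9, 15):
    preds = [knn_predict(train_X, train_y, q, k) for q in val_X]
    scores[k] = np.mean(np.array(preds) == val_y)
best_k = max(scores, key=scores.get)
```

On real digit data you would use cross-validation rather than a single split, but the shape of the loop is the same.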
Moreover, the result can be viewed in a "confusion matrix." A confusion matrix is used when you are trying to classify objects into k bins; it is a k × k matrix of actual label versus predicted label, where the (i, j)th element counts the items that were actually labeled i but predicted to have label j. From a confusion matrix, you can get accuracy, the proportion of total predictions that were correct. In the previous chapter, we discussed the misclassification rate. Notice that accuracy = 1 − misclassification rate.
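The definition above translates directly into code. This is a minimal sketch with made-up labels for k = 3 bins; correct predictions land on the diagonal, so accuracy is the diagonal sum over the total.

```python
import numpy as np

def confusion_matrix(actual, predicted, k):
    """k x k matrix whose (i, j) entry counts items with actual
    label i that were predicted to have label j."""
    M = np.zeros((k, k), dtype=int)
    for a, p in zip(actual, predicted):
        M[a, p] += 1
    return M

actual    = [0, 0, 1, 1, 2, 2, 2]
predicted = [0, 1, 1, 1, 2, 2, 0]
M = confusion_matrix(actual, predicted, k=3)

accuracy = np.trace(M) / M.sum()     # correct predictions sit on the diagonal
misclassification = 1 - accuracy     # accuracy = 1 - misclassification rate
print(accuracy)                      # → 5/7 ≈ 0.714
```

Here 5 of the 7 toy predictions are correct, so the accuracy is 5/7 and the misclassification rate is 2/7, matching the identity in the text.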