Automatic classification of protein crystal images - Emerging Trends in Image Processing, Computer Vision, and Pattern Recognition - page 358

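Region features alone do not provide good accuracy (56% with the decision tree and 63% with the random forest). With edge-corner-line features only, we obtain 73% accuracy with the decision tree and 75% accuracy with the random forest. Combining the two feature sets raises the random forest accuracy to 78%, while the decision tree stays at about the same level (72%).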

Table 3

Classification Correctness Comparison

Classifier                Region Features   Edge-Corner-Line Features   All Features
Decision tree             0.56              0.73                        0.72
Random forest (n = 200)   0.63              0.75                        0.78

The best accuracy is obtained with the random forest classifier and the combined feature set. For this combination, we obtained accuracies in the range 75-80%. Table 4 shows a sample confusion matrix for a random forest with 200 trees. The overall accuracy is 78.3% [166/212], and the sensitivity for large crystals is 88% [65/74].

Table 4

Confusion Matrix with Random Forest Classifier (Number of Trees = 200)

                  Observed (predicted) class
Actual class      Other crystals   Needles   Small crystals   Large crystals
Other crystals    28               6         2                8
Needles           0                42        6                3
Small crystals    3                3         31               6
Large crystals    6                0         3                65
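The overall accuracy and the large-crystal sensitivity quoted above can be checked directly against Table 4. The minimal Python sketch below assumes that rows are the actual class and columns the observed (predicted) class; that is the orientation under which the large-crystal sensitivity comes out to 88%. The names `CONFUSION`, `overall_accuracy`, `sensitivity` and `precision` are hypothetical helpers for illustration, not part of the authors' system.

```python
# Table 4 confusion matrix (assumed orientation: rows = actual class,
# columns = observed/predicted class). Names here are hypothetical.
CLASSES = ["Other crystals", "Needles", "Small crystals", "Large crystals"]
CONFUSION = [
    [28, 6, 2, 8],
    [0, 42, 6, 3],
    [3, 3, 31, 6],
    [6, 0, 3, 65],
]


def overall_accuracy(cm):
    """Fraction of all images that lie on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def sensitivity(cm, k):
    """Recall of class k: correct predictions divided by the actual (row) total."""
    return cm[k][k] / sum(cm[k])


def precision(cm, k):
    """Correct predictions of class k divided by the predicted (column) total."""
    return cm[k][k] / sum(row[k] for row in cm)


if __name__ == "__main__":
    print(f"overall accuracy: {overall_accuracy(CONFUSION):.3f}")
    for k, name in enumerate(CLASSES):
        print(f"{name:15s} sensitivity={sensitivity(CONFUSION, k):.2f} "
              f"precision={precision(CONFUSION, k):.2f}")
```

Running it prints an overall accuracy of 0.783 and a large-crystal sensitivity of 0.88, matching the figures in the text; the per-class precision column is an extra view of the same table.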

Among the four classes, we can observe that the system distinguishes small crystals and needle crystals with high accuracy. Distinguishing large crystals from other crystals is the most problematic: 14 images are confused between these two classes (8 other crystals labeled as large crystals and 6 large crystals labeled as other crystals). From our discussion with the expert, small and large crystals are the most important crystals in terms of their usability for the diffraction process. Therefore, it is critical not to misclassify images in these two categories into the other two categories. From Table 4, we observe that our system misses 6 small crystals (3 images grouped as other crystals and 3 images grouped as needles). Likewise, our system classifies 6 large crystals as other crystals. Overall, our system misses 12 [3+3+6] critical images, so the miss rate for critical crystals is around 6% of all test images [12/212]. This is a promising achievement for the sub-classification of crystal categories. The average accuracy over 10 runs of 10-fold cross-validation is 78%.
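The critical-miss count is a simple sum over the off-diagonal cells where an actual small or large crystal is predicted as "other crystals" or "needles". The Python sketch below reproduces the 12 [3+3+6] figure from Table 4, assuming rows are actual classes and columns predicted classes. The helper `critical_misses` and the `CRITICAL` set are hypothetical names chosen for illustration. Following the text, a small crystal predicted as large (or the reverse) is not counted as a miss. For comparison, the sketch also reports the same count relative to the 117 actual small and large crystals.

```python
# Table 4 confusion matrix (assumed: rows = actual class, columns = predicted).
CLASSES = ["Other crystals", "Needles", "Small crystals", "Large crystals"]
CONFUSION = [
    [28, 6, 2, 8],
    [0, 42, 6, 3],
    [3, 3, 31, 6],
    [6, 0, 3, 65],
]

# Classes the expert considers important for the diffraction process.
CRITICAL = {"Small crystals", "Large crystals"}


def critical_misses(cm, classes, critical):
    """Count critical images whose predicted class is non-critical.

    Small predicted as large (or large as small) is not counted, because
    the image still lands in a critical category.
    """
    crit = [i for i, name in enumerate(classes) if name in critical]
    non_crit = [j for j, name in enumerate(classes) if name not in critical]
    return sum(cm[i][j] for i in crit for j in non_crit)


misses = critical_misses(CONFUSION, CLASSES, CRITICAL)
total = sum(sum(row) for row in CONFUSION)
critical_total = sum(sum(CONFUSION[i]) for i, name in enumerate(CLASSES)
                     if name in CRITICAL)

print(f"critical misses: {misses}")
print(f"rate over all {total} images: {misses / total:.3f}")
print(f"rate over {critical_total} critical images: {misses / critical_total:.3f}")
```

The first rate, 0.057, rounds to the 6% quoted above; the second, about 0.103, expresses the same 12 misses per actual critical image.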

As the number of trees in the random forest classifier increases, accuracy improves up to a certain point. Figure 5 compares accuracy and computation time for training and testing against the number of trees. The best accuracy is obtained with 200 trees. Computation time for both training and testing increases linearly with the number of trees.
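The linear cost follows from the structure of the algorithm: a random forest trains each tree independently on its own bootstrap sample and, at test time, queries every tree before taking a majority vote. The Python sketch below is a deliberately simplified stand-in, not the authors' classifier. It bags depth-1 decision stumps instead of full trees (with a single split, sampling features per tree is the same as sampling per split), and the hypothetical `make_data` helper produces toy two-class data in place of the crystal image features. All function names are illustrative assumptions; only the loop structure, one fit and one vote per tree, is the point.

```python
import random
import time
from collections import Counter


def gini(counts, n):
    """Gini impurity of a Counter of class labels covering n samples."""
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def fit_stump(X, y, feature_ids):
    """Return (feature, threshold, left_label, right_label) with lowest weighted Gini."""
    n, total = len(y), Counter(y)
    majority = total.most_common(1)[0][0]
    # Fallback "no split": every row goes left and gets the majority label.
    best = (gini(total, n), feature_ids[0], float("inf"), majority, majority)
    for f in feature_ids:
        order = sorted(range(n), key=lambda i: X[i][f])
        left = Counter()
        for pos in range(n - 1):
            left[y[order[pos]]] += 1
            lo, hi = X[order[pos]][f], X[order[pos + 1]][f]
            if lo == hi:
                continue  # a threshold must fall between two distinct values
            n_left = pos + 1
            right = total - left
            score = (n_left * gini(left, n_left)
                     + (n - n_left) * gini(right, n - n_left)) / n
            if score < best[0]:
                best = (score, f, (lo + hi) / 2,
                        left.most_common(1)[0][0], right.most_common(1)[0][0])
    return best[1:]


def fit_forest(X, y, n_trees, rng):
    """Fit n_trees stumps, each on a bootstrap sample and a random feature subset."""
    n, d = len(X), len(X[0])
    k = max(1, int(d ** 0.5))  # sqrt(d) features per split, a common default
    forest = []
    for _ in range(n_trees):  # training cost grows linearly with n_trees
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx],
                                rng.sample(range(d), k)))
    return forest


def predict(forest, row):
    """Majority vote over every stump, so prediction is also linear in n_trees."""
    votes = Counter(l if row[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]


def make_data(n, rng):
    """Hypothetical toy data: label depends on features 0 and 1; 2 and 3 are noise."""
    X = [[rng.random() for _ in range(4)] for _ in range(n)]
    y = [int(row[0] + row[1] > 1.0) for row in X]
    return X, y


def sweep(tree_counts, seed=0):
    """Return (n_trees, accuracy, train_seconds, test_seconds) per tree count."""
    rng = random.Random(seed)
    X_train, y_train = make_data(200, rng)
    X_test, y_test = make_data(200, rng)
    results = []
    for n_trees in tree_counts:
        start = time.perf_counter()
        forest = fit_forest(X_train, y_train, n_trees, rng)
        train_s = time.perf_counter() - start
        start = time.perf_counter()
        preds = [predict(forest, row) for row in X_test]
        test_s = time.perf_counter() - start
        acc = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
        results.append((n_trees, acc, train_s, test_s))
    return results


if __name__ == "__main__":
    for n_trees, acc, train_s, test_s in sweep((10, 50, 100, 200)):
        print(f"trees={n_trees:4d} acc={acc:.3f} "
              f"train={train_s:.3f}s test={test_s:.4f}s")
```

Absolute timings depend on the machine, but because every tree costs roughly the same to fit and to query, doubling the tree count should roughly double both timing columns. Accuracy on this toy data is not comparable to the crystal results.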
