Image Processing Reference
Region features alone do not provide good accuracy. With edge-corner-line features only,
we obtain 73% accuracy with a decision tree and 75% accuracy with a random forest. Combining
the two feature sets improves the overall accuracy.
Classification Correctness Comparison
Region Features Edge-Corner-Line Features All Features
Random forest ( n = 200) 0.63
The best accuracy is obtained with the random forest classifier and the combined set of features. For
this combination, we obtained accuracy in the range 75-80%. Table 4 shows a sample confusion
matrix using 200 trees for the random forest. The overall accuracy is 78.3%. The sensitivity for
large crystals is 88%.
Confusion Matrix with Random Forest Classifier (Number of Trees = 200)
Other Crystals Needles Small Crystals Large Crystals
Other crystals 28
Small crystals 3
Large crystals 6
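The overall accuracy and the per-class sensitivity quoted above are both read off a confusion matrix. A minimal sketch of that arithmetic follows, assuming rows are true classes and columns are predicted classes (the orientation the text uses when it reads the table). The function names are illustrative, and the 3x3 matrix is a made-up toy example, not the values of Table 4.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def sensitivity(cm, k):
    """Recall of class k: correct predictions of k over all true members of k."""
    return cm[k][k] / sum(cm[k])


# Toy 3-class matrix (hypothetical values, NOT the paper's Table 4).
# Rows are true classes, columns are predicted classes.
toy_cm = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]

print(overall_accuracy(toy_cm))  # 23 correct out of 30
print(sensitivity(toy_cm, 2))    # 9 correct out of 10 true members of class 2
```

Sensitivity is computed per row, so a class with few samples can have a low sensitivity even when the overall accuracy is high.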
Among the four classes, the system distinguishes small crystals and needle crystals with high
accuracy. The distinction between large crystals and other crystals is the most problematic.
According to our discussions with the expert, small and large crystals are the most important
classes in terms of their usability for the diffraction process. It is therefore critical
not to misclassify images in these two categories into the other two categories. From the
confusion matrix (Table 4), we observe that our system misses 6 small crystals (3 images grouped
as other crystals and 3 images grouped as needles). Likewise, our system classifies 6 large
crystals as other crystals. Overall, our system misses 12 [3+3+6] critical images. Thus, the miss
rate for critical crystals is around 6% [12/212]. This is a promising result for the
sub-classification of crystal categories. The average accuracy over 10 runs of 10-fold cross validation is 78%.
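The miss-rate arithmetic above can be reproduced directly from the counts stated in the text. The short sketch below uses only those counts (3 + 3 + 6 missed images out of 212); the variable names are illustrative.

```python
# Misclassified critical images, as stated in the text.
small_as_other = 3
small_as_needles = 3
large_as_other = 6
total_images = 212  # denominator used in the text

missed = small_as_other + small_as_needles + large_as_other
miss_rate = missed / total_images

print(missed)              # 12
print(f"{miss_rate:.1%}")  # 5.7%
```

The exact value is about 5.7%, which the text rounds to 6%.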
As the number of trees in the random forest classifier is increased, the accuracy increases up
to a certain point. Figure 5 compares the accuracy and the computation time for training and
testing against the number of trees. The best accuracy is obtained when the number of trees is
200. As the number of trees increases, the computation time also increases: the training and
testing time grows linearly with the number of trees.
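One way to check a linear-scaling claim like this on one's own measurements is to fit a straight line to (number of trees, time) pairs and inspect the coefficient of determination. The sketch below is a plain least-squares fit; the helper name `fit_line` is illustrative, and the timings are hypothetical placeholders, not the values plotted in Figure 5.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical timings in seconds (placeholders, not the values of Figure 5).
n_trees = [50, 100, 200, 400]
seconds = [1.1, 2.0, 4.1, 8.0]

slope, intercept, r2 = fit_line(n_trees, seconds)
print(f"seconds per tree ~ {slope:.4f}, R^2 = {r2:.4f}")
```

An R^2 close to 1 with a small intercept is consistent with time growing in proportion to the number of trees; a clearly curved residual pattern would suggest otherwise.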