One of the reasons for the poor CA results when N-grams were used as
features was data sparseness. This sparseness results from two facts: the same
Arabic word can occur in different forms, and using word N-grams increases the
number of features, which decreases the frequency of each feature.
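The sparseness effect can be illustrated with a minimal sketch. It is not the experimental setup of this study: the toy corpus is a hypothetical English stand-in for the SPA dataset, and the helper names `word_ngrams` and `ngram_stats` are assumptions introduced only for illustration. The sketch counts how many distinct word N-grams a corpus yields and how often each occurs on average.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return the word n-grams (as tuples) found in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_stats(documents, n):
    """Return (distinct n-grams, mean occurrences per distinct n-gram)."""
    counts = Counter()
    for doc in documents:
        # Whitespace tokenization only; no stemming or normalization.
        counts.update(word_ngrams(doc.split(), n))
    distinct = len(counts)
    mean_freq = sum(counts.values()) / distinct if distinct else 0.0
    return distinct, mean_freq


# Hypothetical toy corpus (assumption: not taken from the SPA dataset).
corpus = [
    "the bank raised interest rates",
    "the bank cut interest rates",
    "the team won the match",
    "the team lost the match",
]

for n in range(1, 5):
    distinct, mean_freq = ngram_stats(corpus, n)
    print(f"{n}-grams: {distinct} distinct, mean frequency {mean_freq:.2f}")
```

In this toy corpus the mean frequency per distinct feature drops from 2.0 for single words to 1.0 for 3-grams, so each longer N-gram is seen only once. A real corpus with many word forms would be expected to show the same trend more strongly.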
4 Conclusion
In this study, we conducted experiments to investigate the effect of word N-grams
as features for ATC. We used the SPA dataset, four FS methods, different
numbers of top-ranked features, three feature RSs, and three CALs, as summarized in
Table 2. The results, summarized in Table 3, show that single words as features for
ATC produced better results than longer word N-grams. The best average CA was
achieved using the SVM CAL; however, for 2-grams, 3-grams, and 4-grams, the NB
CAL provided better CA results than SVM and KNN.
The main conclusion of our study is that using single words for ATC is
more effective than using N-grams. Further investigation is required in future
work to validate our results using larger datasets from different domains and
genres, such as newspapers and scientific texts, to reduce data sparseness.