Database Reference
In-Depth Information
in use. The existing list may not cover all types of potential fraud and may
need to be appended to the results of random audits. In conclusion, both the
supervised and unsupervised approaches for detecting fraud have pros and
cons. A combined approach is the one that usually yields the best results.
MACHINE LEARNING/ARTIFICIAL INTELLIGENCE
VS. STATISTICAL TECHNIQUES
According to their origin and the way they analyze data patterns, the data mining
models can be grouped into two classes:
• Machine learning/artificial intelligence models
• Statistical models.
Statistical models include algorithms like OLSR, logistic regression, factor
analysis/PCA, among others. Techniques like decision trees, neural networks,
association rules, self-organizing maps are machine learning models.
With the rapid developments in IT in recent years, there has been a rapid
growth in machine leaning algorithms, expanding analytical capabilities in terms
of both efficiency and scalability. Nevertheless, one should never underestimate
the predictive power of ''traditional'' statistical techniques whose robustness and
reliability have been established and proven over the years.
Faced with the growing volume of stored data, analysts started to look for
faster algorithms that could overcome potential time and size limitations. Machine
learning models were developed as an answer to the need to analyze large amounts
of data in a reasonable time. New algorithms were also developed to overcome
certain assumptions and restrictions of statistical models and to provide solutions
to newly arisen business problems like the need to analyze affinities through
association modeling and sequences through sequence models.
Trying many different modeling solutions is the essence of data mining. There
is no particular technique or class of techniques which yields superior results
in all situations and for all types of problems. However, in general, machine
learning algorithms perform better than traditional statistical techniques in regard
to speed and capacity of analyzing large volumes of data. Some traditional statistical
techniques may fail to efficiently handle wide (high-dimensional) or long datasets
(many records). For instance, in the case of a classification project, a logistic
regression model would demand more resources and processing time than a
Search WWH ::




Custom Search