Data mining is mainly used for six tasks, each with corresponding analytical
methods: classification, estimation, prediction, affinity grouping (association
rules), clustering, and description and visualization. The original data is
regarded as the source of knowledge, and data mining is the process of
discovering knowledge from it. The original data may be structured, e.g., data
in relational databases; semi-structured, e.g., text, graphical, and image
data; or even heterogeneous data distributed across a network. The methods used
to discover knowledge may be mathematical or non-mathematical, and deductive or
inductive. The discovered knowledge may be used for information management,
query optimization, decision support, and process control, as well as data
maintenance.
Mining methods are generally divided into machine learning methods, neural
network methods, and database methods. Machine learning may be further divided
into inductive learning, example-based learning, genetic algorithms, etc.
Neural network methods may be divided into feedforward neural networks,
self-organizing neural networks, etc. Database methods mainly include
multidimensional data analysis, or OLAP (On-Line Analytical Processing), and
the attribute-oriented induction method.
Various data mining algorithms have been developed in fields including
artificial intelligence, machine learning, pattern recognition, statistics, and
databases. In 2006, the IEEE International Conference on Data Mining (ICDM)
identified the ten most influential data mining algorithms through a strict
selection procedure [2], including C4.5, k-means, SVM, Apriori, EM, Naive
Bayes, and CART. These ten algorithms cover classification, clustering,
regression, statistical learning, association analysis, and link mining, all of
which are among the most important problems in data mining research. In
addition, other advanced algorithms such as neural networks and genetic
algorithms can also be applied to data mining in different applications.
Prominent application areas include gaming, business, science, engineering,
and supervision.
5.2 Big Data Analytic Methods
At the dawn of the big data era, people are concerned with how to rapidly extract
key information from massive data so as to bring value to enterprises and
individuals. At present, the main processing methods for big data are as follows.
• Bloom Filter: A Bloom filter consists of a bit array and a series of hash
functions. Its principle is to store the hash values of the data, rather than
the data itself, in the bit array. In essence it is a bitmap index that uses
hash functions to store data with lossy compression. Its advantages are high
space efficiency and high query speed; its disadvantages are a certain false
positive rate (an element that was never inserted may be reported as present)
and the difficulty of deleting elements. A Bloom filter is suited to big data
applications that can tolerate a certain false positive rate.
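
The bit array and hash functions described above can be illustrated with a minimal Python sketch. It is not taken from the original text: the class name BloomFilter, the method names add and might_contain, the default sizes (1024 bits, 3 hash functions), and the use of salted SHA-256 digests as the hash functions are all assumptions chosen for readability. Production implementations typically use faster non-cryptographic hashes and size the array from a target false positive rate.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: a bit array plus k hash functions (hypothetical names)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # Packed bit array, initially all zeros.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumption: simulate k hash functions by salting SHA-256 with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Only the hashed bit positions are stored, never the item itself.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


bf = BloomFilter()
for word in ["hadoop", "spark", "hive"]:
    bf.add(word)

print(bf.might_contain("spark"))  # True: inserted items are always reported present
print(bf.might_contain("flink"))  # usually False, but a false positive is possible
```

Because several items may set the same bit, clearing the bits of one item could erase evidence of another, which is why deletion is difficult in this basic form.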