Big Data Analysis - Big Data: Related Technologies, Challenges and Future Prospects

Data mining is mainly used to accomplish six different tasks, each with corresponding analytical methods: classification, estimation, prediction, affinity grouping or association rules, clustering, and description and visualization. Original data is regarded as the source from which knowledge is formed, and data mining is the process of discovering knowledge from that original data. Original data may be structured, e.g., data in relational databases; semi-structured or unstructured, e.g., text, graphics, and images; or even heterogeneous data distributed across a network. Methods for discovering knowledge may be mathematical or non-mathematical, and deductive or inductive. The discovered knowledge may be used for information management, query optimization, decision support, and process control, as well as for data maintenance.

Mining methods are generally divided into machine learning methods, neural network methods, and database methods. Machine learning methods can be further divided into inductive learning, example-based learning, genetic algorithms, and so on. Neural network methods include feedforward neural networks, self-organizing neural networks, and others. Database methods mainly include multidimensional data analysis, i.e., OLAP (On-Line Analytical Processing), as well as attribute-oriented induction.
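
OLAP is only named above. As a rough illustration of multidimensional analysis, a roll-up aggregates a measure over a chosen subset of dimensions. The Python sketch below is a minimal example under stated assumptions: the fact table `FACTS`, its dimensions, and the `roll_up` helper are all hypothetical. Real OLAP engines precompute and index such aggregates rather than scanning every row as this code does.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical fact table: (region, product, quarter, sales).
FACTS: List[Tuple[str, str, str, int]] = [
    ("North", "laptop", "Q1", 120),
    ("North", "phone", "Q1", 80),
    ("North", "laptop", "Q2", 150),
    ("South", "laptop", "Q1", 90),
    ("South", "phone", "Q2", 60),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts: Sequence[tuple], keep: Sequence[str]) -> Dict[tuple, int]:
    """Sum the sales measure, grouping only by the dimensions listed in `keep`.

    Dimensions not listed are aggregated away (a roll-up along those axes).
    """
    idx = [DIMENSIONS.index(d) for d in keep]
    totals: Dict[tuple, int] = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)  # coordinates in the reduced cube
        totals[key] += row[-1]            # the last field is the measure
    return dict(totals)


if __name__ == "__main__":
    print(roll_up(FACTS, ("region",)))             # totals per region
    print(roll_up(FACTS, ("region", "quarter")))   # totals per region and quarter
    print(roll_up(FACTS, ()))                      # grand total
```

Keeping all three dimensions returns the base cuboid. Keeping none returns the grand total (the apex of the cube).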

Various data mining algorithms have been developed by the artificial intelligence, machine learning, pattern recognition, statistics, and database communities. In 2006, the IEEE International Conference on Data Mining (ICDM) identified the ten most influential data mining algorithms through a strict selection procedure [ 2 ], including C4.5, k-means, SVM, Apriori, EM, Naive Bayes, and CART. These ten algorithms cover classification, clustering, regression, statistical learning, association analysis, and link mining, all of which are among the most important problems in data mining research. In addition, other advanced algorithms, such as neural networks and genetic algorithms, can also be applied to data mining in different applications. Prominent application areas include gaming, business, science, engineering, and supervision.
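
The sketch below makes one of the listed algorithms concrete. It implements k-means clustering with the standard Lloyd iteration on 2-D tuples: each point is assigned to its nearest centroid, then each centroid moves to the mean of its points, and this repeats until the centroids stop changing. The function name `kmeans`, its parameters, and the random-sample initialization are assumptions for illustration. Library implementations typically add smarter seeding such as k-means++ and vectorized distance computations.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Cluster 2-D points into k groups; return (centroids, labels)."""
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(points, k)  # k input points as initial centroids
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points,
        # keeping the old centroid if its cluster is empty.
        new_centroids: List[Point] = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(kmeans(pts, k=2))
```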

5.2 Big Data Analytic Methods

At the dawn of the big data era, people are concerned with how to rapidly extract key information from massive data so as to bring value to enterprises and individuals. At present, the main methods for processing big data are as follows.

Bloom Filter: A Bloom Filter consists of a bit array and a series of hash functions. Its principle is to store hash values of the data, rather than the data itself, in a bit array. It is in essence a bitmap index that uses hash functions to perform lossy, compressed storage of data. It offers high space efficiency and fast queries. It also has drawbacks: difficulty in deleting elements, and a certain misrecognition (false-positive) rate, meaning that a query may report an element as present when it was never inserted. Bloom Filters are therefore suited to big data applications that can tolerate a certain false-positive rate.
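
The following Python sketch illustrates the structure described above: an m-bit array plus k hash functions. Inserting an item sets k bits, and a query checks whether all k bits are set. It is a minimal sketch under stated assumptions. The class name `BloomFilter`, its methods, and the way the k hash functions are derived (salting SHA-256 with the function index) are illustrative choices, not a prescribed design.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter: an m-bit array plus k hash functions (illustrative)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # The bit array, packed 8 bits per byte; all bits start at 0.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Assumption: derive k hash functions by salting SHA-256 with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Set the k bits for this item; the item itself is never stored.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely never added. True: probably added (false positives possible).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(num_bits=1024, num_hashes=3)
    for word in ("apple", "banana", "cherry"):
        bf.add(word)
    print(bf.might_contain("apple"))   # inserted items always report True
    print(bf.might_contain("durian"))  # usually False; True would be a false positive
```

A `False` answer is definitive, while `True` only means "probably present". The sketch also shows why deletion is difficult. Clearing an item's bits could also clear bits set by other items, which would introduce false negatives. Counting Bloom filters, which keep small counters instead of single bits, are a common workaround.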