5.3.3 Analysis with Different Complexity
The time and space complexity of data analysis algorithms varies greatly depending on
the kind of data and the demands of the application. For example, for applications that
are amenable to parallel processing, a distributed algorithm may be designed and a
parallel processing model used, so that the analysis is split across multiple
processors or machines.
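To make the parallel case concrete, here is a minimal Python sketch, assuming the analysis task is computing the mean and population variance of a numeric dataset. Each partition is summarized independently (the step that can run in parallel), and the partial results are then combined with the pairwise update formula of Chan, Golub and LeVeque. The function names `summarize`, `merge`, `partition`, and `parallel_mean_variance` are hypothetical and are not taken from any particular big data system.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from functools import reduce
from typing import List, Optional, Sequence, Tuple

# Hypothetical helper names: a sketch of partition-then-merge analysis,
# not the API of any specific big data framework.

# Partial aggregate for one partition: (count, mean, sum of squared deviations).
Partial = Tuple[int, float, float]
EMPTY: Partial = (0, 0.0, 0.0)


def summarize(chunk: Sequence[float]) -> Partial:
    """Local step: runs independently on one partition of the data."""
    n = len(chunk)
    if n == 0:
        return EMPTY
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return (n, mean, m2)


def merge(a: Partial, b: Partial) -> Partial:
    """Combine step: pairwise update of Chan, Golub and LeVeque."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return EMPTY
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return (n, mean, m2)


def partition(data: Sequence[float], parts: int) -> List[Sequence[float]]:
    """Split the data into at most `parts` contiguous chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_mean_variance(
    data: Sequence[float], parts: int = 4, executor: Optional[Executor] = None
) -> Tuple[float, float]:
    """Return (mean, population variance), summarizing partitions concurrently."""
    chunks = partition(data, parts)
    owns_executor = executor is None
    pool = executor if executor is not None else ThreadPoolExecutor(max_workers=parts)
    try:
        # Each chunk is summarized by a separate worker.
        partials = list(pool.map(summarize, chunks))
    finally:
        if owns_executor:
            pool.shutdown()
    # Merge the partial results into one global summary.
    n, mean, m2 = reduce(merge, partials, EMPTY)
    return (mean, m2 / n) if n else (0.0, 0.0)
```

Calling `parallel_mean_variance([2, 4, 4, 4, 5, 5, 7, 9], parts=3)` should return approximately `(5.0, 4.0)`. The default `ThreadPoolExecutor` keeps the sketch portable, but CPython threads do not execute Python bytecode in parallel; for CPU-bound work, passing a `ProcessPoolExecutor` (created under an `if __name__ == "__main__":` guard) is the more realistic choice. The design point is that the merge step is associative, which is what allows the partitions to be processed on different processors or machines.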
5.4 Tools for Big Data Mining and Analysis
Many tools for big data mining and analysis are available, including professional
and amateur software, expensive commercial software, and free open-source
software. In this section, we briefly review the five most widely used tools,
according to a 2012 KDnuggets survey of 798 professionals that asked “What
Analytics, Data mining, Big Data software you used in the past 12 months for a
real project” [3]. The percentage after each tool is its share of respondents.
• R (30.7 %): R, an open-source programming language and software environment,
is designed for data mining/analysis and visualization. For compute-intensive
tasks, code written in C, C++, and Fortran can be called from within the R
environment. In addition, skilled users can manipulate R objects directly from C.
R is an implementation of the S language, an interpreted language developed at
AT&T Bell Labs for data exploration, statistical analysis, and plotting. S was
initially implemented mainly in S-PLUS, which is commercial software; R, being
open source, has become more popular. R ranked first in the KDnuggets 2012 survey.
Furthermore, in a 2012 survey of “Design languages you have used for data
mining/analysis in the past year”, R also took first place, ahead of SQL and Java.
Because of R's popularity, database vendors such as Teradata and Oracle have
released products that support R.
• Excel (29.8 %): Excel, a core component of Microsoft Office, provides data
processing and statistical analysis capabilities that support decision making.
Excel is installed together with advanced add-ins for data analysis, such as the
Analysis ToolPak and the Solver Add-in, but users must enable these add-ins
before they can use them. Excel is also the only commercial product among the
top five.
• Rapid-I RapidMiner (26.7 %): RapidMiner is open-source software used for
data mining, machine learning, and predictive analysis. In a 2011 KDnuggets
poll, it was used more frequently than R and ranked first. The data mining and
machine learning functions provided by RapidMiner cover Extract, Transform and
Load (ETL), data pre-processing and visualization, modeling, evaluation, and
deployment. A data mining flow is described in XML and displayed through a
graphical user interface (GUI). RapidMiner is written in Java. It integrates the
learners and evaluation methods of Weka and works with R. The functions of
RapidMiner are implemented by connecting processes of