A number of discussions and blog articles [5-7] suggest that big data
technologies need to adopt scientific discovery methods, which include
iterative model improvement, collection of improved data, and reuse of
collected data with an improved model.
According to a blog article by Mike Gualtieri from Forrester [7]: “Firms
increasingly realize that [big data] must use predictive and descriptive
analytics to find nonobvious information to discover value in the data.
Advanced analytics uses advanced statistical, data mining and machine
learning algorithms to dig deeper to find patterns that you can't see using
traditional BI tools, simple queries, or rules.”
2.2.2 The Big Data Definition
Although the term big data has become a buzzword, there is no consistent
definition of big data and no detailed analysis of this emerging
technology. Most discussions so far have taken place in the blogosphere,
where the most significant big data characteristics have been identified and
commonly accepted [8-10]. In this section, we summarize the available
definitions and propose a consolidated view of the generic big data features
that would help us define requirements for the infrastructure supporting
big data, particularly the SDI.
As a starting point, we can refer to a simple definition [9]: “Big Data: a
massive volume of both structured and unstructured data that is so large that
it's difficult to process using traditional database and software techniques.”
A related definition of data-intensive science is given in the book The Fourth
Paradigm: Data-Intensive Scientific Discovery by the computer scientist Jim Gray
[10]: “The techniques and technologies for such data-intensive science are so
different that it is worth distinguishing data-intensive science from
computational science as a new, fourth paradigm for scientific exploration”
(p. xix).
2.2.3 Five Vs of Big Data
In a number of discussions and articles, big data are described as having
three native generic characteristics: volume, velocity, and variety, also
referred to as the “3 Vs of big data.” After being stored and entering the
processing stages or workflow, big data acquire two further properties, value
and veracity. Together these constitute the five Vs of big data: volume,
velocity, variety, value, and veracity [4]. Figure 2.1 illustrates the
features related to the 5 Vs, which are analyzed next.
2.2.3.1 Volume
Volume is the most important and distinctive feature of big data that
imposes additional and specific requirements for all traditional technologies