In recent years, the rapid growth of big data has come mainly from people's daily lives, especially from the services of Internet companies. For example, Google processes hundreds of petabytes (PB) of data and Facebook generates over 10 PB of log data per month; Baidu, a Chinese company, processes tens of PB of data, and Taobao, a subsidiary of Alibaba, generates tens of terabytes (TB) of online trading data per day. This drastic rise in the volume of datasets brings many challenging problems that demand prompt solutions. First, the latest advances in information technology (IT) make it easier to generate data. For example, on average, 72 hours of video are uploaded to YouTube every minute [13]. We are therefore confronted with the challenge of collecting and integrating massive data from widely distributed sources. Second, the collected data keeps growing, which raises the problem of how to store and manage such huge, heterogeneous datasets with only moderate requirements on hardware and software infrastructure. Third, given the heterogeneity, scalability, real-time nature, complexity, and privacy of big data, we must effectively "mine" the datasets at different levels with analysis, modeling, visualization, forecasting, and optimization techniques, so as to reveal their intrinsic properties and improve decision making.
The rapid growth of cloud computing and the Internet of Things (IoT) further promotes the sharp growth of data. Cloud computing provides safeguards, access sites, and channels for data assets. In the IoT paradigm, sensors all over the world collect and transmit data to be stored and processed in the cloud. Such data, in both its quantity and its mutual relations, will far surpass the capacities of the IT architectures and infrastructures of existing enterprises, and its real-time requirements will greatly stress the available computing capacity. Figure 1.1 illustrates the boom in global data volume.
1.2 Definition and Features of Big Data
Big data is an abstract concept. Apart from sheer volume, it has other features that distinguish it from "massive data" or "very big data." Although the importance of big data is now generally recognized, people still hold different opinions on its definition. In general, big data refers to datasets that cannot be perceived, acquired, managed, and processed by traditional IT and software/hardware tools within a tolerable time. Because of their different concerns, scientific and technological enterprises, research scholars, data analysts, and technical practitioners define big data differently. The following definitions may help us reach a better understanding of the profound social, economic, and technological connotations of big data.
In 2010, Apache Hadoop defined big data as “datasets which could not be
captured, managed, and processed by general computers within an acceptable
scope." On the basis of this definition, in May 2011, McKinsey & Company, a global consulting agency, announced Big Data as "the Next Frontier for Innovation, Competition, and Productivity." Big data shall mean such datasets which could not be acquired, stored, and managed by classic database software.