Data Explosion, Data Nature and Dataology
Yangyong Zhu 1, Ning Zhong 2, and Yun Xiong 1
1 School of Computer Science, Fudan University
Shanghai 200433, P.R. China
yyzhu@fudan.edu.cn, yunx@fudan.edu.cn
2 Dept. of Life Science and Informatics, Maebashi Institute of Technology
Maebashi-City 371-0816, Japan
zhong@maebashi-it.ac.jp
Abstract. The essence of computer applications is to store real-world
things in computer systems in the form of data; that is, it is a
process of producing data. Some data are records related to culture
and society, and others are descriptions of phenomena of the universe and
life. Data are being generated and stored in computer systems rapidly and
on a large scale, a phenomenon called data explosion. Data explosion forms
data nature in computer systems. To explore data nature, new theories and
methods are required. In this paper, we present the concept of data nature
and introduce the problems arising from it, and then we define a
new discipline named dataology (also called data science or science of
data), which is an umbrella term for the theories, methods and technologies
for studying data nature. The research issues and framework of dataology
are proposed.
1 Introduction
According to the recent IDC research report entitled “As the Economy
Contracts, the Digital Universe Expands” [1], the amount of new digital
information reached about 486 billion gigabytes in 2008 and grew 3 percent
faster than IDC's previous projection. The digital universe is expected to
double in size every 18 months. In 2012, five times as much digital
information will be generated as in 2008. As data increase explosively, they
also become more complicated and diverse. At the IBM Information on Demand
2009 conference, experts pointed out that almost 15 GB (gigabytes) of data
are produced in the world every day. These data come from various kinds of
equipment, including sensors, RFID tags, meters, and GPS devices, and at
least 80 percent of new data are unstructured, such as Web content, Web
logs, email, images, video, and audio.
All the facts mentioned above indicate that data explosion has occurred
and is continuing to spread. In fact, data explosion is the process by which
data in computer systems increase explosively as people continuously store
data while using computers. During this process, the mass of data exhibits
multiple natural features: it is out of control, unknown, diverse, and complex.
Therefore, data explosion forms data nature. Studying data nature is an effective
 