In 2008, Microsoft purchased Farecast, a sci-tech venture company in the
U.S. Farecast offers an airline ticket forecasting system that predicts the trends
and ranges of rises and drops in airline ticket prices. The system has been
incorporated into Microsoft's Bing search engine. By 2012, the system had saved
nearly USD 50 per ticket per passenger, with a forecast accuracy as high as 75 %.
At present, data has become an important production factor, comparable to
material assets and human capital. As multimedia, social media, and the IoT
evolve rapidly, enterprises will collect ever more information, leading to an
exponential growth of data volume. Big data holds a huge and growing potential
for creating value for businesses and consumers.
1.4 The Development of Big Data
In the late 1970s, the concept of the "database machine" emerged, a technology
specially designed for storing and analyzing data. As data volumes increased, the
storage and processing capacity of a single mainframe computer system became
inadequate. In the 1980s, the "shared nothing" parallel database system was
proposed to meet the demand of growing data volumes [19]. In the shared-nothing
architecture, machines are organized in a cluster and every machine has its own
processor, memory, and disk. The Teradata system was the first successful
commercial parallel database system, and such databases later became very
popular. On June 2, 1986, a milestone event occurred when Teradata delivered the
first parallel database system with a storage capacity of 1 TB to Kmart, helping
the large-scale North American retailer expand its data warehouse [20]. By the
late 1990s, the advantages of parallel databases were widely recognized in the
database field.
However, many new challenges related to big data then arose. With the
development of Internet services, indexed and queried content grew rapidly.
Search engine companies therefore had to face the challenge of handling such big
data. Google created the Google File System (GFS) [21] and the MapReduce
programming model [22] to cope with the challenges of data management and
analysis at the Internet scale. In addition, content generated by users, sensors,
and other ubiquitous data sources drove overwhelming data flows, which required
a fundamental change in computing architectures and large-scale data processing
mechanisms. In January 2007, Jim Gray, a pioneer of database software, called
such a transformation "The Fourth Paradigm" [23]. He argued that the only way to
cope with this paradigm was to develop a new generation of computing tools to
manage, visualize, and analyze massive data. In June 2011, another milestone
event occurred when EMC/IDC published a research report titled Extracting Value
from Chaos [1], which introduced the concept and potential of big data for the
first time. This report aroused great interest in big data in both industry and
academia. Over the past few years, nearly all major companies, including EMC,
Oracle, IBM, Microsoft, Google, Amazon, and Facebook, have started their big
data projects.
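The MapReduce programming model mentioned above can be sketched in a few lines. The following is a minimal, single-machine illustration of the idea (the function names, word-count task, and sample documents are invented for this example, not Google's implementation): a map function emits intermediate key/value pairs, the framework groups them by key, and a reduce function aggregates each group.

```python
from collections import defaultdict

def map_phase(document):
    """Emit an intermediate (word, 1) pair for every word in the document."""
    for word in document.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Group intermediate values by key (done transparently by the
    framework, across many machines, in a real MapReduce system)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Aggregate all counts observed for one word."""
    return key, sum(values)

documents = ["big data big value", "data drives value"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
result = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
# result == {"big": 2, "data": 2, "value": 2, "drives": 1}
```

Because map and reduce are pure functions over independent keys, the framework can run them in parallel across thousands of shared-nothing machines, which is what made the model suitable for Internet-scale data.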