In 2008, Microsoft purchased Farecast, a sci-tech venture company in the
U.S. Farecast offers an airline ticket forecasting system that predicts the trends
and ranges of rises and drops in airline ticket prices. The system has been
incorporated into Microsoft's Bing search engine. By 2012, the system had saved
nearly USD 50 per ticket per passenger, with a forecast accuracy as high as 75 %.
At present, data has become an important production factor, comparable to
material assets and human capital. As multimedia, social media, and the IoT
evolve rapidly, enterprises will collect ever more information, leading to an
exponential growth of data volume. Big data holds a huge and growing potential
for creating value for businesses and consumers.
1.4 The Development of Big Data
In the late 1970s, the concept of the "database machine" emerged, a technology
specially designed for storing and analyzing data. As data volumes increased, the
storage and processing capacity of a single mainframe computer system became
inadequate. In the 1980s, the "shared nothing" parallel database system was
proposed to meet the demand of growing data volumes [19]. In the shared-nothing
architecture, machines are organized in a cluster and every machine has its own
processor, memory, and disk. The Teradata system was the first successful
commercial parallel database system, and such databases later became very
popular. On June 2, 1986, a milestone event occurred when Teradata delivered the
first parallel database system with a storage capacity of 1 TB to Kmart, helping
the large-scale North American retailer expand its data warehouse [20]. By the
late 1990s, the advantages of parallel databases were widely recognized in the
database field.
However, many new challenges related to big data then arose. With the
development of Internet services, indexed and queried content grew rapidly.
Search engine companies therefore had to face the challenge of handling such big
data. Google created the Google File System (GFS) [21] and the MapReduce
programming model [22] to cope with the challenges of data management and
analysis at the Internet scale. In addition, content generated by users, sensors,
and other ubiquitous data sources drove overwhelming data flows, which required
a fundamental change in computing architectures and large-scale data processing
mechanisms. In January 2007, Jim Gray, a pioneer of database software, called
such a transformation "The Fourth Paradigm" [23]. He argued that the only way to
cope with this paradigm was to develop a new generation of computing tools to
manage, visualize, and analyze massive data. In June 2011, another milestone
event occurred when EMC/IDC published a research report titled Extracting Value
from Chaos [1], which introduced the concept and potential of big data for the
first time. This report aroused great interest in big data in both industry and
academia. Over the past few years, nearly all major companies, including EMC,
Oracle, IBM, Microsoft, Google, Amazon, and Facebook, have started their big
data projects.
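The MapReduce programming model mentioned above can be sketched in a few lines. The following is a minimal, single-machine illustration of the idea (the function names, word-count task, and sample documents are invented for this example, not Google's implementation): a map function emits intermediate key/value pairs, the framework groups them by key, and a reduce function aggregates each group.

```python
from collections import defaultdict

def map_phase(document):
    """Emit an intermediate (word, 1) pair for every word in the document."""
    for word in document.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Group intermediate values by key (done transparently by the
    framework, across many machines, in a real MapReduce system)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Aggregate all counts observed for one word."""
    return key, sum(values)

documents = ["big data big value", "data drives value"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
result = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
# result == {"big": 2, "data": 2, "value": 2, "drives": 1}
```

Because map and reduce are pure functions over independent keys, the framework can run them in parallel across thousands of shared-nothing machines, which is what made the model suitable for Internet-scale data.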