Chapter 1
Introduction
We live in the era of big data. Information from multiple sources is growing at
a staggering rate. The number of Internet users reached 2.27 billion in 2012.
Google estimates that the total number of web pages exceeds one trillion. Every
day, Facebook generates more than 25 TB of log data, Twitter generates more
than 12 TB of tweets, and the New York Stock Exchange captures 1 TB of trade
information. Each minute, 15 hours of video are uploaded to YouTube. About 30 billion
radio-frequency identification (RFID) tags are created every day. Add to this mix
the data generated by the hundreds of millions of GPS devices sold every year, and
the more than 30 million networked sensors currently in use (and growing at a rate
faster than 30 percent per year). Modern high-energy physics experiments, such as
DZero [46], typically generate more than one terabyte of data per day. These data
volumes are expected to double every two years over the next decade.
The rapidly expanding generation of Internet-based services such as email,
blogging, social networking, search, and e-commerce has substantially redefined
the behavior and trends of web users when it comes to creating, communicating,
accessing content, sharing information, and purchasing products. For example,
we buy books on Amazon, sell things on eBay, stay in contact with friends and
colleagues via Facebook and LinkedIn, start a blog using WordPress, share pictures
via Flickr, and share videos via YouTube. These are just a few of the
well-known Internet-based services that we use in our everyday lives. IT professionals
are witnessing a proliferation in the scale of the data generated and consumed
because of the growth in the number of these systems.
A company can generate up to petabytes of information in the course of a
year: web pages, blogs, clickstreams, search indices, social media forums, instant
messages, text messages, email, documents, consumer demographics, sensor data
from active and passive systems, and more. By many estimates, as much as 80%
of this data is semi-structured or unstructured. Companies are always seeking
to become more nimble in their operations and more innovative with their data
analysis and decision-making processes. And they are realizing that time lost in
these processes can lead to missed business opportunities. The core of the big