with the variety of data types even if the volume is in terabytes. These interpretations have
made big data issues situational.
The pervasiveness of the Internet has pushed the generation and use of data to
unprecedented levels. In this respect, digitization has taken on a new meaning. The term
"data" now covers events captured and stored in the form of text, numbers, graphics,
video, images, sound, and signals.
Table 1-1 shows the scales used to measure big data.
Table 1-1. Measuring Big Data
1000 Gigabytes (GB) = 1 Terabyte (TB)
1000 Terabytes = 1 Petabyte (PB)
1000 Petabytes = 1 Exabyte (EB)
1000 Exabytes = 1 Zettabyte (ZB)
1000 Zettabytes = 1 Yottabyte (YB)
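These scales are simple powers of ten, which a short sketch can make concrete. The following Python snippet (illustrative only; the helper name and sample values are assumptions, not from the text) converts a raw byte count into the 1000-based units of Table 1-1:

UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(num_bytes):
    """Render a byte count using the decimal prefixes from Table 1-1."""
    size = float(num_bytes)
    for unit in UNITS:
        if size < 1000 or unit == UNITS[-1]:
            return "%.1f %s" % (size, unit)
        size /= 1000

print(human_readable(3_200_000_000_000))  # 3.2 TB
print(human_readable(10**21))             # 1.0 ZB

For example, 1,000 terabytes of sensor logs would print as "1.0 PB", matching the second row of the table.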
Is big data a new problem for enterprises? Not necessarily.
Big data has been a concern in a few select industries and scenarios for some time:
physical sciences (meteorology, physics), life sciences (genomics, biomedical research),
financial institutions (banking, insurance, and capital markets), and government (defense,
treasury). For these industries, big data was primarily a volume problem, and to solve it
they relied heavily on a mash-up of custom-developed technologies and complex programs
to collect and manage the data. In doing so, however, these industries and vendor products
generally caused the total cost of ownership (TCO) of the IT infrastructure to rise
steeply every year.
CIOs and CTOs have always grappled with questions such as these: how to lower the
IT costs of managing ever-increasing volumes of data; how to build systems that
scale; how to meet performance requirements for business that is increasingly global
in scope and reach; and how to address data security, privacy, and data quality
concerns. The poly-structured nature of big data multiplies these concerns: how does
an industry cost-effectively exploit data that is structured (such as database content),
semi-structured (such as log files or XML files), and unstructured (such as text
documents, web pages, or graphics)?
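To make the distinction concrete, here is a minimal Python sketch (all sample data and field names are hypothetical, not from the text) that touches each of the three shapes: a structured CSV row, a semi-structured XML record, and an unstructured text snippet:

import csv
import io
import xml.etree.ElementTree as ET

# Structured: a fixed schema, as in database tables or CSV extracts.
rows = list(csv.DictReader(io.StringIO("id,amount\n1,250\n2,99\n")))

# Semi-structured: self-describing but flexible, as in XML or log files.
event = ET.fromstring("<event><type>login</type><user>alice</user></event>")
event_type = event.findtext("type")

# Unstructured: no schema at all, as in free text from documents or web pages.
document = "Customer reports the new dashboard is far easier to use."
word_count = len(document.split())

print(rows[0]["amount"], event_type, word_count)  # 250 login 10

Each shape calls for different tooling, which is why a single relational stack rarely covers all three cost-effectively.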
We have come a long way since the first mainframe era. Over the last few years,
technologies have evolved, and we now have solutions that address some or all
of these concerns. Indeed, a second mainframe wave is upon us, one aimed at capturing,
analyzing, classifying, and utilizing the massive amounts of data that can now be
collected. In many instances, organizations embracing new methodologies and technologies
have effectively leveraged these poly-structured data reservoirs to innovate. Some of
these innovations are described below:
• Search at scale
• Multimedia content
• Sentiment analysis