processing. If we consider all the data, the associated processes, and the metrics used in
any decision-making situation within an organization, we realize that we have long used information
(volumes of data) in a variety of formats and degrees of complexity, and have derived decisions
from that data through nontraditional software processes.
We are seeing the evolution of Hadoop, MapReduce, and NoSQL, with changes and new features
emerging every few months, and sometimes every few weeks. These architectures are designed
and built to handle large and complex data volumes: they process effectively in a
batch-oriented environment but offer only limited versions of the real-time and interactive
capabilities found in an RDBMS.
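The batch-oriented processing style described above can be illustrated with a minimal sketch of the MapReduce programming model. This is not Hadoop itself: a real Hadoop job distributes the map and reduce phases across a cluster, while this illustration (a word count, the canonical MapReduce example) runs both phases in-process.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (key, value) pairs; here, (word, 1) for every word."""
    for record in records:
        for word in record.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Reduce: group pairs by key and aggregate the values."""
    groups = defaultdict(int)
    for key, value in pairs:
        groups[key] += value
    return dict(groups)

# A tiny batch of text records standing in for a large input split.
records = ["Big Data is big", "data at rest is batch data"]
counts = reduce_phase(map_phase(records))  # e.g. counts["data"] == 3
```

The key design point is that each phase works on independent chunks of data, which is why the model scales to large volumes in batch mode but offers none of the interactive query capabilities of an RDBMS.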
14.10.1 What Is Big Data?
Big Data can be defined as volumes of data available in varying degrees of complexity, generated at
different velocities and varying degrees of ambiguity, which cannot be processed using traditional
technologies, processing methods, algorithms, or any commercial off-the-shelf solutions.
Data defined as Big Data include weather, geospatial, and GIS data; consumer-driven data
from social media; enterprise-generated data from legal, sales, marketing, procurement, finance,
and human resources departments; and device-generated data from sensor networks, nuclear
plants, x-ray and scanning devices, and airplane engines.
14.10.1.1 Data Volume
The most interesting data for any organization to tap into today are social media data. The amount
of data generated by consumers every minute provides extremely important insights into choices,
opinions, influences, connections, brand loyalty, brand management, and much more. Social
media sites provide not only consumer perspectives but also competitive positioning, trends, and
access to communities formed by common interest. Organizations today leverage social media
pages to personalize the marketing of products and services to each customer.
Every enterprise generates massive amounts of e-mail from its employees, customers,
and executives on a daily basis. These e-mails are all considered an asset of the corporation
and need to be managed as such. After Enron and the collapse of many corporate audits,
the US government mandated that all enterprises have clear life-cycle management of
e-mail and that e-mails be available and auditable on a case-by-case basis. Several
scenarios, such as insider trading, intellectual property disputes, and competitive analysis,
justify the governance and management of e-mail.
The list of features for handling data velocity includes the following:

- Nontraditional and unorthodox data processing techniques need to be innovated for processing this data type.
- Metadata are essential for processing these data successfully.
- Metrics and KPIs are key to providing visualization.
- Raw data do not need to be stored online for access.
- Processed output needs to be integrated into an enterprise-level analytical ecosystem to provide better insights and visibility into the trends and outcomes of business exercises, including CRM, inventory optimization, clickstream analysis, and more.
- The enterprise data warehouse (EDW) is needed for analytics and reporting.
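The features above can be sketched with a small example: raw clickstream events are aggregated into a KPI, and only that processed output would be loaded into the EDW, while the raw events need not stay online. The event records and the pages-per-session metric are illustrative assumptions, not a prescribed schema.

```python
from collections import defaultdict

# Hypothetical raw clickstream events: (session_id, page_url).
raw_events = [
    ("s1", "/home"), ("s1", "/product"), ("s1", "/cart"),
    ("s2", "/home"), ("s2", "/search"),
]

def pages_per_session(events):
    """Aggregate raw clicks into one KPI value per session."""
    counts = defaultdict(int)
    for session_id, _page in events:
        counts[session_id] += 1
    return dict(counts)

kpi = pages_per_session(raw_events)
# Only `kpi` is integrated into the analytical ecosystem; the raw
# events can be archived offline once processing is complete.
```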