Additional Considerations for Big Data Warehouse (BDW)
The enterprise data platform must absolutely stay relevant to the business. As the value
and the visibility of big data analytics grow, the enterprise data platform must encompass
the new culture, skills, techniques, and systems required for big data analytics.
Sandboxes
The BDW provides capabilities for exploratory analysis and experimentation.
These capabilities usually consist of mashed-up data sets, sophisticated algorithms
and code bases, and rich data visualization components. We call these capabilities
"sandboxes." Data analysts analyze the mashed-up data sets with a wide variety of tools
(mostly open-source tools to keep the cost low): data integration tools like the Hadoop
ecosystem; sophisticated statistical analysis tools like SAS, MATLAB, or R; and many forms
of ad hoc querying and rich data visualization tools like QlikView and Tableau. Since the BDW
is an exploratory ground that aids the discovery process, the data analyst responsible
for a given sandbox has complete freedom to do anything with the data (often the
data sources lie well beyond the corporate firewall) using any tool (oftentimes the
data analyst creates custom tools) to maximize productivity and enhance the discovery
process. The sandbox capability has enormous potential, but it also carries a significant
risk of proliferating isolated and incompatible stovepipes of data.
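
To illustrate the kind of ad hoc work a sandbox hosts, here is a minimal Python sketch of a mash-up analysis. The file names, column names, and the pandas/matplotlib tooling are all assumptions made for illustration; an analyst might equally use R, SAS, or Hadoop-based tools.

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical inputs: two extracts the analyst has pulled into the
# sandbox. File names and columns are illustrative only.
customers = pd.read_csv("crm_extract.csv")        # customer_id, segment, ...
clicks = pd.read_csv("clickstream_extract.csv")   # customer_id, page, ts, ...

# Mash up the two sources on a shared key -- the ad hoc join a sandbox
# exists to make cheap.
mashed = clicks.merge(customers, on="customer_id", how="left")

# Quick exploratory summary: click volume per customer segment.
summary = (mashed.groupby("segment")["page"]
                 .count()
                 .sort_values(ascending=False))
print(summary)

# A throwaway visualization to aid the discovery process.
summary.plot(kind="bar", title="Clicks per customer segment")
plt.show()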
Exploratory sandboxes usually have a lifetime tied to a specific
discovery process and objective. For example, the data analyst may be developing
predictive models for a specific business hypothesis, as in the sketch below. Typically, if such an experiment
produces a successful result, the sandbox experiment has met its goal, and the entire
experimentation process, along with its data sets and algorithms, is carefully evaluated to
become a standard production feature. The data analyst then moves on to solve another
problem.
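
To make that lifecycle concrete, the following sketch shows one such sandbox experiment: a predictive model tested against a hold-out set, with synthetic data standing in for the mashed-up sandbox data. The scikit-learn tooling, the AUC criterion, and all parameters are illustrative assumptions, not a prescribed method.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the mashed-up sandbox data; in a real sandbox,
# X and y would be engineered from the exploratory data sets.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fit a candidate model for the business hypothesis under test.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Evaluate on held-out data; the success criterion is illustrative.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Hold-out AUC: {auc:.3f}")

# On a successful result, the data sets, features, and model become
# candidates for evaluation as a standard production feature; otherwise
# the sandbox is torn down and the analyst moves on.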
Low Latency
Many big data use cases are associated with real-time data processing, analysis, and
insight generation. The need for low latency data processing and analysis arises from the
fact that data has a time dimension: if you do not process and analyze data
at that very moment, its value erodes significantly. An ideal implementation of
low latency data processing and analysis would allow streaming analysis to take
place while the data is being acquired and processed. The availability of extremely
frequent and extremely detailed event measurements can drive interactive intervention.
The use cases where this intervention is important span many situations, ranging from
online gaming to product offer suggestions to financial account fraud responses to the
stability of networks.
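
As a rough illustration of analyzing events while they are still arriving, the following Python sketch flags a fraud-like spending burst inside a sliding time window on each incoming transaction. The window length, threshold, and the on_transaction entry point are hypothetical; a production system would typically push this logic into a stream processing engine rather than in-process state.

import time
from collections import deque

# Illustrative parameters: flag an account whose spend inside a short
# sliding window exceeds a threshold.
WINDOW_SECONDS = 60
SPEND_THRESHOLD = 1000.0

windows = {}  # account_id -> deque of (timestamp, amount)

def on_transaction(account_id, amount, ts=None):
    """Process one event as it arrives and intervene immediately."""
    ts = ts if ts is not None else time.time()
    window = windows.setdefault(account_id, deque())
    window.append((ts, amount))

    # Evict events older than the window; the per-account state stays
    # small, so the per-event cost (and hence the latency) stays low.
    while window and ts - window[0][0] > WINDOW_SECONDS:
        window.popleft()

    if sum(amount for _, amount in window) > SPEND_THRESHOLD:
        print(f"ALERT: account {account_id} exceeded "
              f"{SPEND_THRESHOLD} in {WINDOW_SECONDS}s")

# Example: a burst of transactions on one account trips the alert.
for amt in (400.0, 350.0, 300.0):
    on_transaction("acct-42", amt)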
 