Introduction to Big Data Analytics - Data Science and Big Data Analytics

Database Reference

In-Depth Information

operational processes getting critical data feeds from the data warehouses

and repositories.

4. At the end of this workflow, analysts get data provisioned for their

downstream analytics. Because users generally are not allowed to run

custom or intensive analytics on production databases, analysts create

data extracts from the EDW to analyze data offline in R or other local

analytical tools. Many times these tools are limited to in-memory analytics

on desktops analyzing samples of data, rather than the entire population

of a dataset. Because these analyses are based on data extracts, they reside

in a separate location, and the results of the analysis—and any insights on

the quality of the data or anomalies—rarely are fed back into the main data

repository.

Figure 1.9 Typical analytic architecture

Because new data sources slowly accumulate in the EDW due to the rigorous

validation and data structuring process, data is slow to move into the EDW, and

the data schema is slow to change. Departmental data warehouses may have been

originally designed for a specific purpose and set of business needs, but over time

evolved to house more and more data, some of which may be forced into existing

schemas to enable BI and the creation of OLAP cubes for analysis and reporting.

Although the EDW achieves the objective of reporting and sometimes the creation

of dashboards, EDWs generally limit the ability of analysts to iterate on the data in

a separate nonproduction environment where they can conduct in-depth analytics

or perform analysis on unstructured data.

The typical data architectures just described are designed for storing and

processing mission-critical data, supporting enterprise applications, and enabling

corporate reporting activities. Although reports and dashboards are still important

Search WWH ::

Custom Search

Home