Whereas BI problems tend to require highly structured data organized in rows and
columns for accurate reporting, Data Science projects tend to use many types
of data sources, including large or unconventional datasets. Depending on its
goals, an organization may choose a BI project if it is doing reporting,
creating dashboards, or performing simple visualizations, or a Data Science
project if it needs more sophisticated analysis with disaggregated or varied
datasets.
1.2.2 Current Analytical Architecture
As described earlier, Data Science projects need workspaces that are purpose-built
for experimenting with data, with flexible and agile data architectures. Most
organizations still have data warehouses that support traditional reporting and
simple data analysis well but have a harder time supporting more robust
analyses. This section examines a typical analytical data architecture that may
exist within an organization.
Figure 1.9 shows a typical data architecture and several of the challenges it presents
to data scientists and others trying to do advanced analytics. The numbered items
that follow trace the data flow to the Data Scientist and show how this individual
fits into the process of obtaining data to analyze on projects.
1. Before data sources can be loaded into the data warehouse, the data must be
well understood, structured, and normalized with the appropriate data
type definitions. Although this kind of centralization enables security,
backup, and failover of highly critical data, it also means that data
typically must pass through significant preprocessing and checkpoints
before it can enter this controlled environment, which does not
lend itself to data exploration and iterative analytics.
2. Because the enterprise data warehouse (EDW) is so tightly controlled,
additional local systems may emerge in the form of departmental warehouses and
local data marts that business users create to accommodate their need for
flexible analysis. These local data marts may not have the same security and
structure constraints as the main EDW, which allows users to do some
more in-depth analysis. However, these one-off systems reside in isolation,
often are not synchronized or integrated with other data stores, and may
not be backed up.
3. Once in the data warehouse, data is read by additional applications across
the enterprise for BI and reporting purposes. These are high-priority