For organizations, most traditional data architectures inhibit data exploration and
more sophisticated analysis. Moreover, traditional data architectures have several
additional implications for data scientists:
• High-value data is hard to reach and leverage, and predictive analytics and
data mining activities are last in line for data. Because EDWs are designed
for central data management and reporting, those wanting data for analysis
are generally prioritized after operational processes.
• Data moves in batches from the EDW to local analytical tools. This workflow
means that data scientists are limited to performing in-memory analytics
(such as with R, SAS, SPSS, or Excel), which restricts the size of the
datasets they can use. As such, analysis may be subject to constraints of
sampling, which can skew model accuracy (see the sketch after this list).
• Data Science projects will remain isolated and ad hoc, rather than
centrally managed. The implication of this isolation is that the
organization can never harness the power of advanced analytics in a
scalable way, and Data Science projects will exist as nonstandard
initiatives, which are frequently not aligned with corporate business goals
or strategy.
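To make the second bullet's sampling constraint concrete, the following minimal Python sketch shows how a memory-bound local tool ends up modeling a sample rather than the full dataset. The file name, chunk size, and 1% sampling rate are illustrative assumptions, not details from the text:

```python
import pandas as pd

# Hypothetical batch extract pulled from the EDW; the file name and the
# sampling rate are illustrative assumptions.
SOURCE = "edw_extract.csv"
SAMPLE_FRAC = 0.01

# Stream the extract in chunks so the full dataset never has to fit in
# memory at once, keeping a small random sample from each chunk.
sampled_chunks = []
for chunk in pd.read_csv(SOURCE, chunksize=100_000):
    sampled_chunks.append(chunk.sample(frac=SAMPLE_FRAC, random_state=42))

sample = pd.concat(sampled_chunks, ignore_index=True)

# Any model fit to `sample` sees roughly 1% of the records; rare but
# high-value segments may be under-represented, which is the accuracy
# risk described above.
print(f"Local analysis proceeds on {len(sample)} sampled rows")
```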
All these symptoms of the traditional data architecture result in a slow
“time-to-insight” and lower business impact than could be achieved if the data
were more readily accessible and supported by an environment that promoted
advanced analytics. As stated earlier, one solution to this problem is to introduce
analytic sandboxes to enable data scientists to perform advanced analytics in a
controlled and sanctioned way. Meanwhile, the current Data Warehousing
solutions continue offering reporting and BI services to support management and
mission-critical operations.
1.2.3 Drivers of Big Data
To better understand the market drivers related to Big Data, it is helpful to first
understand some past history of data stores and the kinds of repositories and tools
to manage these data stores.
As shown in Figure 1.10, in the 1990s the volume of information was often
measured in terabytes. Most organizations analyzed structured data in rows and
columns and used relational databases and data warehouses to manage large stores
of enterprise information. The following decade saw a proliferation of different
kinds of data sources—mainly productivity and publishing tools such as content