Figure 2.4 Data preparation phase
2.3.1 Preparing the Analytic Sandbox
The first subphase of data preparation requires the team to obtain an analytic
sandbox (also commonly referred to as a workspace), in which the team can
explore the data without interfering with live production databases. Consider an
example in which the team needs to work with a company's financial data. The
team should access a copy of the financial data from the analytic sandbox rather
than interacting with the production version of the organization's main database,
because the production database is tightly controlled and needed for financial reporting.
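The idea of working on a copy rather than on production can be illustrated with a minimal sketch. It assumes, purely for illustration, that the production data lives in a SQLite database. The table `ledger` and the functions `create_sandbox` and `demo` are hypothetical names, not part of any real system. The sketch uses Python's standard `sqlite3` module and its `Connection.backup` method to copy the database into a separate in-memory sandbox. Changes made in the sandbox leave the production data untouched.

```python
import sqlite3


def create_sandbox(production: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the whole production database into a separate in-memory sandbox."""
    sandbox = sqlite3.connect(":memory:")
    # backup() reads every page of the source and writes it into the target;
    # the production connection is only read from.
    production.backup(sandbox)
    return sandbox


def demo():
    # Hypothetical stand-in for a production financial database.
    production = sqlite3.connect(":memory:")
    production.execute(
        "CREATE TABLE ledger (id INTEGER PRIMARY KEY, account TEXT, amount REAL)"
    )
    production.executemany(
        "INSERT INTO ledger (account, amount) VALUES (?, ?)",
        [("revenue", 1000.0), ("expenses", -400.0)],
    )
    production.commit()

    sandbox = create_sandbox(production)
    # Exploratory, destructive work happens only in the sandbox copy.
    sandbox.execute("DELETE FROM ledger WHERE account = 'expenses'")
    sandbox.commit()
    return production, sandbox
```

After `demo()` runs, the production `ledger` table still has two rows, while the sandbox copy has one. Against a real SQLite file, the production side could additionally be opened read-only with a URI such as `sqlite3.connect("file:production.db?mode=ro", uri=True)`, where `production.db` is a hypothetical file name. Large Big Data projects would typically build the sandbox on a dedicated analytic platform instead, but the separation principle is the same.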
When developing the analytic sandbox, it is a best practice to collect all kinds of
data there, because team members need access to high volumes and varieties of data
for a Big Data analytics project. Depending on the kind of analysis the team plans to
undertake, this can include everything from summary-level aggregated data and
structured data to raw data feeds and unstructured text data from call logs or web logs.
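The following minimal sketch shows one way a single sandbox can hold both aggregated, structured data and raw, unstructured text side by side. It assumes, for illustration only, a SQLite sandbox. The table names `monthly_revenue` and `web_log_raw`, the sample values, and the function `load_sandbox` are all hypothetical. Raw log lines are stored verbatim so they can be parsed later in whatever way the analysis requires.

```python
import sqlite3

# Hypothetical sample inputs standing in for real data sources.
MONTHLY_SUMMARY = [("2023-01", 120000.0), ("2023-02", 98000.5)]  # aggregated, structured
RAW_WEB_LOG = """\
192.0.2.1 - - [01/Jan/2023:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512
192.0.2.7 - - [01/Jan/2023:10:00:05 +0000] "GET /cart HTTP/1.1" 404 0
"""


def load_sandbox(conn: sqlite3.Connection) -> None:
    """Load structured summary rows and raw log text into the same sandbox."""
    # Structured, summary-level data keeps a typed schema.
    conn.execute("CREATE TABLE monthly_revenue (month TEXT PRIMARY KEY, revenue REAL)")
    conn.executemany("INSERT INTO monthly_revenue VALUES (?, ?)", MONTHLY_SUMMARY)

    # Unstructured text is kept as-is, one row per line, with no parsing yet.
    conn.execute("CREATE TABLE web_log_raw (line_no INTEGER PRIMARY KEY, line TEXT)")
    conn.executemany(
        "INSERT INTO web_log_raw (line_no, line) VALUES (?, ?)",
        enumerate(RAW_WEB_LOG.splitlines(), start=1),
    )
    conn.commit()
```

Keeping the raw lines unparsed is a deliberate choice here: the sandbox preserves the original data, and different team members can derive their own structured views from it later.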
This expansive approach of collecting data of all kinds differs considerably from
the approach advocated by many information technology (IT) organizations. Many
IT groups provide access to only a particular subsegment of the data for a specific