will have a much richer set of observations to choose from and more choices for
agreeing upon the most impactful conclusions from a project.
Another part of this process involves gathering and assessing hypotheses from
stakeholders and domain experts who may have their own perspective on what
the problem is, what the solution should be, and how to arrive at a solution.
These stakeholders know the domain well and can suggest ideas to test
as the team formulates hypotheses during this phase. The team
will likely collect many ideas that may illuminate the operating assumptions of
the stakeholders. These ideas will also give the team opportunities to expand the
project scope into adjacent spaces where it makes sense or design experiments in
a meaningful way to address the most important interests of the stakeholders. As
part of this exercise, it can be useful to obtain and explore some initial data to
inform discussions with stakeholders during the hypothesis-forming stage.
2.2.7 Identifying Potential Data Sources
As part of the discovery phase, identify the kinds of data the team will need to
solve the problem. Consider the volume, type, and time span of the data needed to
test the hypotheses. Ensure that the team can access more than just aggregated
data. In most cases, the team will need the raw data to avoid introducing bias into
the downstream analysis. Recalling the characteristics of Big Data from Chapter
1, assess the main characteristics of the data, with regard to its volume, variety,
and velocity of change. A thorough diagnosis of the data situation will influence the
kinds of tools and techniques to use in Phases 2-4 of the Data Analytics Lifecycle.
In addition, performing data exploration in this phase will help the team determine
how much data is needed, such as how much historical data to pull from
existing systems, and how that data is structured. Develop an idea of the scope of
the data needed, and validate that idea with the domain experts on the project.
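As a rough illustration of assessing volume, variety, and velocity, the following Python sketch profiles a small sample of raw records. It is only a minimal sketch under stated assumptions: the function name profile_records, the field names, and the sample values are all hypothetical, and records per day is used here as a crude stand-in for velocity.

```python
from datetime import date


def profile_records(records, date_field):
    """Summarize volume, variety, and time span for a list of dict records.

    Hypothetical helper for illustration; not part of any library.
    """
    # Volume: number of raw records available.
    volume = len(records)

    # Variety: count the value types observed under each field name.
    variety = {}
    for record in records:
        for field, value in record.items():
            counts = variety.setdefault(field, {})
            type_name = type(value).__name__
            counts[type_name] = counts.get(type_name, 0) + 1

    # Time span: earliest and latest values of the chosen date field.
    dates = [r[date_field] for r in records if r.get(date_field) is not None]
    time_span = (min(dates), max(dates)) if dates else None

    # Velocity (crude proxy, an assumption): average records per day.
    records_per_day = None
    if time_span is not None:
        days = (time_span[1] - time_span[0]).days
        if days > 0:
            records_per_day = volume / days

    return {
        "volume": volume,
        "variety": variety,
        "time_span": time_span,
        "records_per_day": records_per_day,
    }


# Hypothetical raw records; a real project would pull these from source systems.
sample = [
    {"customer_id": 1, "amount": 20.5, "order_date": date(2021, 1, 5)},
    {"customer_id": 2, "amount": "n/a", "order_date": date(2021, 3, 9)},
    {"customer_id": 3, "amount": 7.0, "order_date": date(2020, 11, 30)},
]

profile = profile_records(sample, "order_date")
print(profile)
```

In this sample, the amount field holds mixed types, the kind of detail that is visible only in raw data and is lost once the data has been aggregated.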
The team should perform five main activities during this step of the discovery
phase:
• Identify data sources: Make a list of candidate data sources the team
  may need to test the initial hypotheses outlined in this phase. Make an
  inventory of the datasets currently available and those that can be
  purchased or otherwise acquired for the tests the team wants to perform.
• Capture aggregate data sources: This is for previewing the data and
  providing high-level understanding. It enables the team to gain a quick