Table 2.1 Sample Dataset Inventory
Columns: Dataset; Data Available and Accessible; Data Available, but not Accessible; Data to Collect; Data to Obtain from Third Party Sources
Datasets (rows): Products shipped; Product Financials; Product Call Center Data; Live Product Feedback Surveys; Product Sentiment from Social Media
2.3.4 Data Conditioning
Data conditioning refers to the process of cleaning data, normalizing datasets,
and performing transformations on the data. A critical step within the Data
Analytics Lifecycle, data conditioning can involve many complex steps to join
or merge datasets or otherwise get datasets into a state that enables analysis in
further phases. Data conditioning is often viewed as a preprocessing step for
data analysis because it involves many operations on the dataset before developing
models to process or analyze the data. This view can suggest that data conditioning
is performed only by IT, the data owners, a DBA, or a data engineer. However, it
is also important to involve the data scientist in this step because many decisions
are made in the data conditioning phase that affect subsequent analysis. Part of
this phase involves deciding which aspects of particular datasets will be useful
to analyze in later steps. Because teams begin forming ideas in this phase about
which data to keep and which data to transform or discard, it is important to
involve multiple team members in these decisions. Leaving such decisions to a
single person may cause teams to return to this phase to retrieve data that may
have been discarded.
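
The three operations named above (cleaning, normalizing, and joining datasets) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the records, the field names (product_id, units, calls), and the helper functions are hypothetical and do not come from the text. Rows that fail cleaning are set aside rather than deleted, so the team can revisit the decision without returning to this phase to retrieve discarded data.

```python
# Hypothetical raw records; field names and values are illustrative only.
shipped = [
    {"product_id": " P1 ", "units": "120"},
    {"product_id": "P2", "units": ""},      # missing value
    {"product_id": "p3", "units": "80"},
]
calls = [
    {"product_id": "P1", "calls": 14},
    {"product_id": "P3", "calls": 6},
]


def clean(rows):
    """Trim and upper-case IDs, parse units, and set aside unusable rows."""
    kept, set_aside = [], []
    for row in rows:
        pid = row["product_id"].strip().upper()
        units = row["units"].strip()
        if not pid or not units.isdigit():
            set_aside.append(row)  # retained so the decision can be reviewed
            continue
        kept.append({"product_id": pid, "units": int(units)})
    return kept, set_aside


def min_max_normalize(rows, field):
    """Add a 0-1 scaled copy of `field` to each row (min-max scaling)."""
    if not rows:
        return []
    values = [r[field] for r in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    return [
        {**r, f"{field}_scaled": (r[field] - lo) / span if span else 0.0}
        for r in rows
    ]


def left_join(left, right, key):
    """Merge matching right-hand fields into each left-hand row."""
    index = {r[key]: r for r in right}
    return [{**row, **index.get(row[key], {})} for row in left]


kept, set_aside = clean(shipped)
conditioned = left_join(min_max_normalize(kept, "units"), calls, "product_id")
```

Here conditioned holds the cleaned, scaled, and joined rows, while set_aside keeps the rejected record for later review. In practice a team would likely use a dataframe library or SQL for these steps; plain Python is used only to keep the sketch self-contained.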