steps (note that some of the methods here are similar to Data Mining
algorithms, but these are used in the preprocessing context).
2. Creating a dataset on which discovery will be performed.
Having defined the goals, one should determine the data that will be
used for the knowledge discovery. This step includes finding out
what data is available, obtaining additional necessary data, and then
integrating all the data for the knowledge discovery into one dataset,
including the attributes that will be considered for the process. This
step is very important because the Data Mining algorithm learns and
discovers new patterns from the available data, which is the evidence
base for constructing the models. If some important attributes are
missing, the entire study may fail. For the process to succeed, it is
good to consider as many attributes as possible at this stage. However,
collecting, organizing and operating complex data repositories is expensive.
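The integration described in step 2 can be sketched as follows. This is
only a minimal illustration, assuming every source shares a record key;
the function name integrate, the key "id" and the sample attributes are
hypothetical and do not come from the text.

```python
def integrate(sources, key):
    """Combine several record lists into one dataset, matching records on `key`.

    Attributes that a source does not provide for a record are set to None,
    so gaps stay visible for the later preprocessing step.
    """
    merged = {}
    for source in sources:
        for row in source:
            # Start a new combined record the first time a key is seen.
            merged.setdefault(row[key], {key: row[key]}).update(row)
    # Collect every attribute that appears in any source.
    attributes = sorted({name for row in merged.values() for name in row})
    return [{name: row.get(name) for name in attributes} for row in merged.values()]


# Hypothetical sources: a customer table and a purchase table.
customers = [{"id": 1, "age": 34}, {"id": 2, "age": 51}]
purchases = [{"id": 1, "spend": 120.0}, {"id": 3, "spend": 45.5}]
dataset = integrate([customers, purchases], key="id")
```

Each record in dataset carries the full attribute set, with None marking
values that no source supplied.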
3. Preprocessing and cleansing. At this stage, data reliability is
enhanced. It includes data cleaning, such as handling missing values and
removing noise or outliers. It may involve complex statistical methods
or the use of a specific Data Mining algorithm in this context. For
example, if one suspects that a certain attribute is not reliable enough
or has too much missing data, then this attribute could become the
target of a supervised data mining algorithm. A prediction model for
this attribute is developed, and the missing values can then be replaced
with the predicted values. The extent to which one pays attention to
this stage depends on many factors. Regardless, studying these aspects is
important and often yields insight into enterprise information systems.
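The idea of treating an unreliable attribute as a prediction target can
be sketched as follows. This is a minimal illustration, assuming a single
numeric predictor and a simple least-squares line as the prediction
model; the text does not prescribe a particular model, and the names
fit_line, impute, "age" and "income" are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = a + b * x by ordinary least squares and return (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def impute(records, target, predictor):
    """Return copies of the records with missing target values (None) predicted."""
    # Train only on records where the target attribute is present.
    complete = [r for r in records if r[target] is not None]
    a, b = fit_line([r[predictor] for r in complete],
                    [r[target] for r in complete])
    filled = []
    for r in records:
        r = dict(r)  # copy, so the original records stay untouched
        if r[target] is None:
            r[target] = a + b * r[predictor]
        filled.append(r)
    return filled


# Hypothetical data: income is missing for the last record.
records = [
    {"age": 20, "income": 30.0},
    {"age": 30, "income": 40.0},
    {"age": 40, "income": 50.0},
    {"age": 50, "income": None},
]
filled = impute(records, target="income", predictor="age")
```

The model is fitted only on complete records, and the originals are left
unchanged so the imputed values can be compared or discarded later.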
4. Data transformation. At this stage, better data for the data mining
is generated and prepared. Methods used here include dimension
reduction, such as feature selection and extraction and record
sampling, and attribute transformation, such as discretization of
numerical attributes and functional transformation. This step is often
crucial for the success of the entire project, but it is usually very
project-specific. For example, in medical examinations, it is often not
the individual attributes that make the difference; rather, the
quotient of attributes is considered to be the most important factor.
In marketing, we may need to consider effects beyond our control as
well as efforts and temporal issues, such as studying the
effect of advertising accumulation. However, even if we do not use the