Biomedical Engineering Reference
based on statistical methods. One goal in using regression methods is to extrapolate trends from a
few samples of the data. In the example in Figure 7-2, the extrapolation formula is a simple linear
function of the form:
y = mx + b
where x and y are coordinates on the plot, m is the slope of the line, and b is the y-intercept (a
constant). In practice, more complex extrapolation formulas are used to describe data trends.
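As a rough illustration of fitting and extrapolating with y = mx + b, the sketch below estimates m and b by ordinary least squares, which is one common choice and is assumed here rather than taken from the text. The function names and the sample values are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return least-squares estimates (m, b) for the line y = m*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_bar - m * x_bar
    return m, b


def extrapolate(m, b, x):
    """Evaluate the fitted line at x, which may lie outside the sampled range."""
    return m * x + b


# Hypothetical sample: four points that lie on a straight line.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

m, b = fit_line(xs, ys)
predicted = extrapolate(m, b, 10.0)  # trend projected beyond the sampled x values
print(m, b, predicted)
```

Extrapolating far beyond the sampled range assumes the linear trend continues to hold, which is one reason more complex formulas are often preferred in practice.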
Link analysis evaluates apparent connections or links between data in the database or data
warehouse. Link analysis highlights correlations in data that can suggest linkage, but not causality. In
the illustration, the two pairs of data points are apparently linked, in that the value of one data
element in the pair can be predicted from the value of the other data element in the pair.
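One way to quantify an apparent link between two data elements is a correlation coefficient. The sketch below computes the Pearson coefficient, which is an assumed choice of measure, on hypothetical paired measurements. A value near +1 or -1 suggests linkage but, as noted above, says nothing about causality.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical paired measurements for two data elements.
element_a = [1.0, 2.0, 3.0, 4.0, 5.0]
element_b = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(element_a, element_b)
print(round(r, 3))
```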
Deviation detection identifies data values that are outside of the norm, as defined by existing models
or by evaluating the ordering of observations. The outlier in the illustration is an example of a data
value outside of the expected spread of data in a sample. The data may represent a particular
sequence of amino acids or the molecular weight of a protein, or a vital sign, for example.
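A minimal sketch of deviation detection against an existing model follows. It assumes the "normal" model is simply the mean and standard deviation of a baseline sample, and it flags new observations whose z-score exceeds a threshold. The threshold of 3, the function names, and the heart-rate values are all illustrative assumptions.

```python
from statistics import mean, stdev


def build_model(baseline):
    """Summarize baseline data as (mean, sample standard deviation)."""
    return mean(baseline), stdev(baseline)


def find_deviations(observations, model, threshold=3.0):
    """Return the observations lying more than `threshold` standard deviations from the model mean."""
    mu, sigma = model
    if sigma == 0:
        raise ValueError("baseline has no spread; z-scores are undefined")
    return [x for x in observations if abs(x - mu) / sigma > threshold]


# Hypothetical resting heart rates (beats per minute) that define the norm.
baseline = [72, 75, 71, 74, 73, 76, 70, 74]
model = build_model(baseline)

# New observations checked against the existing model.
observations = [73, 75, 118, 71]
outliers = find_deviations(observations, model)
print(outliers)
```

Building the model from a separate baseline, rather than from the data being screened, keeps an extreme value from inflating the spread it is judged against.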
Segmentation-based data mining identifies classes or groups of data that behave similarly, according
to some metric. Segmentation is akin to link analysis applied to groups of data instead of individual
data points. In the figure, groups (A) and (C) behave similarly.
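The idea of groups that "behave similarly, according to some metric" can be sketched as follows. Here the metric is assumed to be the least-squares slope of each group's trend, and two groups count as similar when their slopes differ by no more than a tolerance. The group data, the tolerance, and the function names are hypothetical and are not taken from the figure.

```python
from itertools import combinations
from statistics import mean


def slope(points):
    """Least-squares slope of a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    return sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx


def similar_groups(groups, tolerance):
    """Return pairs of group names whose slopes differ by at most `tolerance`."""
    metrics = {name: slope(points) for name, points in groups.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(metrics), 2)
        if abs(metrics[a] - metrics[b]) <= tolerance
    ]


# Hypothetical pre-classified groups: A and C rise, B falls.
groups = {
    "A": [(0, 1.0), (1, 2.1), (2, 2.9), (3, 4.0)],
    "B": [(0, 5.0), (1, 4.1), (2, 3.0), (3, 1.9)],
    "C": [(0, 0.5), (1, 1.4), (2, 2.6), (3, 3.5)],
}

pairs = similar_groups(groups, tolerance=0.2)
print(pairs)
```

The sketch takes the groups as given, since segmentation presupposes that classes have already been defined by a classification step.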
These methods of data mining are typically used in combination with each other, either in parallel or
as part of a sequential operation. For example, segmentation requires classes to be defined through a
classification process. Similarly, link analysis assumes that statistical analyses, including correlation
coefficients, are available. Likewise, deviation detection assumes that the data have been properly
classified and evaluated statistically to define the "normal" model. As described later in this chapter,
there are a variety of technologies available to support these methods.
Evaluation
In the evaluation phase of knowledge discovery, the patterns identified by the data-mining analysis
are interpreted. Evaluation typically ranges from simple statistical analysis and complex numerical
analysis of sequences and structures to determining the clinical relevance of the findings.
Visualization
Visualization of evaluation results is an optional stage in the knowledge-discovery process, but one
that typically adds considerable value to the overall system. Visualization can range from converting
tabular listings of data summaries to pie charts and similar business graphics, to using real-time data
to create 3D virtual reality displays that can be manipulated by haptic controllers.
Designing New Queries
Data mining is an iterative, continual activity, in that there are always new hypotheses to test.
Sometimes the new hypotheses are suggested by the data returned by the mining process, and other
times the hypotheses originate from other research. In either case, testing the new hypotheses
requires formulating new queries and revisiting the selection and sampling stage of the data-mining
process.