9.2.3 Predictive Modeling
Predictive modeling is the practice of developing a mathematical model that can be
used to predict an outcome or a value based on a set of input parameters. Modeling
methodologies extend from very simple linear techniques up through highly sophis-
ticated machine learning algorithms. Regardless of the specific form, all models have
a few basic properties in common: they all take a series of inputs, termed attributes
in this context, and they all produce some form of result, termed the target attribute.
Models come in two distinct varieties, regression and classification. Regression
models have a continuous numeric target attribute; classification models have a
discrete categorical target attribute.
9.2.3.1 Feature Selection
Before moving on to the specifics of different modeling techniques, it is first neces-
sary to discuss the problem of feature selection, that is, determining the set of data to
use as input for our model. Translational research presents a number of challenges
in predictive modeling, one of which is the fact that the number of attributes is often
far greater than the number of samples. As an example, let's look at the problem of
classifying tumor types based on gene expression profiles. Using modern DNA
microarray techniques, our attributes (genes) will be vastly greater (1,000-fold or
more) than our number of samples (tumors). Research has shown that reducing the
number of attributes to a small set of informative genes can greatly enhance the
accuracy of our models [12]. Feature selection algorithms help us to identify the
most informative features in a data set. Here we look at two particular feature selection
algorithms that have proven useful in many different domains: information gain
and Relief-F.
Information Gain
The information gain method examines each feature and measures the reduction in
entropy of the target class distribution when that feature is used to partition the data
set [11].
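The entropy calculation above can be sketched as follows. This is a minimal illustration, not tied to any particular data-mining package; the function names and the toy data are our own:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a class-label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy reduction in the target when the data set is
    partitioned by the values of one (discrete) feature."""
    n = len(labels)
    partitions = {}
    for v, y in zip(feature_values, labels):
        partitions.setdefault(v, []).append(y)
    # Weighted entropy remaining after the partition.
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

# A feature that perfectly separates two balanced classes removes
# all uncertainty, so its gain equals the full target entropy (1 bit);
# a feature unrelated to the class yields a gain of 0.
print(information_gain(["a", "a", "b", "b"], [0, 0, 1, 1]))  # 1.0
print(information_gain(["a", "b", "a", "b"], [0, 0, 1, 1]))  # 0.0
```

In practice the gain is computed for every attribute and the top-ranked attributes are retained as model inputs.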
Relief-F
The Relief-F method draws instances at random, computes their nearest neighbors,
and adjusts a feature weighting vector to give more weight to features that discrimi-
nate the instance from neighbors of different classes [13].
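The weighting idea can be sketched with a simplified, binary-class variant of the algorithm (the full Relief-F additionally averages over k nearest neighbors and handles multiple classes); the data and function names here are illustrative only:

```python
import math
import random

def relief(X, y, n_iter=50, seed=0):
    """Simplified binary-class Relief: draw instances at random, find the
    nearest same-class neighbor (the "hit") and the nearest other-class
    neighbor (the "miss"), and increase the weight of features on which
    the instance differs more from the miss than from the hit."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d

    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    for _ in range(n_iter):
        i = rng.randrange(n)
        hit = min((j for j in range(n) if j != i and y[j] == y[i]),
                  key=lambda j: dist(X[i], X[j]))
        miss = min((j for j in range(n) if y[j] != y[i]),
                   key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            w[f] += abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])
    return w

# Feature 0 separates the two classes; feature 1 is noise,
# so feature 0 accumulates the larger weight.
X = [[0.0, 0.3], [0.1, 0.9], [1.0, 0.2], [0.9, 0.8]]
y = [0, 0, 1, 1]
w = relief(X, y)
print(w[0] > w[1])  # True
```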
9.2.3.2 Regression
The problem of regression is perhaps best illustrated as an example of curve fitting.
From an initial set of data points, we create a function that draws a line that comes
as close as possible to all of the points. Now, using our function we can estimate
where on the line any new (unknown) data points might fall. In linear regression, we
are limited to functions that produce straight lines. Linear regression models are
simple to calculate and easy to evaluate; however, their utility is limited. Figure 9.1
below is an example of linear regression.
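The curve-fitting view can be made concrete with an ordinary least-squares fit of a single input attribute; this is a minimal sketch with made-up data points, not the figure's data:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    return a, my - a * mx

# Points lying exactly on y = 2x + 1 recover slope 2 and intercept 1;
# the fitted line then estimates the target for a new, unseen input.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)        # 2.0 1.0
print(a * 10 + b)  # 21.0
```

Real data will not fall exactly on the line; least squares chooses the line minimizing the sum of squared vertical distances to the points.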