
Figure 4-2 Characterization of data used for regression.

HouseID  SqFt  #BRs  #Baths  Acres  ...  House Value
1        3000  5     3       1.25   ...      748,000
2        1500  3     2       0.54   ...      279,000
3        2550  4     4       0.88   ...      510,900
4        2300  4     3       4.6    ...    1,420,500

Rows are cases, identified by HouseID (the case identifier). SqFt, #BRs, #Baths, Acres, and further attributes (...) are the predictor attributes X1, X2, ..., Xm. House Value is the target attribute Y.

known numerical values, such as house value. In this house pricing example, the predictors may include attributes for house square footage, number of bedrooms, number of bathrooms, land area, and proximity to school.
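The attribute roles in Figure 4-2 can be sketched in plain Java to show how a case splits into a case identifier, a predictor vector, and a target value. This is not the JDM API: the names `Figure42Data`, `HouseCase`, and `predictors()` are hypothetical, and the sketch covers only the four predictors shown in the figure.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the data layout in Figure 4-2 (hypothetical names; not the JDM API).
class Figure42Data {
    // One case: a case identifier, four predictor attributes, and the target attribute.
    record HouseCase(int houseId,        // case identifier: labels the row, not used for prediction
                     double sqFt,        // predictor
                     int bedrooms,       // predictor (#BRs)
                     int baths,          // predictor (#Baths)
                     double acres,       // predictor
                     double houseValue)  // target attribute Y
    {
        // Predictor vector in the figure's column order; excludes houseId and the target.
        double[] predictors() {
            return new double[] {sqFt, bedrooms, baths, acres};
        }
    }

    // The four cases shown in Figure 4-2.
    static final List<HouseCase> CASES = List.of(
            new HouseCase(1, 3000, 5, 3, 1.25, 748_000),
            new HouseCase(2, 1500, 3, 2, 0.54, 279_000),
            new HouseCase(3, 2550, 4, 4, 0.88, 510_900),
            new HouseCase(4, 2300, 4, 3, 4.6, 1_420_500));

    public static void main(String[] args) {
        for (HouseCase c : CASES) {
            System.out.println("case " + c.houseId() + ": X = "
                    + Arrays.toString(c.predictors()) + ", Y = " + c.houseValue());
        }
    }
}
```

Keeping the case identifier separate from the predictors matters because an ID carries no information about house value and would only add noise if a model treated it as an input.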

Also like classification, regression produces a functional relationship between the predictor attributes and the target attribute:

$$Y = f(X_1, X_2, \ldots, X_m)$$

When getting a prediction from a regression model, some models return only the numerical prediction, for example, a specific predicted house value such as $976,338. Others can also return a confidence band surrounding this value. For example, the model may provide a confidence of $15,478, which means the true house value most likely falls between $960,860 and $991,816, that is, $976,338 plus or minus $15,478.
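The band arithmetic above can be sketched as follows, assuming the band is symmetric about the prediction (prediction plus or minus the confidence value). `ConfidenceBand` and `PredictionWithConfidence` are hypothetical names, not JDM interfaces; a real model may compute or report its band differently.

```java
// Sketch of a symmetric confidence band around a regression prediction (hypothetical names).
class ConfidenceBand {
    // Assumption: the band is prediction +/- confidence, as in the house-value example.
    record PredictionWithConfidence(double prediction, double confidence) {
        PredictionWithConfidence {
            if (confidence < 0) {
                throw new IllegalArgumentException("confidence must be non-negative");
            }
        }

        double lower() { return prediction - confidence; } // bottom of the band
        double upper() { return prediction + confidence; } // top of the band

        // True when an observed target value falls inside the band (inclusive).
        boolean contains(double actual) {
            return actual >= lower() && actual <= upper();
        }
    }

    public static void main(String[] args) {
        var p = new PredictionWithConfidence(976_338, 15_478);
        System.out.println("[" + p.lower() + ", " + p.upper() + "]"); // prints [960860.0, 991816.0]
    }
}
```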

Determining the quality of regression models is based on comparing the size of the difference between the actual target value and the predicted value. Since the target is continuous, it is highly unlikely that the model will predict a target value exactly, unlike classification models, whose targets take only a few discrete values. As such, regression quality is measured with metrics that assess the overall error the model makes when predicting a set of values. Chapter 7 explores specific metrics used to assess regression model quality.
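As an illustration only (Chapter 7 defines the metrics actually discussed), two widely used aggregate error measures are mean absolute error (MAE), the average of |actual - predicted|, and root mean squared error (RMSE), the square root of the average squared difference. The class and method names below are hypothetical, and the numbers in `main` are made-up values chosen so the results are easy to check by hand.

```java
// Hypothetical helpers for summarizing regression error over a set of cases.
class RegressionErrors {
    // Mean absolute error: average of |actual - predicted|.
    static double meanAbsoluteError(double[] actual, double[] predicted) {
        checkLengths(actual, predicted);
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]);
        }
        return sum / actual.length;
    }

    // Root mean squared error: penalizes large misses more heavily than MAE.
    static double rootMeanSquaredError(double[] actual, double[] predicted) {
        checkLengths(actual, predicted);
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            double d = actual[i] - predicted[i];
            sum += d * d;
        }
        return Math.sqrt(sum / actual.length);
    }

    // Both metrics need one prediction per actual value and at least one case.
    private static void checkLengths(double[] actual, double[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("need equal-length, non-empty arrays");
        }
    }

    public static void main(String[] args) {
        double[] actual    = {100, 200, 300, 400}; // made-up illustrative targets
        double[] predicted = {103, 196, 300, 400}; // errors of 3, 4, 0, 0
        System.out.println("MAE=" + meanAbsoluteError(actual, predicted)
                + " RMSE=" + rootMeanSquaredError(actual, predicted)); // prints MAE=1.75 RMSE=2.5
    }
}
```

Because RMSE squares each difference, one large miss raises it more than several small ones, whereas MAE weights all misses linearly.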

Algorithms that can support regression in JDM include support vector machines, neural networks, and decision trees. Other popular regression algorithms are linear regression and generalized linear models (GLM) [StatSci-GLM 2006].
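Linear regression, one of the algorithms named above, is easy to sketch for a single predictor with ordinary least squares: slope = sum((x - mean x)(y - mean y)) / sum((x - mean x)^2) and intercept = mean y - slope * mean x. This is a plain-Java illustration, not how a JDM implementation or a GLM package fits models. `SimpleLinearRegression` is a hypothetical name, and a real model would use all predictors X1, ..., Xm rather than SqFt alone.

```java
// Hypothetical single-predictor ordinary least squares fit: y ~ intercept + slope * x.
class SimpleLinearRegression {
    final double slope;
    final double intercept;

    SimpleLinearRegression(double[] x, double[] y) {
        if (x.length < 2 || x.length != y.length) {
            throw new IllegalArgumentException("need at least two paired observations");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        double sxy = 0, sxx = 0; // centered cross-product and centered sum of squares of x
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        slope = sxy / sxx;
        intercept = meanY - slope * meanX;
    }

    double predict(double x) {
        return intercept + slope * x;
    }

    public static void main(String[] args) {
        // SqFt and House Value for the four cases in Figure 4-2.
        double[] sqFt  = {3000, 1500, 2550, 2300};
        double[] value = {748_000, 279_000, 510_900, 1_420_500};
        var model = new SimpleLinearRegression(sqFt, value);
        System.out.printf("value = %.1f + %.2f * sqFt%n", model.intercept, model.slope);
        System.out.printf("prediction for 2000 sqFt: %.0f%n", model.predict(2000));
    }
}
```

Four cases are far too few for a meaningful model; the point is only the mechanics of fitting Y = f(X) and then applying the fitted function to a new case.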
