Figure 4-2  Characterization of data used for regression.

        Case         Predictor attributes                    Target
        identifier   (X1, X2, ..., Xm)                       attribute (Y)
        HouseID      SqFt   #BRs   #Baths   Acres   ...      House Value
Cases   1            3000   5      3        1.25    ...      748,000
        2            1500   3      2        .54     ...      279,000
        3            2550   4      4        .88     ...      510,900
        4            2300   4      3        4.6     ...      1,420,500
known numerical values, such as house value. In this house pricing
example, the predictors may include attributes for house square
footage, number of bedrooms, number of bathrooms, land area, and
proximity to school.
Also like classification, regression produces a functional
relationship between the predictor attributes and the target attribute,
Y = f(X1, X2, ..., Xm). When asked for a prediction, some regression
models return only the numerical prediction, for example, a specific
predicted house value such as $976,338. Others can also return a
confidence band surrounding this value. For example, the model may
provide a confidence of $15,478, which means the actual house price
most likely lies in the range $960,860 to $991,816.
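The arithmetic behind the confidence band can be sketched in a few lines of plain Java. This is only an illustration that assumes a symmetric band around the prediction; the class and method names (ConfidenceBand, band) are hypothetical and are not part of the JDM API.

```java
public class ConfidenceBand {

    /**
     * Returns {lower, upper} for a band that extends the same distance
     * on both sides of the prediction (an assumption of this sketch).
     */
    public static long[] band(long prediction, long confidence) {
        return new long[] { prediction - confidence, prediction + confidence };
    }

    public static void main(String[] args) {
        // Values taken from the house price example in the text.
        long[] range = band(976_338L, 15_478L);
        System.out.println("Lower bound: $" + range[0]); // 960860
        System.out.println("Upper bound: $" + range[1]); // 991816
    }
}
```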
The quality of a regression model is determined by comparing the
actual target values with the predicted values and measuring the size
of the difference. Since predictions are continuous, it is highly
unlikely that the model will predict a target value exactly, unlike
classification models, which predict one of a few discrete values. As
such, there are metrics that assess the overall error the model makes
when predicting a set of values. Chapter 7 explores specific metrics
used to assess regression model quality.
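As a rough illustration of what an overall error metric looks like, the sketch below computes two widely used measures, mean absolute error and root mean squared error. These are offered as common examples, not necessarily the metrics covered in Chapter 7. The actual values are the house values from Figure 4-2; the predicted values are made up for this example, and the class name RegressionError is hypothetical.

```java
public class RegressionError {

    /** Average of |actual - predicted| over all cases. */
    public static double meanAbsoluteError(double[] actual, double[] predicted) {
        checkLengths(actual, predicted);
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]);
        }
        return sum / actual.length;
    }

    /** Square root of the average squared difference over all cases. */
    public static double rootMeanSquaredError(double[] actual, double[] predicted) {
        checkLengths(actual, predicted);
        double sumSquares = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sumSquares += diff * diff;
        }
        return Math.sqrt(sumSquares / actual.length);
    }

    private static void checkLengths(double[] actual, double[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
    }

    public static void main(String[] args) {
        // Actual house values from Figure 4-2.
        double[] actual = { 748_000, 279_000, 510_900, 1_420_500 };
        // Hypothetical model predictions, invented for illustration only.
        double[] predicted = { 760_000, 270_000, 500_900, 1_400_500 };

        System.out.println("MAE:  " + meanAbsoluteError(actual, predicted));
        System.out.println("RMSE: " + rootMeanSquaredError(actual, predicted));
    }
}
```

Because squaring emphasizes large differences, the root mean squared error is never smaller than the mean absolute error on the same data.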
Algorithms that can support regression in JDM include support
vector machines, neural networks, and decision trees. Other popular
regression algorithms are linear regression and generalized linear models
(GLM) [StatSci-GLM 2006].
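To make the idea of a learned function Y = f(X) concrete, the following sketch fits a straight line to a single predictor using ordinary least squares, the simplest form of linear regression. It is written in plain Java and does not use the JDM API; the class name LinearFit and its methods are hypothetical. The sample data are the SqFt and House Value columns from Figure 4-2, and with only four cases and one predictor the resulting line should be read as a demonstration, not a useful model.

```java
public class LinearFit {

    /**
     * Fits y = intercept + slope * x by ordinary least squares.
     * Returns {intercept, slope}.
     */
    public static double[] fit(double[] x, double[] y) {
        if (x.length < 2 || x.length != y.length) {
            throw new IllegalArgumentException("need at least two (x, y) pairs");
        }
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("all x values are identical");
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] { intercept, slope };
    }

    /** Applies the fitted line to a new predictor value. */
    public static double predict(double[] model, double x) {
        return model[0] + model[1] * x;
    }

    public static void main(String[] args) {
        // SqFt and House Value columns from Figure 4-2.
        double[] sqFt = { 3000, 1500, 2550, 2300 };
        double[] value = { 748_000, 279_000, 510_900, 1_420_500 };

        double[] model = fit(sqFt, value);
        System.out.println("intercept = " + model[0] + ", slope = " + model[1]);
        System.out.println("predicted value for 2000 sq ft: " + predict(model, 2000));
    }
}
```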
 