[1] A. Blum, "Empirical support for winnow and weighted-majority algorithms: results on a calendar scheduling domain," Machine Learning 26 (1997), pp. 5-23.
[2] L. Bottou, "Large-scale machine learning with stochastic gradient descent," Proc. 19th Intl. Conf. on Computational Statistics (2010), pp. 177-187, Springer.
[3] L. Bottou, "Stochastic gradient tricks," in Neural Networks: Tricks of the Trade, Reloaded, pp. 430-445, edited by G. Montavon, G.B. Orr and K.-R. Mueller, Lecture Notes in Computer Science (LNCS 7700), Springer, 2012.
[4] C.J.C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery 2 (1998), pp. 121-167.
[5] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
[6] C. Cortes and V.N. Vapnik, "Support-vector networks," Machine Learning 20 (1995), pp. 273-297.
[7] Y. Freund and R.E. Schapire, "Large margin classification using the perceptron algorithm," Machine Learning 37 (1999), pp. 277-296.
[8] T. Joachims, "Training linear SVMs in linear time," Proc. 12th ACM SIGKDD (2006), pp. 217-226.
[9] N. Littlestone, "Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm," Machine Learning 2 (1988), pp. 285-318.
[10] M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry (2nd edition), MIT Press, Cambridge MA, 1972.
[11] F. Rosenblatt, "The perceptron: a probabilistic model for information storage and organization in the brain," Psychological Review 65:6 (1958), pp. 386-408.
1. The constant b in this formulation of a hyperplane is the same as the negative of the threshold θ in our treatment of perceptrons in Section 12.2.
2. Note, however, that d there has become d + 1 here, since we include b as one of the components of w when taking the derivative.
3. While the region belonging to any one point is convex, the union of the regions for two or more points might not be convex. Thus, in Fig. 12.21 we see that the region for all Dachshunds and the region for all Beagles are not convex. That is, there are points p1 and p2 that are both classified as Dachshunds, but the midpoint of the line between p1 and p2 is classified as a Beagle, and vice versa.
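The non-convexity described in footnote 3 is easy to reproduce with a tiny 1-nearest-neighbor sketch. The coordinates and labels below are invented for illustration (they are not taken from Fig. 12.21): any configuration in which a point of one class lies between two points of another class exhibits the same effect.

```python
# 1-NN sketch: the union of the regions for one class need not be convex.
# Hypothetical training points: two "Dachshund" points with a "Beagle"
# point sitting between (and slightly above) them.
train = [((0.0, 0.0), "Dachshund"),
         ((4.0, 0.0), "Dachshund"),
         ((2.0, 1.0), "Beagle")]

def classify(q):
    """Label q by its nearest training point (squared Euclidean distance)."""
    return min(train,
               key=lambda t: (t[0][0] - q[0])**2 + (t[0][1] - q[1])**2)[1]

p1, p2 = (0.0, 0.0), (4.0, 0.0)
mid = ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)   # midpoint (2.0, 0.0)

print(classify(p1), classify(p2))  # both classified as Dachshund
print(classify(mid))               # Beagle: the Dachshund region is not convex
```

The midpoint of the segment between the two Dachshund points is closer to the Beagle point than to either endpoint, so it is classified as a Beagle even though both endpoints are Dachshunds.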