learning: From the training sample, the learner estimates the subset of the feature space on which the conditional density p(x | y = 1) is larger than some threshold ε:

X_1 = { x | p(x | y = 1) ≥ ε };    (7.1)

then during testing, an unlabeled instance x is classified as y = 1 if x ∈ X_1, and y = −1 otherwise. The approach above assumes that the estimated level-set X_1 is fixed after training. If the assumption were true, then no matter what the test data looks like, the classification for any particular x would be fixed. Zaki and Nosofsky showed that this is in fact not true. Their experiment compares two conditions that differ only in their test sample distribution p(x); the results demonstrate differences in classification under the two conditions.
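As a concrete sketch of this level-set idea, the snippet below estimates p(x | y = 1) with a kernel density estimator on a hypothetical 2-D stand-in for the dot-pattern feature space; the threshold eps and all numeric settings are illustrative choices, not values from the experiment:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Hypothetical 2-D stand-in for the 9-dot feature space:
# a training sample of 40 positives around a mean mu with large spread.
mu = np.zeros(2)
train = rng.normal(mu, 2.0, size=(40, 2))

# Estimate p(x | y = 1) from the positives and fix a threshold eps,
# giving the level set X_1 = {x : p(x | y = 1) >= eps} of Eq. (7.1).
kde = gaussian_kde(train.T)
eps = 0.005  # arbitrary illustrative threshold

def classify(x):
    """y = 1 if x falls inside the estimated level set, else y = -1."""
    return 1 if kde(np.atleast_2d(x).T)[0] >= eps else -1

# Because X_1 is frozen after training, the label of any particular x
# is the same no matter which test distribution x was drawn from.
print(classify(mu))          # the mean itself lies inside the level set
print(classify(mu + 10.0))   # a far-away instance lies outside it
```

Since the thresholded density is fixed after training, classify returns the same label for a given x regardless of the test distribution that produced it, which is exactly the assumption the experiment below challenges.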
In their experiment, each stimulus is a 9-dot pattern as shown in Figure 7.1(a). The location
of the nine dots can vary independently, creating the feature space. The training sample consists
of 40 instances drawn from a distribution centered around mean μ (which is a particular 9-dot
pattern), and with some high variance (large spread). The training density is schematically shown
in Figure 7.1(b), and is shared by the two conditions below.
In condition 1, the test sample consists of the following mixture: 4 instances from the mean itself x = μ, 20 from a low-variance (small spread) distribution around μ, 20 from the same high-variance distribution around μ, and 40 random instances. This is shown in Figure 7.1(c). Overall, there is a mode (peak of density) at the mean μ. Condition 1 was first used by Knowlton and Squire [ 100 ].
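The condition-1 test mixture can be sketched numerically as follows; the dimensionality, spreads, and sampling ranges are made-up stand-ins, and only the group sizes (4 + 20 + 20 + 40) come from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 2                   # stand-in dimensionality for the 9-dot space
mu = np.zeros(d)

# Condition 1 test mixture (Figure 7.1(c)), with illustrative spreads:
# 4 copies of the mean, 20 low-variance, 20 high-variance, 40 random.
test = np.vstack([
    np.tile(mu, (4, 1)),                 # x = mu
    rng.normal(mu, 0.5, size=(20, d)),   # low variance (small spread)
    rng.normal(mu, 2.0, size=(20, d)),   # high variance (as in training)
    rng.uniform(-10, 10, size=(40, d)),  # random instances
])
groups = ["mu"] * 4 + ["low"] * 20 + ["high"] * 20 + ["random"] * 40
print(test.shape)  # (84, 2)
```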
Because human experiments are typically noisy, the interesting quantity is the fraction of y = 1 classifications for the various groups of instances. In particular, let p̂(y = 1 | μ) be the observed fraction of trials where the subjects classified y = 1 among all trials where x = μ. Let p̂(y = 1 | low), p̂(y = 1 | high), p̂(y = 1 | random) be the similar fractions when x is drawn from the low-variance, high-variance, and random distributions, respectively. Perhaps not surprisingly, when averaging over a large number of subjects one observes that

p̂(y = 1 | μ) > p̂(y = 1 | low) > p̂(y = 1 | high) > p̂(y = 1 | random).    (7.2)
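The observed fractions p̂(y = 1 | group) are simple per-group tallies; the snippet below shows the bookkeeping on an invented toy list of (group, label) trials, not the actual experimental data:

```python
from collections import defaultdict

# Invented toy trials (group, subject's label) purely for illustration.
trials = [("mu", 1), ("mu", 1), ("low", 1), ("low", -1),
          ("high", 1), ("high", -1), ("random", -1), ("random", -1)]

# p_hat(y = 1 | group): fraction of y = 1 answers among that group's trials.
counts = defaultdict(lambda: [0, 0])   # group -> [y = 1 count, total]
for group, y in trials:
    counts[group][0] += (y == 1)
    counts[group][1] += 1

p_hat = {g: ones / total for g, (ones, total) in counts.items()}
print(p_hat)
```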
In condition 2, the mode is intentionally shifted to μ_new, which is itself sampled from the high-variance distribution. Specifically, there are 4 instances with x = μ_new, and 20 instances from a low-variance distribution around μ_new. Only 1 test instance (as opposed to 4) remains at the old mean x = μ, and 2 instances (as opposed to 20) come from the low-variance distribution around μ. There are 19 instances (similar to the previous 20) from the high-variance distribution around μ. The 40 random instances remain the same. This is depicted in Figure 7.1(d). Under this test sample distribution, human behaviors are drastically different:

p̂(y = 1 | μ_new) > p̂(y = 1 | low_new) > p̂(y = 1 | μ) > p̂(y = 1 | low) > p̂(y = 1 | high) > p̂(y = 1 | random),    (7.3)