$|\mathrm{dom}(a_i)|$ stands for its finite cardinality. In a similar way, $\mathrm{dom}(y) = \{c_1, \ldots, c_{|\mathrm{dom}(y)|}\}$ represents the domain of the target attribute. Numeric attributes have infinite cardinalities.
The instance space (the set of all possible examples) is defined as the Cartesian product of all the input attribute domains: $X = \mathrm{dom}(a_1) \times \mathrm{dom}(a_2) \times \ldots \times \mathrm{dom}(a_n)$. The universal instance space (or the labeled instance space) $U$ is defined as the Cartesian product of all input attribute domains and the target attribute domain, i.e.: $U = X \times \mathrm{dom}(y)$.
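As a minimal sketch of these two definitions, the snippet below enumerates the instance space and the labeled instance space for a toy schema. The domains `dom_a1`, `dom_a2` and `dom_y` are hypothetical values chosen only for illustration; they do not come from the text. Enumeration like this only works for nominal attributes, since numeric domains are infinite.

```python
from itertools import product

# Hypothetical nominal domains, chosen only for illustration.
dom_a1 = ["Yes", "No"]
dom_a2 = ["low", "medium", "high"]
dom_y = [0, 1]

# Instance space: X = dom(a1) x dom(a2)
X = list(product(dom_a1, dom_a2))

# Universal (labeled) instance space: U = X x dom(y)
U = list(product(X, dom_y))

# The cardinalities multiply: |X| = 2 * 3 and |U| = |X| * 2
print(len(X))  # 6
print(len(U))  # 12
```

Each element of `U` pairs an unlabeled instance with one possible class value, for example `(("Yes", "low"), 0)`.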
The training set is a bag instance consisting of a set of $m$ tuples. Formally, the training set is denoted as $S(B) = (\langle x_1, y_1 \rangle, \ldots, \langle x_m, y_m \rangle)$, where $x_q \in X$ and $y_q \in \mathrm{dom}(y)$.
Usually, it is assumed that the training set tuples are generated randomly and independently according to some fixed and unknown joint probability distribution $D$ over $U$. Note that this is a generalization of the deterministic case, in which a supervisor classifies a tuple using a function $y = f(x)$.
This topic uses the common notation of bag algebra to present projection ($\pi$) and selection ($\sigma$) of tuples. For example, given the dataset $S$ presented in Table 3.1, the expression $\pi_{a_2, a_3}\, \sigma_{a_1 = \text{“Yes”} \text{ AND } a_4 > 6}\, S$ corresponds to the dataset presented in Table 3.2.
Table 3.1 Illustration of a dataset S with five attributes.

a1    a2    a3    a4    y
Yes   17    4     7     0
No    81    1     9     1
Yes   17    4     9     0
No    671   5     2     0
Yes   1     123   2     0
Yes   1     5     22    1
No    6     62    1     1
No    6     58    54    0
No    16    6     3     0

Table 3.2 The result of the expression $\pi_{a_2, a_3}\, \sigma_{a_1 = \text{“Yes”} \text{ AND } a_4 > 6}\, S$ based on Table 3.1.

a2    a3
17    4
17    4
1     5
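
The selection and projection in this example can be reproduced with a short sketch. It stores Table 3.1 as a Python list of tuples, so duplicates are kept as a bag requires. The helper names `select` and `project` are illustrative, not taken from the text, and the first value of the fifth row is assumed to be "Yes" (this assumption does not change the result, because that row has a4 = 2).

```python
from collections import Counter

COLUMNS = ("a1", "a2", "a3", "a4", "y")

# Table 3.1 as a bag of tuples; a list preserves duplicate rows.
S = [
    ("Yes", 17, 4, 7, 0),
    ("No", 81, 1, 9, 1),
    ("Yes", 17, 4, 9, 0),
    ("No", 671, 5, 2, 0),
    ("Yes", 1, 123, 2, 0),   # first value assumed to be "Yes"
    ("Yes", 1, 5, 22, 1),
    ("No", 6, 62, 1, 1),
    ("No", 6, 58, 54, 0),
    ("No", 16, 6, 3, 0),
]

def select(bag, predicate):
    """Selection (sigma): keep the rows for which the predicate holds."""
    return [row for row in bag if predicate(dict(zip(COLUMNS, row)))]

def project(bag, names):
    """Projection (pi): keep only the named columns, without removing duplicates."""
    idx = [COLUMNS.index(name) for name in names]
    return [tuple(row[i] for i in idx) for row in bag]

selected = select(S, lambda r: r["a1"] == "Yes" and r["a4"] > 6)
result = project(selected, ("a2", "a3"))

print(result)           # [(17, 4), (17, 4), (1, 5)]
print(Counter(result))  # multiplicity of each tuple in the resulting bag
```

The tuple (17, 4) appears twice in the output, matching Table 3.2. A set-based projection would collapse it to a single row, which is why the bag (multiset) semantics matter here.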