Database Reference
In-Depth Information
$|\mathrm{dom}(a_i)|$ stands for its finite cardinality. In a similar way,
$\mathrm{dom}(y) = \{c_1, \ldots, c_{|\mathrm{dom}(y)|}\}$ represents the domain
of the target attribute. Numeric attributes have infinite cardinalities.
The instance space (the set of all possible examples) is defined as the
Cartesian product of all the input attribute domains:
$X = \mathrm{dom}(a_1) \times \mathrm{dom}(a_2) \times \ldots \times \mathrm{dom}(a_n)$.
The universal instance space (or the labeled instance space) $U$ is defined
as the Cartesian product of all input attribute domains and the target
attribute domain, i.e.: $U = X \times \mathrm{dom}(y)$.
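As a small illustration of these definitions, the following sketch enumerates the instance space and the universal instance space for a toy problem with two nominal input attributes. The attribute domains are hypothetical and chosen only to keep the product small. Numeric attributes, whose domains are infinite, cannot be enumerated this way.

```python
from itertools import product

# Hypothetical nominal domains (illustrative, not taken from the text).
dom_a1 = ["Yes", "No"]
dom_a2 = ["low", "medium", "high"]
dom_y = [0, 1]

# Instance space: X = dom(a1) x dom(a2)
X = list(product(dom_a1, dom_a2))

# Universal (labeled) instance space: U = X x dom(y)
U = list(product(X, dom_y))

print(len(X))  # 2 * 3 = 6 unlabeled instances
print(len(U))  # 6 * 2 = 12 labeled instances
```

The sizes multiply, so the instance space grows quickly with the number of attributes even when every individual domain is small.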
The training set is a bag instance consisting of a set of $m$ tuples.
Formally, the training set is denoted as
$S(B) = (\langle x_1, y_1 \rangle, \ldots, \langle x_m, y_m \rangle)$
where $x_q \in X$ and $y_q \in \mathrm{dom}(y)$.
Usually, it is assumed that the training set tuples are generated
randomly and independently according to some fixed and unknown joint
probability distribution $D$ over $U$. Note that this is a generalization of
the deterministic case when a supervisor classifies a tuple using a function
$y = f(x)$.
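The relation between the probabilistic and the deterministic setting can be sketched in code. In the sketch below, each tuple is drawn independently by first sampling an instance and then sampling its label from a conditional distribution; the deterministic supervisor is the special case in which that distribution puts all its mass on $f(x)$. All names, domains, and probabilities here (`draw_training_set`, `sample_x`, `noisy`, `f`) are hypothetical and serve only as an illustration.

```python
import random

def draw_training_set(m, sample_x, label_dist, rng):
    """Draw m labeled tuples independently: x first, then y from P(y | x)."""
    S = []
    for _ in range(m):
        x = sample_x(rng)
        classes, probs = zip(*label_dist(x).items())
        y = rng.choices(classes, weights=probs, k=1)[0]
        S.append((x, y))
    return S

# Hypothetical marginal over X: one nominal and one integer attribute.
def sample_x(rng):
    return (rng.choice(["Yes", "No"]), rng.randint(0, 9))

# Stochastic supervisor: the same x may receive different labels.
def noisy(x):
    return {1: 0.8, 0: 0.2} if x[0] == "Yes" else {1: 0.1, 0: 0.9}

# Deterministic supervisor y = f(x), written as a degenerate distribution.
def f(x):
    return 1 if x[0] == "Yes" else 0

def deterministic(x):
    return {f(x): 1.0}

rng = random.Random(0)
S_noisy = draw_training_set(5, sample_x, noisy, rng)
S_det = draw_training_set(5, sample_x, deterministic, rng)
```

In `S_det` every label equals `f(x)`, whereas in `S_noisy` two identical instances may carry different labels.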
This topic uses the common notation of bag algebra to present
projection ($\pi$) and selection ($\sigma$) of tuples. For example, given the
dataset $S$ presented in Table 3.1, the expression
$\pi_{a_2, a_3}\, \sigma_{a_1 = \text{“Yes” AND } a_4 > 6}\, S$ corresponds to
the dataset presented in Table 3.2.
Table 3.1 Illustration of a dataset S with five attributes.

a1    a2    a3    a4    y
Yes   17    4     7     0
No    81    1     9     1
Yes   17    4     9     0
No    671   5     2     0
Yes   1     123   2     0
Yes   1     5     22    1
No    6     62    1     1
No    6     58    54    0
No    16    6     3     0
Table 3.2 The result of the expression
$\pi_{a_2, a_3}\, \sigma_{a_1 = \text{“Yes” AND } a_4 > 6}\, S$ based on Table 3.1.

a2    a3
17    4
17    4
1     5
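The selection and projection in this example can be reproduced with a short sketch. It stores Table 3.1 as a Python list so that duplicate tuples are preserved, as bag semantics require. The helper names `select` and `project` are hypothetical, and the first value of the fifth row is assumed to be "Yes".

```python
# Table 3.1 as a bag: a list keeps duplicate tuples and their multiplicity.
columns = ("a1", "a2", "a3", "a4", "y")
rows = [
    ("Yes", 17, 4, 7, 0),
    ("No", 81, 1, 9, 1),
    ("Yes", 17, 4, 9, 0),
    ("No", 671, 5, 2, 0),
    ("Yes", 1, 123, 2, 0),  # first value assumed to be "Yes"
    ("Yes", 1, 5, 22, 1),
    ("No", 6, 62, 1, 1),
    ("No", 6, 58, 54, 0),
    ("No", 16, 6, 3, 0),
]
S = [dict(zip(columns, row)) for row in rows]

def select(bag, predicate):
    """Selection (sigma): keep every tuple that satisfies the predicate."""
    return [t for t in bag if predicate(t)]

def project(bag, attributes):
    """Projection (pi): keep only the named attributes, duplicates included."""
    return [tuple(t[a] for a in attributes) for t in bag]

result = project(
    select(S, lambda t: t["a1"] == "Yes" and t["a4"] > 6),
    ["a2", "a3"],
)
print(result)  # [(17, 4), (17, 4), (1, 5)]
```

The tuple (17, 4) appears twice in the output, as it does in Table 3.2. A set-based projection would collapse the two copies into one, which is why a bag is used here.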