variables in the classification of the cases. The hypothesis test is explained
below, using the Kolmogorov-Smirnov test [33] as an example. H0: the data follow a
uniform distribution; H1: the data do not follow a uniform distribution.
Statistical contrast:

\[ D = \max\{ D^{+}, D^{-} \} \tag{3} \]

where

\[ D^{+} = \max_{1 \le i \le n} \left\{ \frac{i}{n} - F_0(x_i) \right\}, \qquad D^{-} = \max_{1 \le i \le n} \left\{ F_0(x_i) - \frac{i-1}{n} \right\} \]

with i as the pattern of entry, n the number of items and $F_0(x_i)$ the probability of observing values less than $x_i$ with H0 being true. The value of the statistical contrast is compared to the following value:

\[ D_{\alpha} = \frac{C_{\alpha}}{k(n)} \tag{4} \]

In the special case of the uniform distribution, $k(n) = \sqrt{n} + 0.12 + \frac{0.11}{\sqrt{n}}$, and for a level of significance $\alpha = 0.05$, $C_{\alpha} = 1.358$.
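As an illustration, the comparison between the statistic D of equation (3) and the critical value of equation (4) can be sketched in Python. This is a minimal sketch, not the authors' implementation: the function names, and the assumption that the bounds `low` and `high` of the uniform distribution under H0 are known in advance, are hypothetical and do not come from the text.

```python
import math


def ks_uniform_statistic(values, low=0.0, high=1.0):
    """Return D = max(D+, D-) against a uniform CDF on [low, high].

    The bounds low/high are an assumption of this sketch.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("at least one value is required")

    def f0(x):
        # Uniform CDF under H0, clamped to [0, 1].
        return min(1.0, max(0.0, (x - low) / (high - low)))

    # enumerate() is 0-based, so the 1-based index i is idx + 1.
    d_plus = max((idx + 1) / n - f0(x) for idx, x in enumerate(xs))
    d_minus = max(f0(x) - idx / n for idx, x in enumerate(xs))
    return max(d_plus, d_minus)


def ks_critical_value(n, c_alpha=1.358):
    """Return D_alpha = C_alpha / k(n), with k(n) as in equation (4)."""
    k = math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)
    return c_alpha / k


def rejects_uniformity(values, low=0.0, high=1.0, c_alpha=1.358):
    """True when D exceeds the critical value, i.e. H0 is rejected."""
    d = ks_uniform_statistic(values, low, high)
    return d > ks_critical_value(len(values), c_alpha)
```

For example, ten evenly spaced values on [0, 1] give a small D and H0 is not rejected, whereas ten values clustered near 0.95 give a D close to 1 and H0 is rejected.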
4.1.5 Cut-Off Points
This step removes the probes that, despite not following a uniform distribution, show
no separation between elements and therefore do not allow the elements to be partitioned.
Probes are removed by detecting changes in the density of the data and selecting
the final probes. The probes in which no cut-offs or high densities are detected
are eliminated, as they provide no useful information to the classification process.
This keeps the probes that allow the separation of individuals. The separation
intervals are detected by calculating the distance between adjacent individuals.
Once the distances are calculated, it is possible to determine the potentially
relevant values. The selection is carried out by applying confidence intervals to the
values of these differences if they follow a normal distribution, or by selecting
the values above a certain percentile if they do not.
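The distance calculation and the percentile rule described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the nearest-rank percentile definition and the default percentile `q=90` are hypothetical choices, and only the percentile branch is shown (the confidence-interval branch is left out).

```python
import math


def percentile(values, q):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def cutoff_candidates(probe_values, q=90):
    """Return pairs of adjacent sorted values whose gap exceeds the q-th percentile.

    q=90 is an illustrative default, not a value taken from the text.
    """
    xs = sorted(probe_values)
    # Distance between each individual and the next one in sorted order.
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    if not gaps:
        return []
    threshold = percentile(gaps, q)
    return [(xs[i], xs[i + 1]) for i, gap in enumerate(gaps) if gap > threshold]
```

A probe for which `cutoff_candidates` returns an empty list shows no separation interval under this rule and would be a candidate for elimination. For instance, the values 10, 11, 12, 13, 50, 51, ..., 56 yield the single candidate interval (13, 50), while ten equally spaced values yield none.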
This process is formalized as follows:
1. Let I be the set of individuals with filtered probes together with the new individual, where $x_{\cdot j}$ represents the probe j for all the individuals, and $x_{ij}$ the individual i for the probe j.
2. Select the probe $x_{\cdot j}$, $j = 1$.
3. Sort the values of $x_{\cdot j}$ in increasing order.
4. Calculate the value $x'_{ij} = x_{i+1,j} - x_{ij}$.
5. Determine if the variable $x'_{ij}$ follows a normal distribution by means of the Shapiro-Wilk test [34], otherwise go to step 10.
 