decompose matrices of size equal to the number of the coefficients itself. For large n, the straightforward computation of such matrices might be impossible due to memory size restrictions. Another problem, perhaps the major one, is that the number of observations in the training set needs to be about 100 times the number of coefficients to be estimated. When n is large (for n = 10 the required number of observations is about 14,440,000) this is a significant practical limitation, especially when data generation is time consuming.
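This back-of-the-envelope requirement is easy to check directly. A minimal sketch follows; the helper name is ours, the factor of 100 is the rule of thumb quoted above, and the coefficient count 144,400 is simply back-derived from the quoted figure for n = 10:

```python
# Rule of thumb from the text: roughly 100 training observations are needed
# per estimated coefficient.  (The helper name is ours, not from the paper.)
def required_observations(num_coefficients: int, factor: int = 100) -> int:
    """Estimated training-set size for a model with the given coefficient count."""
    return factor * num_coefficients

# The figure quoted for n = 10 (about 14,440,000 observations) corresponds
# to 14,440,000 / 100 = 144,400 coefficients.
print(required_observations(144_400))  # prints 14440000
```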
One of the main attractive features of Support Vector Machines is their ability to employ nonlinear mappings without an essential increase in the number of parameters to be estimated and, therefore, in computation time.
3.1 Data Generation: Training Datasets
A pseudorandom element w of F(X) can be generated as a pseudorandom sequence y_1, ..., y_l of elements y_i ∈ X^{±1} such that y_{i+1} ≠ y_i^{−1}, where the length l is also chosen pseudorandomly. However, it has been shown in [4] that randomly taken cyclically reduced words in F are already minimal with asymptotic probability 1. Therefore, a set of randomly generated cyclic words in F would be highly biased toward the class of minimal elements. To obtain fair training datasets we use the following procedure.
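The sampling scheme just described can be sketched in Python; encoding generators as signed integers (negative = formal inverse) is our assumption, not the paper's:

```python
import random

def random_reduced_word(n: int, length: int, rng=random):
    """Pseudorandom freely reduced word over X = {x_1, ..., x_n}.

    Letters are signed integers +-1..+-n (negative = inverse).  Each y_{i+1}
    is drawn uniformly from the 2n - 1 letters different from y_i^{-1}.
    """
    letters = [s * g for g in range(1, n + 1) for s in (1, -1)]
    word = []
    for _ in range(length):
        choices = [x for x in letters if not word or x != -word[-1]]
        word.append(rng.choice(choices))
    return word

def cyclically_reduce(word):
    """Strip cancelling first/last pairs until the word is cyclically reduced."""
    w = list(word)
    while len(w) > 1 and w[0] == -w[-1]:
        w = w[1:-1]
    return w
```

For instance, `cyclically_reduce([1, 2, -1])` returns `[2]`, since the first and last letters cancel cyclically.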
For each positive integer l = 1, ..., L we generate pseudorandomly and uniformly K cyclically reduced words from F(X) of length l. Parameters L and K were chosen to be 1000 and 10 for purely practical reasons. Denote the resulting set by W. Then, using the deterministic Whitehead algorithm, we construct the corresponding set of minimal elements W_min = {w_min | w ∈ W}.
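The construction of W can be sketched as below; function names are ours, and the deterministic Whitehead minimization that produces W_min is a substantial algorithm in its own right and is not reproduced here:

```python
import random

def random_cyclically_reduced_word(n: int, length: int, rng=random):
    """Sample a cyclically reduced word over signed-integer letters +-1..+-n
    by rejection: draw freely reduced words until the ends do not cancel."""
    letters = [s * g for g in range(1, n + 1) for s in (1, -1)]
    while True:
        word = []
        for _ in range(length):
            choices = [x for x in letters if not word or x != -word[-1]]
            word.append(rng.choice(choices))
        if word[0] != -word[-1]:   # cyclic-reduction check on the ends
            return word

def build_W(n: int, L: int = 1000, K: int = 10, rng=random):
    """K cyclically reduced words for each length l = 1, ..., L
    (L = 1000 and K = 10 in the text)."""
    return [random_cyclically_reduced_word(n, l, rng)
            for l in range(1, L + 1) for _ in range(K)]
```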
With probability 0.5 we substitute each v ∈ W_min with the word t(v), where t is a randomly and uniformly chosen automorphism from Ω(X) such that ‖t(v)‖ > ‖v‖ (if ‖t(v)‖ = ‖v‖ we choose another t ∈ Ω(X), and so on). Now the resulting set L is a set of pseudorandomly generated cyclically reduced words representing the classes of minimal and non-minimal elements in approximately equal proportions. It follows from the construction that our choice of non-minimal elements w is not quite representative, since all these elements have Whitehead's complexity one (which is not the case in general). One may try to replace the automorphism t above by a random finite sequence of automorphisms from Ω to get a more representative training set. However, we will see in Section 4 that the training dataset L is sufficiently good already, so we elected to keep it as it is.
From the construction we know for each element v ∈ L whether it is minimal or not. Finally, we create a training set D = {⟨v, P(v)⟩ | v ∈ L}, where

    P(v) = 1 if v is minimal, and P(v) = 0 otherwise.
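The labeling step can be sketched as follows. The automorphism t is deliberately left abstract (the caller must supply a map guaranteed to increase ‖v‖, as in the text); genuine Whitehead automorphisms are not implemented here, so any concrete t used with this sketch is a stand-in:

```python
import random

def build_training_set(W_min, t, rng=random):
    """Build D = {(v, P(v))}: with probability 0.5 keep v (minimal, label 1),
    otherwise replace it by t(v) (strictly longer, hence non-minimal, label 0).

    `t` is taken as a parameter because the length-increasing Whitehead
    automorphisms from the text are not implemented in this sketch."""
    D = []
    for v in W_min:
        if rng.random() < 0.5:
            D.append((v, 1))     # v is minimal by construction
        else:
            D.append((t(v), 0))  # ||t(v)|| > ||v||, so t(v) is non-minimal
    return D
```

With a toy stand-in for t (e.g. appending a fixed letter, which is not an automorphism but does increase length) one can check that every pair carries a 0/1 label and that label-1 words come from W_min unchanged.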