Some Refinement by Jantke and Grieser. The Lange/Wiehagen algorithm has been discussed from the viewpoint of knowledge efficiency in [14]. Within the process of learning, there are two out of three cases in which the algorithm loses information. First, when some string longer than the current hypothesis occurs (case [L/W i] in the description of the algorithm above), it is ignored. Second, when some string shorter than the current hypothesis occurs (case [L/W ii] indicated above), the hypothesis kept so far is given up and the shorter string is adopted to form the new hypothesis. Certain refinements seem possible.
In the first case [L/W i], checking whether or not the longer string carries useful information is costly, because it means checking membership: does the current pattern allow for generating this string or not? This decision problem is known to be NP-hard and, thus, this refinement is not taken into account.
In the second case [L/W ii], all information collected so far in the latest hypothesis is thrown away when the newly appearing shorter string takes its place. Is there no way to preserve at least some of the information in the pattern? A refinement proposed by Jantke and Grieser [14] is considered for the purpose of the present investigation.
Assume that the Lange/Wiehagen algorithm as described above is currently keeping some hypothetical pattern p of length |p| = k. Furthermore, assume that some new string s of length |s| = l with l < k is presented to the learning algorithm. Instead of throwing p away and adopting s as it is, the refinement is to calculate the new hypothesis as some least general generalization of p and s.

    p_1     = s_1
    p_{n+1} = s_{n+1}              if |p_n| < |s_{n+1}|      [J/G i]
    p_{n+1} = lgg(p_n, s_{n+1})    if |p_n| >= |s_{n+1}|     [J/G ii]

At first glance, this description seems to be more rough (instead of being more fine) than the algorithm A above, because it has only two cases, whereas the other one has three. However, case [L/W ii] is really refined.
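To make the two cases concrete, the update rule can be sketched in Python. This is a minimal sketch, not the authors' implementation: patterns are modelled as lists of symbols (variables written as strings such as "x1"), and the lgg step is shown only for the equal-length situation, where it reduces to a position-wise comparison; the general case with |p_n| > |s_{n+1}| requires the partition-based procedure described in the remainder of this section.

```python
def lgg(p, s):
    """Position-wise least general generalization of two equal-length
    sequences: matching symbols are kept, mismatching pairs are replaced
    by variables, with identical pairs sharing the same variable."""
    out, table = [], {}
    for a, b in zip(p, s):
        if a == b:
            out.append(a)
        else:
            if (a, b) not in table:
                table[(a, b)] = f"x{len(table) + 1}"
            out.append(table[(a, b)])
    return out

def update_hypothesis(p_n, s_next):
    # [J/G i]: the new string is strictly longer than the hypothesis,
    # so it is adopted verbatim as the next hypothesis
    if len(p_n) < len(s_next):
        return list(s_next)
    # [J/G ii]: otherwise the hypothesis is generalized against the string
    return lgg(p_n, list(s_next))
```

For example, update_hypothesis(list("abab"), "acac") yields ['a', 'x1', 'a', 'x1'], reusing the same variable for the repeated mismatch.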
Conceptually, one may think of this refinement as follows. First, the pattern x_1 ... x_l is taken as a possible generalization, because it definitely generalizes both p_n and s_{n+1}. Next, traversing the pattern x_1 ... x_l symbol by symbol from left to right, it is specialized.
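As a small illustration of this starting point, the fully general pattern x_1 ... x_l can be produced as follows (a trivial sketch; the subsequent left-to-right specialization steps are the cumbersome part and are not reproduced here):

```python
def most_general_pattern(l):
    # the pattern x1 x2 ... xl of l pairwise distinct variables,
    # which generalizes every pattern and every string of length >= l
    return [f"x{i}" for i in range(1, l + 1)]
```

For the worked example below, most_general_pattern(6) gives ['x1', 'x2', 'x3', 'x4', 'x5', 'x6'].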
The algorithm is not of high computational complexity, but it is a bit cumbersome due to the larger number of distinct cases and their dovetailing.
To warm up, an intuitive example is presented in which the references on the right refer to the algorithm to be presented below. The following figure illustrates the annotation process for the pattern p = a x_1 x_2 x_3 b x_4 c c a x_1 x_2 x_3 x_4 c of length k = 14 and the string abacac with l = 6. The goal is to generalize the pattern p by some other pattern of length l = 6. The algorithm runs in turns. In every turn, certain identical substrings are found and indexed.
As a result of the procedure illustrated by means of Figure 5, the pattern is partitioned into l = 6 substrings. The partitioning represented by the last line of the figure is interpreted as the intermediate pattern x_1 x_2 x_3 x_4 x_1 x_5, which generalizes p.
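The interpretation step can be sketched as follows. Since the figure itself is not reproduced here, the partition of p used below is only a plausible reconstruction consistent with the stated result; the variable-assignment rule assumed is that two blocks receive the same variable exactly when the blocks are identical and are aligned with identical symbols of the string abacac.

```python
def variables_for_partition(blocks, s):
    """Assign variables x1, x2, ... (left to right) to the blocks of a
    partitioned pattern; a block aligned with string symbol c reuses an
    earlier variable iff the same (block, c) combination occurred before."""
    table, out = {}, []
    for block, c in zip(blocks, s):
        key = (tuple(block), c)
        if key not in table:
            table[key] = f"x{len(table) + 1}"
        out.append(table[key])
    return out

# A partition of p = a x1 x2 x3 b x4 c c a x1 x2 x3 x4 c into l = 6 blocks
# (hypothetical reconstruction; the original partition is given in Figure 5):
blocks = [("a", "x1", "x2", "x3"), ("b",), ("x4",),
          ("c", "c"), ("a", "x1", "x2", "x3"), ("x4", "c")]
variables_for_partition(blocks, "abacac")
# -> ['x1', 'x2', 'x3', 'x4', 'x1', 'x5']
```

Blocks 1 and 5 are identical and both aligned with the symbol a, so they share the variable x_1, reproducing the intermediate pattern x_1 x_2 x_3 x_4 x_1 x_5 stated above.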