Databases Reference
In-Depth Information
T A B L E 6 . 10
Counts using Method B.
Context
Symbol
Count
prob
a
09
l
08
o
02
Esc
03
Total_Count
22
T A B L E 6 . 11
Counts using Method C.
Context
Symbol
Count
prob
a
10
l
09
o
03
Esc
03
Total_Count
25
before. This increases the likelihood that the escape symbol will be used. Therefore, we
should assign a higher probability to the escape symbol.
A variant of Method B, appropriately named Method C, was proposed by Moffat [ 75 ].
In Method C, the count assigned to the escape symbol is the number of symbols that have
occurred in that context. In this respect, Method C is similar to Method B. The difference
comes in the fact that, instead of “robbing” this from the counts of individual symbols, the
total count is inflated by this amount. This situation is shown in Table 6.11 .
While there is some variation in performance depending on the characteristics of the data
being encoded, Method C, on average, seems to provide the best performance of the three
methods for assigning counts to the escape symbol.
6.3.3 Length of Context
It would seem that as far as the maximum length of the contexts is concerned, more is better.
However, this is not necessarily true. A longer maximum length will usually result in a higher
probability if the symbol to be encoded has a non zero count with respect to that context. How-
ever, a long maximum length also means a higher probability of long sequences of escapes,
which in turn can increase the number of bits used to encode the sequence. If we plot the
compression performance versus maximum context length, we see an initial sharp increase
in performance until some value of the maximum length, followed by a steady drop as the
maximum length is further increased. The value at which we see a downturn in performance
changes depending on the characteristics of the source sequence.
An alternative to the policy of a fixed maximum length is used in the algorithm ppm [ 76 ].
This algorithm uses the fact that long contexts that give only a single prediction are seldom
followed by a new symbol. If mike has always been followed by y in the past, it will probably
not be followed by
/
b the next time it is encountered. Contexts that are always followed by the
 
Search WWH ::




Custom Search