2.2 MDL
Minimum Description Length (MDL) [20, 52], like its close cousin Minimum
Message Length (MML) [69], is in this sense a practical version of Kolmogorov
Complexity [40]. All three embrace the slogan Induction by Compression, but the
details of how to compress vary. For MDL, this principle can be roughly described
as follows.
Given a set of models² M, the best model M ∈ M is the one that minimizes

    L(D, M) = L(M) + L(D | M)

in which
- L(M) is the length, in bits, of the description of M, and
- L(D | M) is the length, in bits, of the description of the data when encoded with M.
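The minimization above can be made concrete with a small sketch. The setup below is illustrative and not from the text: the model class is a grid of Bernoulli parameters for a binary sequence, L(M) is the fixed number of bits needed to name a grid point, and L(D | M) is the Shannon code length of the data under that parameter.

```python
import math

# Toy two-part MDL: among candidate Bernoulli models for a binary sequence,
# pick the one minimizing L(D, M) = L(M) + L(D | M).
# The grid model class and the precision parameter are assumptions made
# for illustration only.

def l_data_given_model(data, p):
    """L(D | M): Shannon code length of the data under parameter p, in bits."""
    return sum(-math.log2(p if bit == 1 else 1.0 - p) for bit in data)

def best_model(data, precision_bits=8):
    """Search p = k / 2^precision_bits; L(M) = precision_bits, the cost of
    naming which grid point was chosen."""
    n = 2 ** precision_bits
    best = None
    for k in range(1, n):  # skip p = 0 and p = 1 (infinite code lengths)
        p = k / n
        total = precision_bits + l_data_given_model(data, p)
        if best is None or total < best[0]:
            best = (total, p)
    return best

data = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
total, p = best_model(data)
print(f"best p = {p:.3f}, L(D, M) = {total:.1f} bits")
```

Because L(M) is constant across this grid, the search effectively picks the grid point closest (in code length) to the maximum-likelihood parameter; richer model classes trade a larger L(M) against a smaller L(D | M).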
This is called two-part MDL, or crude MDL, as opposed to refined MDL, where
model and data are encoded together [20]. We consider two-part MDL because we
are specifically interested in the compressor: the set of patterns that yields the best
compression. Further, although refined MDL has stronger theoretical foundations, it
cannot be computed except for some special cases.
2.2.1 MDL and Kolmogorov
The MDL-optimal model M has many of the properties of the Kolmogorov-optimal
program p. In fact, two-part MDL and Kolmogorov complexity have a one-to-one
connection [1, 20]. Loosely speaking, the two terms respectively express the
structure in the data, and the deviation from that structure: L(M) corresponds to
the 'algorithm' part of p, which generates the structure. L(D | M), on the other
hand, does not contain any structure (as otherwise there would be a better M) and
can be seen as the 'parameter' part of p. One important difference is that L(D, M)
happily ignores the length of the decompression algorithm, which would be needed
to reconstruct the data given the compressed representation of the model and data.
The reason is simple: its length is constant, and hence does not influence the selection
of the best model.
2.2.2 MDL and Probabilities
Any MDL-based approach encodes both the data and the models, for which codes
are required. It is well known that there is a close relation between probability
distributions and optimal codes. That is, Shannon's source coding theorem states that
the optimal code length for a message x is -log2 P(x) bits, where P(x) is the
probability of x.
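This correspondence between probabilities and code lengths can be sketched directly; the three-symbol distribution below is a made-up example.

```python
import math

# Probability-to-code-length correspondence: under an optimal (Shannon) code,
# a symbol with probability P(x) gets a code of -log2 P(x) bits, so the
# expected code length equals the entropy of the distribution.

def code_length(p):
    """Optimal code length, in bits, for a symbol of probability p."""
    return -math.log2(p)

dist = {"a": 0.5, "b": 0.25, "c": 0.25}       # example distribution
lengths = {x: code_length(p) for x, p in dist.items()}
entropy = sum(p * code_length(p) for p in dist.values())

print(lengths)   # {'a': 1.0, 'b': 2.0, 'c': 2.0}
print(entropy)   # 1.5 bits per symbol on average
```

In MDL terms this is why choosing a code for the data is the same decision as choosing a probability distribution over it: shorter codes go to the outcomes the model considers more likely.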
² MDL-theorists tend to talk about hypotheses in this context.