if any. The principles of Minimum Description Length (MDL) and closely related Minimum Message Length (MML) (Kolmogorov 1965, Wallace and Boulton 1968, Wallace and Freeman 1987, Solomonoff 1978, Rissanen 1978, Li and Vitányi 1997), however, take into account the description size of p, viewing p as a compressor program of the data h(≤t). This program p should be able to deal with any prefix of the growing history, computing an output starting with h(≤t) for any time t (1 ≤ t < T). (A program that halts after t steps can temporarily be fixed or augmented by the trivial non-compressive method that simply stores any raw additional data coming in after the halt; later learning may yield better compression and thus intrinsic rewards.)
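The fix-or-augment fallback above can be sketched as follows; this is a minimal illustration of my own, in which a zlib-compressed prefix stands in for the halted program p and everything after the halt is stored raw:

```python
import zlib

def describe(history: bytes, compressed_prefix_len: int) -> bytes:
    """Toy description of a growing history: the compressor covers only
    the first compressed_prefix_len bytes; anything arriving after the
    program 'halted' is stored raw (non-compressively) until a later
    learning step finds a shorter description for it too."""
    prefix = history[:compressed_prefix_len]
    suffix = history[compressed_prefix_len:]   # raw, uncompressed
    return zlib.compress(prefix) + suffix

history = b"abababababababab" * 8 + b"new unseen data"
code = describe(history, 128)
# The description stays valid as the history grows, but only the
# compressed prefix saves space; the raw tail costs its full length.
```

The point of the construction is only that a valid (if wasteful) description exists at every time step, so the compression measure below is always defined.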
C_l(p, h(≤t)) denotes p's compression performance on h(≤t): the number of bits needed to specify the predictor and the deviations of the sensory history from its predictions, in the sense of loss-free compression. The smaller C_l, the more lawfulness and regularity in the observations so far. While random noise is irregular, arbitrary, and incompressible, most videos are regular, as most single frames are very similar to the previous one. By encoding only the deviations, movie compression algorithms can save lots of storage space. Complex-looking fractal images (Mandelbrot 1982) are regular, as they usually are similar to their details, being computable by very short programs that re-use the same code over and over again for different image parts. The universe itself seems highly regular, as if computed by a program (Zuse 1969, Schmidhuber 1997a; 2002c; 2006b; 2007a): every photon behaves the same way; gravity is the same on Jupiter and Mars; mountains usually don't move overnight but remain where they are; etc.
Suppose p uses a small predictor that correctly predicts many x(τ) for 1 ≤ τ ≤ t. This can be used to encode x(≤t) compactly: given the predictor, only the wrongly predicted x(τ) plus information about the corresponding time steps τ are necessary to reconstruct x(≤t), e.g., (Schmidhuber 1992). Similarly, a predictor that learns a probability distribution on the possible next events, given previous events, can be used to compactly encode observations with high (respectively low) predicted probability by few (respectively many) bits (Huffman 1952, Schmidhuber and Heil 1996), thus achieving a compressed history representation.
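A toy sketch of this deviation-based encoding (the function names and the trivial predictor "x(τ) = x(τ−1)" are my own illustration, not part of the text):

```python
def encode_deviations(x):
    """Store only the wrongly predicted x(tau) plus their time steps tau,
    under the trivial predictor 'x(tau) equals x(tau-1)'."""
    deviations = [(0, x[0])]                 # the first symbol is never predicted
    for tau in range(1, len(x)):
        if x[tau] != x[tau - 1]:             # prediction failed at step tau
            deviations.append((tau, x[tau]))
    return deviations

def decode_deviations(deviations, length):
    """Loss-free reconstruction of x(<=t) from the deviation list."""
    x, dev = [], dict(deviations)
    for tau in range(length):
        x.append(dev.get(tau, x[tau - 1] if tau else None))
    return x

x = list("aaaaabbbbbbcccc")
code = encode_deviations(x)
assert decode_deviations(code, len(x)) == x  # loss-free
# Regular data -> few deviations -> short code; random data -> no savings.
```

A probabilistic predictor refines this further: an event assigned predicted probability q can be coded in roughly −log2 q bits, which is the principle behind Huffman and arithmetic coding.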
Alternatively, p could make use of a 3D world model or simulation. The corresponding MDL-based quality measure C_3D(p, h(≤t)) is the number of bits needed to specify all polygons and surface textures in the 3D simulation, plus the number of bits needed to encode deviations of h(≤t) from the simulation's predictions. Improving the model by adding or removing polygons may reduce the total number of bits required (Schmidhuber 2010).
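The trade-off behind C_3D can be sketched generically; the candidate models and bit counts below are invented purely for illustration:

```python
def total_description_length(model_bits, residual_bits):
    """MDL-style score: bits to specify the model (polygons, textures, ...)
    plus bits to encode the history's deviations from its predictions."""
    return model_bits + residual_bits

# Hypothetical candidates: a richer model costs more bits itself but
# usually leaves smaller residuals; MDL picks the minimal total.
candidates = {
    "coarse (few polygons)":   total_description_length(1_000, 50_000),
    "medium":                  total_description_length(10_000, 8_000),
    "overfit (many polygons)": total_description_length(100_000, 500),
}
best = min(candidates, key=candidates.get)   # -> "medium"
```

Adding polygons helps only while the residual savings exceed the extra model bits, which is why removing polygons can also lower the total.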
The ultimate limit for C_l(p, h(≤t)) is K(h(≤t)), a variant of the Kolmogorov complexity of h(≤t), namely, the length of the shortest program (for the given hardware) that computes an output starting with h(≤t) (Solomonoff 1978, Kolmogorov 1965, Li and Vitányi 1997, Schmidhuber 2002b). We do not have to worry about the fact that K(h(≤t)) in general cannot be computed exactly, only approximated from above (for most practical predictors the approximation will be crude). This just means that some patterns will be hard to detect by the limited predictor of choice; that is, the reward maximiser will get discouraged from spending too much effort on creating those patterns.
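Approximation from above can be made concrete with any off-the-shelf loss-free compressor; the sketch below is my own illustration, using zlib in place of a learned predictor:

```python
import os
import zlib

def crude_upper_bound_bits(data: bytes) -> int:
    """Compressed size in bits: an upper bound on (a version of) the
    Kolmogorov complexity of data, up to the constant-size decompressor."""
    return 8 * len(zlib.compress(data, level=9))

regular = b"ab" * 500      # highly regular history
noisy = os.urandom(1000)   # incompressible with overwhelming probability
assert crude_upper_bound_bits(regular) < crude_upper_bound_bits(noisy)
# zlib detects the repetition in `regular`, but subtler regularities
# (e.g. the digits of pi) would slip past it, leaving the bound crude.
```

Whatever patterns the chosen compressor misses simply do not count as regularity for it, which is exactly the sense in which a limited predictor ignores some patterns.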