if any. The principles of Minimum Description Length (MDL) and closely related Minimum Message Length (MML) (Kolmogorov 1965, Wallace and Boulton 1968, Wallace and Freeman 1987, Solomonoff 1978, Rissanen 1978, Li and Vitányi 1997), however, take into account the description size of p, viewing p as a compressor program of the data h(≤t). This program p should be able to deal with any prefix of the growing history, computing an output starting with h(≤t) for any time t (1 ≤ t < T). (A program that halts after t steps can temporarily be fixed or augmented by the trivial non-compressive method that simply stores any raw additional data coming in after the halt; later learning may yield better compression and thus intrinsic rewards.)
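The fix-or-augment fallback above can be sketched as follows; this is a minimal illustration of my own, in which a zlib-compressed prefix stands in for the halted program p and everything after the halt is stored raw:

```python
import zlib

def describe(history: bytes, compressed_prefix_len: int) -> bytes:
    """Toy description of a growing history: the compressor covers only
    the first compressed_prefix_len bytes; anything arriving after the
    program 'halted' is stored raw (non-compressively) until a later
    learning step finds a shorter description for it too."""
    prefix = history[:compressed_prefix_len]
    suffix = history[compressed_prefix_len:]   # raw, uncompressed
    return zlib.compress(prefix) + suffix

history = b"abababababababab" * 8 + b"new unseen data"
code = describe(history, 128)
# The description stays valid as the history grows, but only the
# compressed prefix saves space; the raw tail costs its full length.
```

The point of the construction is only that a valid (if wasteful) description exists at every time step, so the compression measure below is always defined.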
C_l(p, h(≤t)) denotes p's compression performance on h(≤t): the number of bits needed to specify the predictor and the deviations of the sensory history from its predictions, in the sense of loss-free compression. The smaller C_l, the more lawfulness and regularity in the observations so far. While random noise is irregular, arbitrary, and incompressible, most videos are regular, as most single frames are very similar to the previous one. By encoding only the deviations, movie compression algorithms can save lots of storage space. Complex-looking fractal images (Mandelbrot 1982) are regular, as they usually are similar to their details, being computable by very short programs that re-use the same code over and over again for different image parts. The universe itself seems highly regular, as if computed by a program (Zuse 1969, Schmidhuber 1997a; 2002c; 2006b; 2007a): every photon behaves the same way; gravity is the same on Jupiter and Mars; mountains usually don't move overnight but remain where they are; etc.
Suppose p uses a small predictor that correctly predicts many x(τ) for 1 ≤ τ ≤ t. This can be used to encode x(≤t) compactly: given the predictor, only the wrongly predicted x(τ) plus information about the corresponding time steps τ are necessary to reconstruct x(≤t), e.g., (Schmidhuber 1992). Similarly, a predictor that learns a probability distribution on the possible next events, given previous events, can be used to compactly encode observations with high (respectively low) predicted probability by few (respectively many) bits (Huffman 1952, Schmidhuber and Heil 1996), thus achieving a compressed history representation.
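A toy sketch of this deviation-based encoding (the function names and the trivial predictor "x(τ) = x(τ−1)" are my own illustration, not part of the text):

```python
def encode_deviations(x):
    """Store only the wrongly predicted x(tau) plus their time steps tau,
    under the trivial predictor 'x(tau) equals x(tau-1)'."""
    deviations = [(0, x[0])]                 # the first symbol is never predicted
    for tau in range(1, len(x)):
        if x[tau] != x[tau - 1]:             # prediction failed at step tau
            deviations.append((tau, x[tau]))
    return deviations

def decode_deviations(deviations, length):
    """Loss-free reconstruction of x(<=t) from the deviation list."""
    x, dev = [], dict(deviations)
    for tau in range(length):
        x.append(dev.get(tau, x[tau - 1] if tau else None))
    return x

x = list("aaaaabbbbbbcccc")
code = encode_deviations(x)
assert decode_deviations(code, len(x)) == x  # loss-free
# Regular data -> few deviations -> short code; random data -> no savings.
```

A probabilistic predictor refines this further: an event assigned predicted probability q can be coded in roughly −log2 q bits, which is the principle behind Huffman and arithmetic coding.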
Alternatively, p could make use of a 3D world model or simulation. The corresponding MDL-based quality measure C_3D(p, h(≤t)) is the number of bits needed to specify all polygons and surface textures in the 3D simulation, plus the number of bits needed to encode deviations of h(≤t) from the simulation's predictions. Improving the model by adding or removing polygons may reduce the total number of bits required (Schmidhuber 2010).
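The trade-off behind C_3D can be sketched generically; the candidate models and bit counts below are invented purely for illustration:

```python
def total_description_length(model_bits, residual_bits):
    """MDL-style score: bits to specify the model (polygons, textures, ...)
    plus bits to encode the history's deviations from its predictions."""
    return model_bits + residual_bits

# Hypothetical candidates: a richer model costs more bits itself but
# usually leaves smaller residuals; MDL picks the minimal total.
candidates = {
    "coarse (few polygons)":   total_description_length(1_000, 50_000),
    "medium":                  total_description_length(10_000, 8_000),
    "overfit (many polygons)": total_description_length(100_000, 500),
}
best = min(candidates, key=candidates.get)   # -> "medium"
```

Adding polygons helps only while the residual savings exceed the extra model bits, which is why removing polygons can also lower the total.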
The ultimate limit for C_l(p, h(≤t)) is K(h(≤t)), a variant of the Kolmogorov complexity of h(≤t), namely, the length of the shortest program (for the given hardware) that computes an output starting with h(≤t) (Solomonoff 1978, Kolmogorov 1965, Li and Vitányi 1997, Schmidhuber 2002b). We do not have to worry about the fact that K(h(≤t)) in general cannot be computed exactly, only approximated from above (for most practical predictors the approximation will be crude). This just means that some patterns will be hard to detect by the limited predictor of choice; that is, the reward maximiser will get discouraged from spending too much effort on creating those patterns.
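Approximation from above can be made concrete with any off-the-shelf loss-free compressor; the sketch below is my own illustration, using zlib in place of a learned predictor:

```python
import os
import zlib

def crude_upper_bound_bits(data: bytes) -> int:
    """Compressed size in bits: an upper bound on (a version of) the
    Kolmogorov complexity of data, up to the constant-size decompressor."""
    return 8 * len(zlib.compress(data, level=9))

regular = b"ab" * 500      # highly regular history
noisy = os.urandom(1000)   # incompressible with overwhelming probability
assert crude_upper_bound_bits(regular) < crude_upper_bound_bits(noisy)
# zlib detects the repetition in `regular`, but subtler regularities
# (e.g. the digits of pi) would slip past it, leaving the bound crude.
```

Whatever patterns the chosen compressor misses simply do not count as regularity for it, which is exactly the sense in which a limited predictor ignores some patterns.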