A Formal Theory of Creativity to Model the Creation of Art - Computers and Creativity

12.4 Previous Approximative Implementations of the Theory

Since 1990 I have built simple artificial scientists or artists with an intrinsic desire to build a better model of the world and what can be done in it. They embody approximations of the theory of Sect. 12.3. The agents are motivated to continually improve their models by creating or discovering more surprising, novel patterns, that is, data predictable or compressible in hitherto unknown ways. They actively invent experiments (algorithmic protocols or programs or action sequences) to explore their environment, always trying to learn new behaviours (policies) exhibiting previously unknown regularities or patterns. Crucial ingredients are:

1. An adaptive world model, essentially a predictor or compressor of the continually growing history of actions and sensory inputs, reflecting current knowledge about the world,
2. A learning algorithm that continually improves the model (detecting novel, initially surprising spatio-temporal patterns, including works of art, that subsequently become known patterns),
3. Intrinsic rewards measuring the model's improvements due to its learning algorithm (thus measuring the degree of subjective novelty & surprise),
4. A separate reward optimiser or reinforcement learner, which translates those rewards into action sequences or behaviours expected to optimise future reward.
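
As a rough illustration of how ingredients 1-4 can be wired together, here is a toy Python sketch. It is not one of the implementations cited in this section, and every component is an assumption chosen for brevity: the world model is a table holding one running prediction per action, the learning algorithm is a single incremental update, the intrinsic reward is the drop in mean squared error on the stored history for that action, and the reward optimiser is an epsilon-greedy bandit rather than a full reinforcement learner. The names `TablePredictor`, `toy_world` and `curious_agent` and all parameter values are hypothetical.

```python
import random


class TablePredictor:
    """Toy stand-in for ingredient 1: one running prediction per action."""

    def __init__(self, n_actions, lr=0.3):
        self.pred = [0.0] * n_actions
        self.lr = lr

    def error(self, action, history):
        """Mean squared prediction error on everything observed after `action`."""
        return sum((x - self.pred[action]) ** 2 for x in history) / len(history)

    def learn(self, action, x):
        """Ingredient 2: one incremental learning step toward the new observation."""
        self.pred[action] += self.lr * (x - self.pred[action])


def toy_world(action, rng):
    """Hypothetical environment: two constant signals and one pure-noise signal."""
    if action == 0:
        return 1.0
    if action == 1:
        return 5.0
    return rng.gauss(0.0, 3.0)


def curious_agent(steps=300, n_actions=3, eps=0.2, seed=0):
    rng = random.Random(seed)
    model = TablePredictor(n_actions)
    histories = [[] for _ in range(n_actions)]
    value = [0.0] * n_actions  # ingredient 4: estimated intrinsic reward per action
    visits = [0] * n_actions
    for _ in range(steps):
        # Ingredient 4: epsilon-greedy choice of the action expected to pay most.
        if rng.random() < eps:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda i: value[i])
        x = toy_world(a, rng)
        histories[a].append(x)
        before = model.error(a, histories[a])
        model.learn(a, x)
        after = model.error(a, histories[a])
        # Ingredient 3: intrinsic reward = improvement of the model on its own history.
        reward = before - after
        value[a] += 0.1 * (reward - value[a])
        visits[a] += 1
    return visits, value, model.pred
```

Calling `curious_agent()` returns the number of visits per action, the current reward estimates and the model's predictions. Measuring improvement on the whole stored history, rather than on the latest sample alone, is a deliberate choice here: a step toward a fresh noise sample does not, on average, improve the fit to earlier noise, so the noisy action tends to stop paying off, while the constant signals pay less and less as they become predictable.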

These ingredients make the agents curious and creative: they become intrinsically motivated to acquire skills leading to a better model of the possible interactions with the world, discovering additional “eye-opening” novel patterns (including works of art) predictable or compressible in previously unknown ways.

Ignoring issues of computation time, it is possible to devise mathematically optimal, universal RL methods (Hutter 2005, Schmidhuber 2009d) for such systems (Schmidhuber 2006a; 2010) (2006-). However, previous practical implementations (Schmidhuber 1991a, Storck et al. 1995, Schmidhuber 2002a) were non-universal and made approximative assumptions. Among the many ways of combining methods for ingredients 1-4, we implemented the following variants:

A. Non-traditional RL based on adaptive recurrent neural networks as predictive world models is used to maximise intrinsic reward created in proportion to prediction error (Schmidhuber 1991b).
B. Traditional RL (Kaelbling et al. 1996) is used to maximise intrinsic reward created in proportion to improvements of prediction error (Schmidhuber 1991a).
C. Traditional RL maximises intrinsic reward created in proportion to relative entropies between the agent's priors and posteriors (Storck et al. 1995).
D. Non-traditional RL (Schmidhuber et al. 1997) (without restrictive Markovian assumptions) learns probabilistic, hierarchical programs and skills through zero-sum intrinsic reward games of two players, each trying to out-predict or surprise the other, taking into account the computational costs of learning, and learning when to learn and what to learn (1997-2002) (Schmidhuber 1999; 2002a).
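
The four reward signals can be contrasted in a few lines of Python. This is a hedged sketch, not the cited implementations: predictions are single numbers scored by squared error, the beliefs in C are assumed to be categorical distributions given by pseudo-counts, the relative entropy is taken from posterior to prior, and D is reduced to a single bet between two players guessing one outcome, with the added assumption (not stated in this section) that nothing is at stake when the two players agree. All function names are hypothetical.

```python
import math


def squared_error(prediction, target):
    return (prediction - target) ** 2


def reward_a(prediction, target):
    """Variant A (sketch): intrinsic reward proportional to the prediction error."""
    return squared_error(prediction, target)


def reward_b(pred_before, pred_after, target):
    """Variant B (sketch): reward proportional to the improvement of the
    prediction error achieved by the learning step."""
    return squared_error(pred_before, target) - squared_error(pred_after, target)


def normalise(counts):
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Relative entropy D(p || q) in nats for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def reward_c(counts, observed):
    """Variant C (sketch): relative entropy between the belief after (posterior)
    and before (prior) seeing symbol `observed`; beliefs come from pseudo-counts."""
    prior = normalise(counts)
    posterior_counts = list(counts)
    posterior_counts[observed] += 1
    return kl_divergence(normalise(posterior_counts), prior)


def reward_d(guess_left, guess_right, outcome):
    """Variant D (toy): zero-sum surprise game between two players.
    Returns (reward_left, reward_right); the two rewards always sum to zero."""
    if guess_left == guess_right:
        return 0, 0  # assumption: no bet when both players expect the same outcome
    if guess_left == outcome:
        return 1, -1
    if guess_right == outcome:
        return -1, 1
    return 0, 0  # neither was right (only possible for non-binary outcomes)
```

With these definitions, `reward_a(0.0, 1.0)` is 1.0 and `reward_b(0.0, 0.5, 1.0)` is 0.75. With pseudo-counts `[1, 1]`, observing symbol 0 gives a `reward_c` of about 0.057 nats; with `[100, 1]` it is close to zero, because a well-supported belief barely moves. This also shows one practical difference between the variants: under A, a pure-noise source keeps paying off because its prediction error never shrinks, whereas under C with pseudo-counts each new noise sample shifts the belief by a shrinking amount.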
