To encourage the agent to actively create data leading to easily learnable improvements of p (Schmidhuber 1991a), the reward signal r(t) is split into two scalar real-valued components: r(t) = g(r_ext(t), r_int(t)), where g maps pairs of real values to real values, e.g., g(a, b) = a + b. Here r_ext(t) denotes traditional external reward provided by the environment, such as negative reward for bumping into a wall, or positive reward for reaching some teacher-given goal state. The Formal Theory of Creativity, however, is mostly interested in r_int(t), the intrinsic reward, which is provided whenever the model's quality improves; for purely creative agents r_ext(t) = 0 for all valid t. Formally, the intrinsic reward for the model's progress (due to some application-dependent model improvement algorithm) between times t and t + 1 is

r_int(t + 1) = f[C(p(t), h(≤ t + 1)), C(p(t + 1), h(≤ t + 1))],    (12.2)
where f maps pairs of real values to real values. Various progress measures are possible; most obvious is f(a, b) = a − b. This corresponds to a discrete time version of maximising the first derivative of the model's quality. Both the old and the new model have to be tested on the same data, namely, the history so far. That is, progress between times t and t + 1 is defined based on two models of h(≤ t + 1), where the old one is trained only on h(≤ t) and the new one also gets to see h(t ≤ · ≤ t + 1). This is like p(t) predicting data of time t + 1, then observing it, then learning something, then becoming a measurably improved model p(t + 1).
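To make the split reward concrete, the following is a minimal Python sketch of this computation, assuming a toy predictive model (a constant predictor equal to the running mean of past scalar observations) whose summed squared prediction error over the history stands in for C(p, h(≤ t)); the function names and the toy model are illustrative assumptions, not part of the original formulation.

    # Minimal sketch of the split reward r(t) = g(r_ext(t), r_int(t)), with a toy
    # constant predictor whose summed squared prediction error over the history
    # plays the role of the cost C(p, h(<= t)).

    def model_error(model_mean, history):
        # Toy stand-in for C(p, h(<= t)): summed squared prediction error of the
        # constant predictor model_mean over the whole history.
        return sum((x - model_mean) ** 2 for x in history)

    def intrinsic_reward(old_mean, new_mean, history):
        # r_int(t+1) = f(C(p(t), h(<= t+1)), C(p(t+1), h(<= t+1))) with f(a, b) = a - b.
        # Both the old and the improved model are evaluated on the same data.
        a = model_error(old_mean, history)   # cost of the old model p(t)
        b = model_error(new_mean, history)   # cost of the improved model p(t+1)
        return a - b                         # positive iff the model measurably improved

    def total_reward(r_ext, r_int):
        # r(t) = g(r_ext(t), r_int(t)) with the simple additive choice g(a, b) = a + b.
        return r_ext + r_int

    # Example: after observing one new data point, "retrain" the model (recompute
    # the mean) and reward the controller for the resulting drop in prediction error.
    history = [1.0, 1.2, 0.9, 1.1]                     # h(<= t+1)
    old_mean = sum(history[:-1]) / len(history[:-1])   # model p(t), trained on h(<= t)
    new_mean = sum(history) / len(history)             # model p(t+1), trained on h(<= t+1)
    r_int = intrinsic_reward(old_mean, new_mean, history)
    print(total_reward(r_ext=0.0, r_int=r_int))        # purely creative agent: r_ext = 0

The sketch only illustrates the bookkeeping; in the theory, the model improvement step is whatever application-dependent learning algorithm the agent is equipped with.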
The above description of the agent's motivation separates the goal (finding or
making data that can be modelled better or faster than before) from the means of
achieving the goal. The controller's RL mechanism must figure out how to translate
such rewards into action sequences that allow the given world model improvement
algorithm to find and exploit previously unknown types of regularities. It must trade
off long-term vs short-term intrinsic rewards of this kind, taking into account all
costs of action sequences (Schmidhuber 1999; 2006a).
The field of Reinforcement Learning (RL) offers many more or less powerful methods for maximising expected reward as requested above (Kaelbling et al. 1996). Some were used in our earlier implementations of curious, creative systems; see Sect. 12.4 for a more detailed overview of previous simple artificial scientists and artists (1990-2002). Universal RL methods (Hutter 2005, Schmidhuber 2009d) as well as RNN-based RL (Schmidhuber 1991b) and SSA-based RL (Schmidhuber 2002a) can in principle learn useful internal states memorising relevant previous events; less powerful RL methods (Schmidhuber 1991a, Storck et al. 1995) cannot.
In theory C(p, h(≤ t)) should take the entire history of actions and perceptions into account (Schmidhuber 2006a), like the performance measure C_xry:

C_xry(p, h(≤ t)) = Σ_{τ=1}^{t} [ ||pred(p, x(τ)) − x(τ)||² + ||pred(p, r(τ)) − r(τ)||² + ||pred(p, y(τ)) − y(τ)||² ],    (12.3)
where pred(p, q) is p's prediction of event q from earlier parts of the history.
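As an illustration of how such a history-wide performance measure could be evaluated, here is a Python sketch of Eq. (12.3), assuming the history is stored as a list of (x, r, y) triples and that a prediction function with the interface pred(prefix, key) is given; this storage format and interface are assumptions of the sketch, since the chapter specifies pred(p, q) only abstractly.

    import numpy as np

    def c_xry(pred, history):
        # Sketch of Eq. (12.3): summed squared prediction errors of a model over
        # the whole history of inputs x, rewards r, and actions y.
        # pred(prefix, key) is a hypothetical interface: it returns the model's
        # prediction of the event named key ("x", "r" or "y") at the current step,
        # conditioned only on the earlier part of the history.
        total = 0.0
        for tau in range(len(history)):
            prefix = history[:tau]            # what the model may condition on
            x, r, y = history[tau]            # actual events at time tau
            for key, actual in (("x", x), ("r", r), ("y", y)):
                error = np.asarray(pred(prefix, key)) - np.asarray(actual)
                total += float(np.sum(error ** 2))   # ||pred - actual||^2
        return total

    # Example with a trivial "always predict zero" model on a toy scalar history.
    history = [(0.5, 0.0, 1.0), (0.7, 1.0, 0.0)]
    print(c_xry(lambda prefix, key: 0.0, history))   # 0.25 + 0 + 1 + 0.49 + 1 + 0 = 2.74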
C_xry ignores the danger of overfitting (too many parameters for few data) through a p that stores the entire history without compactly representing its regularities,