depends on the market clearing at hour $h$. Assuming that the $i$-th Genco belongs to zone $k$, $R_i(h)$ is given by
\[
R_i(h) = LMP_k(h) \cdot Q_i(h) - TC_i\big(Q_i(h)\big) \qquad [\text{€}/h] \tag{4}
\]
where $TC_i$ is the $i$-th Genco's total cost, $LMP_k(h)$ is the Locational Marginal Price of zone $k$ at hour $h$ and $Q_i(h)$ is the quantity awarded to the $i$-th Genco at hour $h$. Finally, it is worth remarking that the marginal cost is the reference parameter for the bids (see equation 3), whereas the total costs are crucial in order to evaluate the real profitability of the bids (see equation 4).
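As a minimal illustration of equation 4, the following Python sketch computes a Genco's hourly profit from an LMP and an awarded quantity. The quadratic-plus-fixed-cost form of $TC_i$ and every numeric value used here are illustrative assumptions, not parameters of the model described above.

```python
def total_cost(q_mwh, a=0.01, b=25.0, fixed=500.0):
    """Assumed total-cost function TC_i(Q) in EUR/h: quadratic variable cost
    plus a fixed cost term. Coefficients are illustrative only."""
    return a * q_mwh ** 2 + b * q_mwh + fixed


def hourly_profit(lmp_eur_mwh, q_mwh):
    """R_i(h) = LMP_k(h) * Q_i(h) - TC_i(Q_i(h))  [EUR/h]  (equation 4)."""
    return lmp_eur_mwh * q_mwh - total_cost(q_mwh)


# Awarded 200 MWh at an LMP of 40 EUR/MWh -> positive payoff.
print(hourly_profit(40.0, 200.0))   # 2100.0
# Nothing awarded: the fixed cost is still incurred -> negative payoff.
print(hourly_profit(40.0, 0.0))     # -500.0
```

The second call hints at why payoffs of any sign must be handled: with a fixed cost in the cost function, a null awarded quantity already produces a negative payoff, which is precisely the situation addressed in the next section.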
4 Enhanced Roth-Erev Reinforcement Learning Algorithm
Electricity markets are characterized by inherent complexity and repeated games that require adequate modeling of the traders' strategic behavior. This is usually achieved by endowing the Gencos with learning capability. The literature on agent-based electricity market models points out three major kinds of learning algorithms: zero-intelligence algorithms [10],[11], reinforcement and belief-based models [5], and evolutionary approaches [18]. In this paper, the strategic agent behavior is modeled by means of a reinforcement learning approach. It is worth remarking that the solutions proposed in the literature generally account only for positive and null payoffs (e.g., [18] represented a first modification of the original work proposed by Roth and Erev [20] so as to account for null payoffs). Unfortunately, this is a severe limitation when it comes to determining profitable strategies for economic agents in a real economic context. Indeed, the presence of fixed costs in the cost function (see equation 2), together with a market awarded quantity $Q_i(h) \geq 0$ for the $i$-th Genco at hour $h$, leads to payoffs that can be positive, negative or null. This calls for a reinforcement learning approach able to cope with payoffs of any sign; to this aim we have developed an enhanced version of the Roth and Erev algorithm that handles positive, negative and null payoffs alike. The original Roth and Erev learning model (hereafter referred to as the RE algorithm) considers three psychological aspects of human learning:
- the power law of practice, i.e., learning curves are initially steep and tend to progressively flatten out;
- the recency (or forgetting) effect, i.e., a player's recent experience plays a larger role than past experience in determining his behavior;
- the experimentation effect, i.e., not only the experimented strategy but also similar strategies are reinforced.
For each strategy $a_j \in \mathcal{A}_i$ ($j = 1, \ldots, M$), at every round $t$, the propensities $S_{j,t-1}(a_j)$ are updated according to:
\[
S_{j,t}(a_j) = (1 - r) \cdot S_{j,t-1}(a_j) + E_{j,t}(a_j) \tag{5}
\]
where $r \in [0, 1]$ is the recency parameter, which contributes to exponentially decreasing the effect of past results.
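As a rough sketch of how the propensity update in equation 5 might be coded, the snippet below applies one round of the update in Python. The strategy labels, initial propensities, recency value and experimentation values $E_{j,t}(a_j)$ are all hypothetical: the experimentation values are taken as given here, since the experimentation function itself is defined by the next equation.

```python
def update_propensities(propensities, experimentation, r=0.1):
    """One round of the propensity update of equation 5:
    S_{j,t}(a_j) = (1 - r) * S_{j,t-1}(a_j) + E_{j,t}(a_j).

    propensities:     dict mapping strategy a_j -> S_{j,t-1}(a_j)
    experimentation:  dict mapping strategy a_j -> E_{j,t}(a_j), assumed to be
                      produced elsewhere by the experimentation function
    r:                recency parameter in [0, 1] (0.1 is an illustrative value)
    """
    return {a_j: (1.0 - r) * s_prev + experimentation[a_j]
            for a_j, s_prev in propensities.items()}


# Hypothetical example: three bid strategies with made-up values.
propensities = {"low": 10.0, "mid": 10.0, "high": 10.0}
experimentation = {"low": 0.5, "mid": 4.0, "high": -1.0}
print(update_propensities(propensities, experimentation, r=0.1))
# {'low': 9.5, 'mid': 13.0, 'high': 8.0}
```

The negative entry in the example is deliberate: it mirrors the fact that the enhanced algorithm must cope with negative as well as positive and null payoffs.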
The second term of equation 5 is called the experimentation function and is given by: