depends on the market clearing at hour $h$. Assuming that the $i$-th Genco belongs to zone $k$, $R_i(h)$ is given by
\[
R_i(h) = LMP_k(h) \cdot Q_i(h) - TC_i\big(Q_i(h)\big) \qquad [\text{€}/h] \tag{4}
\]
where $TC_i$ is the $i$-th Genco's total cost, $LMP_k(h)$ is the Locational Marginal Price of zone $k$ at hour $h$ and $Q_i(h)$ is the quantity awarded to the $i$-th Genco at hour $h$. Finally, it is worth remarking that the marginal cost is the reference parameter for the bids (see equation 3), whereas the total costs are crucial in order to evaluate the real profitability of the bids (see equation 4).
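As a minimal illustration of equation 4, the following Python sketch computes a Genco's hourly profit from an LMP and an awarded quantity. The quadratic-plus-fixed-cost form of $TC_i$ and every numeric value used here are illustrative assumptions, not parameters of the model described above.

```python
def total_cost(q_mwh, a=0.01, b=25.0, fixed=500.0):
    """Assumed total-cost function TC_i(Q) in EUR/h: quadratic variable cost
    plus a fixed cost term. Coefficients are illustrative only."""
    return a * q_mwh ** 2 + b * q_mwh + fixed


def hourly_profit(lmp_eur_mwh, q_mwh):
    """R_i(h) = LMP_k(h) * Q_i(h) - TC_i(Q_i(h))  [EUR/h]  (equation 4)."""
    return lmp_eur_mwh * q_mwh - total_cost(q_mwh)


# Awarded 200 MWh at an LMP of 40 EUR/MWh -> positive payoff.
print(hourly_profit(40.0, 200.0))   # 2100.0
# Nothing awarded: the fixed cost is still incurred -> negative payoff.
print(hourly_profit(40.0, 0.0))     # -500.0
```

The second call hints at why payoffs of any sign must be handled: with a fixed cost in the cost function, a null awarded quantity already produces a negative payoff, which is precisely the situation addressed in the next section.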
4 Enhanced Roth-Erev Reinforcement Learning Algorithm
Electricity markets are characterized by inherent complexity and repeated games that require adequate modeling of the traders' strategic behavior. This is usually achieved by endowing the Gencos with learning capability. The literature on agent-based electricity market models points out three major kinds of learning algorithms: zero-intelligence algorithms [10],[11], reinforcement and belief-based models [5], and evolutionary approaches [18]. In this paper, the strategic agent behavior is modeled by means of a reinforcement learning approach. It is worth remarking that the solutions proposed in the literature generally account only for positive and null payoffs (e.g., [18] represented a first modification of the original work proposed by Roth and Erev [20] so as to account for null payoffs). Unfortunately, this is a severe limitation when it comes to determining profitable strategies for economic agents in a real economic context. Indeed, the presence of fixed costs in the cost function (see equation 2), together with a market awarded quantity $Q_i(h) \geq 0$ for the $i$-th Genco at hour $h$, leads to payoffs that can be positive, negative or null. This calls for a reinforcement learning approach able to cope with payoffs of any sign; to this aim we have developed an enhanced version of the Roth and Erev algorithm that handles positive, negative and null payoffs alike. The original Roth and Erev learning model (hereafter referred to as the RE algorithm) considers three psychological aspects of human learning:
- the power law of practice, i.e., learning curves are initially steep and tend to progressively flatten out;
- the recency (or forgetting) effect, i.e., a player's recent experience plays a larger role than past experience in determining his behavior;
- the experimentation effect, i.e., not only the experimented strategy but also similar strategies are reinforced.
For each strategy $a_j \in \mathcal{A}_i$ ($j = 1, \ldots, M$), at every round $t$, the propensities $S_{j,t-1}(a_j)$ are updated according to:
\[
S_{j,t}(a_j) = (1 - r) \cdot S_{j,t-1}(a_j) + E_{j,t}(a_j) \tag{5}
\]
where $r \in [0, 1]$ is the recency parameter, which contributes to exponentially decreasing the effect of past results.
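As a rough sketch of how the propensity update in equation 5 might be coded, the snippet below applies one round of the update in Python. The strategy labels, initial propensities, recency value and experimentation values $E_{j,t}(a_j)$ are all hypothetical: the experimentation values are taken as given here, since the experimentation function itself is defined by the next equation.

```python
def update_propensities(propensities, experimentation, r=0.1):
    """One round of the propensity update of equation 5:
    S_{j,t}(a_j) = (1 - r) * S_{j,t-1}(a_j) + E_{j,t}(a_j).

    propensities:     dict mapping strategy a_j -> S_{j,t-1}(a_j)
    experimentation:  dict mapping strategy a_j -> E_{j,t}(a_j), assumed to be
                      produced elsewhere by the experimentation function
    r:                recency parameter in [0, 1] (0.1 is an illustrative value)
    """
    return {a_j: (1.0 - r) * s_prev + experimentation[a_j]
            for a_j, s_prev in propensities.items()}


# Hypothetical example: three bid strategies with made-up values.
propensities = {"low": 10.0, "mid": 10.0, "high": 10.0}
experimentation = {"low": 0.5, "mid": 4.0, "high": -1.0}
print(update_propensities(propensities, experimentation, r=0.1))
# {'low': 9.5, 'mid': 13.0, 'high': 8.0}
```

The negative entry in the example is deliberate: it mirrors the fact that the enhanced algorithm must cope with negative as well as positive and null payoffs.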
The second term of equation 5 is called the experimentation function and is given by: