bidding strategy $b_j$ used. Additionally, the mean
$\mu = \frac{1}{N} \sum_i \text{euro}_i$ of the derived profits and the variance
$\sigma = \frac{1}{N} \sum_i (\mu - \text{euro}_i)^2$ at the end of each sequence of auctions
are logged per agent. The history, along with the measured performance of the agent,
is used to update the parameters of the bidding strategy to improve expected
profits per agent.
Per agent, we adjust the local likelihood of the strategies used in the auctions
it participated in if, and only if, the derived profits euro for a specific history are
not in the range $[\mu - \sigma, \mu + \sigma]$. This indicates that decisions were made that
should be promoted or demoted in likelihood due to exceptionally positive or
negative performance. For derived profits euro after a history of bidding, with
euro outside of the range $[\mu - \sigma, \mu + \sigma]$, $\Delta$ equals the excess in
performance outside of this range. This $\Delta$ was caused by the actual strategies used
for each state when entering auctions. We are, however, faced with a credit assignment
problem, i.e., which of the choices are actually responsible for the change in performance?
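The logged statistics and the range test described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `profit_stats` and `excess` are hypothetical, $N$ is read as the number of logged profit values, and the sign convention for $\Delta$ (positive above the range, negative below it) is an assumption that the text does not state explicitly.

```python
def profit_stats(profits):
    """Return (mu, sigma) as defined in the text:
    mu = (1/N) * sum(euro_i), sigma = (1/N) * sum((mu - euro_i)^2).
    Note that the text uses the variance itself as sigma."""
    n = len(profits)
    if n == 0:
        raise ValueError("at least one logged profit is required")
    mu = sum(profits) / n
    sigma = sum((mu - p) ** 2 for p in profits) / n
    return mu, sigma


def excess(profit, mu, sigma):
    """Return Delta, the part of `profit` outside [mu - sigma, mu + sigma].

    Assumption: Delta is signed, so exceptionally good results give a
    positive value and exceptionally bad results a negative one.
    Inside the range, no adjustment is made and Delta is zero.
    """
    low, high = mu - sigma, mu + sigma
    if profit > high:
        return profit - high
    if profit < low:
        return profit - low
    return 0.0
```

With this reading, a profit inside the range leaves the strategy likelihoods untouched, and only exceptional outcomes produce a nonzero $\Delta$.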
We use a Monte Carlo-like approach and distribute the credit ($\Delta$) evenly over
all strategy choices at the end of one epoch. Each strategy choice in $h_i$
is assigned $\frac{\Delta}{|\text{history}|}$ of the credit. Let $s$ be the state from which
the bid in $h_i$ was made for load $l$ and $s'$ the corresponding successor state. Then the
likelihood for playing strategy $b_k$ with probability $p_k$ for this transition is updated to
$p_{k+1} = p_k + \alpha \frac{\Delta}{|\text{history}|}$. To retain unity, the other two
strategies are updated to $p_{k+1} = p_k - 0.5\,\alpha \frac{\Delta}{|\text{history}|}$.
The variable $\alpha = 0.1$ is the learning rate which, unless stated
otherwise, is set low to cope with a highly dynamic environment and to ensure
smooth changes in the behavior of an agent in order to not forget good strategies.
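The even credit distribution can be illustrated with a short sketch. The data layout is an assumption made for the example: `probs` maps a transition `(s, s_next)` to a list of three strategy probabilities, and `history` is a list of `(transition, k)` pairs, where `k` is the index of the strategy played. The function name `update_likelihoods` is hypothetical. The excerpt does not say how probabilities are kept inside [0, 1], so the sketch leaves them unclamped.

```python
def update_likelihoods(probs, history, delta, alpha=0.1):
    """Distribute the credit `delta` evenly over all choices in `history`.

    For each choice, the played strategy k gains alpha * delta / |history|
    and each of the other two strategies loses half of that amount, so the
    three probabilities of a transition still sum to one.
    `probs` is modified in place.
    """
    if not history or delta == 0:
        return
    step = alpha * delta / len(history)
    for transition, k in history:
        p = probs[transition]
        if len(p) != 3:
            raise ValueError("the sketch assumes exactly three strategies")
        for j in range(3):
            if j == k:
                p[j] += step
            else:
                p[j] -= 0.5 * step
```

For example, with a single choice in the history, $\Delta = 1$ and $\alpha = 0.1$, the played strategy gains 0.1 and the other two lose 0.05 each. A negative $\Delta$ reverses the direction of the update.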
4 Experiments
In this section, we illustrate the phenomena encountered when conducting
experiments with competitive agents. We note that the presented results are not
specific to the chosen settings, but are typical for levels of competition between
the agents for available loads and their valuations with complementary values
for bundles.
We first consider 10 agents, each with a capacity of 5. There are three fruitful
regions with 5, 20, and 10 loads for auction in each epoch respectively. There are
hence 35 loads for auction for a total capacity of the agents of 50. In Figure 2a
we show the average utility/profits (and variance) of the agents for the above
scenario (see footnote 4). The first five agents are strategic bidders and the
remaining agents (six to ten) use straightforward, myopic bidding as defined in Section 3.
With all 10 agents bidding myopically in the above setting, the average profit is 1.2
(not shown). The 5 strategic bidders in Figure 2a are evidently able to increase
their profits at the cost of the myopic bidders. This is also apparent from a study
of the used capacity of the agents. With 10 myopic agents, each agent uses about
70% of capacity, i.e., an average of $0.7 \cdot 5 = 3.5$ loads is won in the auctions.
This is reduced to only 35% use of capacity for the myopic bidders in the scenario of
Figure 2a, as the strategic bidders fill their trucks to near capacity at the cost of
the myopic bidders.
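The arithmetic behind these figures can be checked with a small sketch. The variable and function names are hypothetical. Reading the 35% figure as the utilization of the myopic bidders is an interpretation of the text, not something it states outright.

```python
CAPACITY_PER_AGENT = 5
N_AGENTS = 10
LOADS_PER_REGION = [5, 20, 10]  # loads for auction per epoch in the three regions

total_loads = sum(LOADS_PER_REGION)
total_capacity = N_AGENTS * CAPACITY_PER_AGENT

# Upper bound on the average utilization: reached only if every load is won.
max_average_utilization = total_loads / total_capacity


def loads_won(utilization_percent, capacity=CAPACITY_PER_AGENT):
    """Convert a utilization percentage into an average number of loads."""
    return utilization_percent * capacity / 100


all_myopic = loads_won(70)           # average loads per agent, all-myopic setting
myopic_vs_strategic = loads_won(35)  # average loads per myopic agent, Figure 2a
```

The bound of 35/50 = 0.7 coincides with the reported 70% utilization for 10 myopic agents, which suggests that nearly all loads are won in that setting.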
Footnote 4: Results are averaged over 100 runs, each of which ran for 100,000 epochs.