possibility for infinite games), let s(θ) be the set of strategy profiles with the lowest benefit to deviation for any agent, and define

W(s(θ)) = inf { W(s, θ) : s ∈ s(θ) }.

The designer's optimization problem is then

max_{θ∈Θ} W(s(θ)).
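For concreteness, here is a minimal sketch of this optimization (hypothetical names; it assumes a finite design space Θ, a routine stable_profiles(theta) that returns the stable set s(θ), and a welfare function welfare(s, theta) for W(s, θ)):

```python
def designer_choice(thetas, stable_profiles, welfare):
    """Sketch (not the paper's actual code): solve max over theta of
    W(s(theta)), the worst-case welfare over the stable profiles."""
    best_theta, best_value = None, float("-inf")
    for theta in thetas:
        # W(s(theta)) = inf { W(s, theta) : s in s(theta) }
        value = min(welfare(s, theta) for s in stable_profiles(theta))
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta, best_value
```

The inner min implements the inf over the stable set; the outer loop is the designer's max over Θ.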
To analyze the effect of storage costs in the TAC/SCM game we consider the correspondence between storage costs and equilibrium outcomes. The aggregate quantity of day-0 procurement in these stable profiles yields an estimate of the behavior we would expect to see in actual games. In the TAC/SCM domain, the designer's problem is to minimize aggregate day-0 procurement. In the notation above, the designer maximizes W(s(θ)), defined by I{ sup φ(s(θ)) ≤ α }, where the aggregation function φ(s) = Σ_i s_i, s_i is each agent's day-0 procurement choice, and α is a desired cap on aggregate day-0 procurement.
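A sketch of this particular objective, under the same assumptions (a stable set represented as a collection of profiles, each profile a tuple of per-agent day-0 quantities; alpha is the hypothetical name for the cap α):

```python
def aggregate_day0(profile):
    """Aggregation function phi(s): total day-0 procurement in a profile."""
    return sum(profile)

def designer_welfare(stable_set, alpha):
    """Sketch of W(s(theta)) = I{ sup phi(s(theta)) <= alpha }: 1 if even
    the worst-case stable profile keeps aggregate day-0 procurement
    under the cap, 0 otherwise."""
    return 1 if max(aggregate_day0(s) for s in stable_set) <= alpha else 0
```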
The full strategy space in TAC/SCM is very complex, but for the purposes of this analysis we define a restricted space that allows each agent to select only a day-0 purchase quantity multiplier. We implemented this strategy space by parameterizing our tournament agent, Deep Maize, with a multiplier on its day-0 requests.⁴ Players select a multiplier from the set {0, 0.3, 0.6, ..., 1.5}. This strategy space defines the induced game Γ_θ.
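To give a feel for the size of the induced game, the following sketch enumerates its distinct profiles. The names MULTIPLIERS and N_PLAYERS are ours; it reads the ellipsis in the multiplier set as steps of 0.3 and treats the game as symmetric (every player runs the same parameterized Deep Maize), so a profile is an unordered multiset of multipliers; a TAC/SCM game has six agents.

```python
from itertools import combinations_with_replacement

MULTIPLIERS = [0.0, 0.3, 0.6, 0.9, 1.2, 1.5]  # restricted day-0 strategy space
N_PLAYERS = 6                                 # agents in a TAC/SCM game

# In a symmetric game, profiles that differ only by permuting players
# are equivalent, so one representative per multiset suffices.
profiles = list(combinations_with_replacement(MULTIPLIERS, N_PLAYERS))
print(len(profiles))  # 462 profiles of Gamma_theta, per storage-cost setting
```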
The payoffs for this game are not directly known, but we can obtain estimates by simulating games on the TAC server. We must do this for each setting of storage costs we wish to investigate, and collecting the samples is very time-consuming. Fortunately, the questions we would like to answer are high-level, and we can gather evidence about them using approximate methods. Instead of requiring exact equilibrium solutions, we aim to find regions of the profile space that are likely to be stable using the notion of ε-Nash equilibrium, where no agent can gain more than a small benefit ε by deviating to a different strategy. We also use two different techniques to approximate sets of stable profiles without sampling the full profile space.
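The regret calculation behind this notion is straightforward; here is a minimal sketch, assuming a fully sampled payoff table keyed by ordered strategy profiles (a hypothetical structure, not the paper's implementation):

```python
def profile_regret(profile, payoffs, strategies):
    """Smallest eps for which profile is an eps-Nash equilibrium: the
    largest gain any single agent gets from a unilateral deviation.
    payoffs maps a profile (tuple) to a tuple of per-agent payoffs."""
    eps = 0.0
    for i, s_i in enumerate(profile):
        base = payoffs[profile][i]
        for dev in strategies:
            if dev != s_i:
                deviant = profile[:i] + (dev,) + profile[i + 1:]
                eps = max(eps, payoffs[deviant][i] - base)
    return eps
```

A profile is then deemed (approximately) stable when its regret falls below the chosen ε threshold.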
The first method approximates the payoff functions of the game using supervised learning. We tried three different learning techniques from those introduced in [8]: quadratic regression (QR), locally weighted average (LWA), and locally weighted linear regression (LWLR). For quadratic regression, it is possible to compute equilibria of the learned game analytically. For the other methods, we applied replicator dynamics [9] to a discrete approximation of the learned game.
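As an illustration of that second step, here is a minimal replicator-dynamics sketch. For brevity it uses a two-player symmetric payoff matrix learned from the data (our simplification; the actual discretized game has six players, but the update rule is the same in spirit):

```python
import numpy as np

def replicator_dynamics(payoff_matrix, steps=10_000, dt=0.01):
    """payoff_matrix[i, j]: learned payoff to pure strategy i when the
    opponent plays j. Returns an approximate equilibrium mixture."""
    n = payoff_matrix.shape[0]
    x = np.full(n, 1.0 / n)                # start from the uniform mixture
    for _ in range(steps):
        fitness = payoff_matrix @ x        # expected payoff of each strategy
        average = x @ fitness              # population-average payoff
        x += dt * x * (fitness - average)  # above-average strategies grow
        x = np.clip(x, 0.0, None)
        x /= x.sum()                       # renormalize to a distribution
    return x
```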
The second method uses directed search to find stable profiles. Given a partial game matrix, we can compute a bound on ε for each profile for which we have sample data; this bound is the maximum benefit of deviating to any profile in the data set. The current set of profiles with the best ε-bounds is the set of candidate equilibria. We employed a “best-first” search that always samples unexplored deviations from a candidate equilibrium. The idea is to confirm or refute the stability of promising individual profiles without requiring the full game matrix to be sampled. A limitation of this approach is that it cannot rule out the existence of additional equilibria among the profiles that have not been sampled.
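A sketch of one step of this search, under the same hypothetical payoff-table representation (sampled maps each explored profile to its per-agent payoffs; simulate stands in for running a game on the server):

```python
def deviations(profile, strategies):
    """Profiles reachable when a single agent switches strategy."""
    for i, s in enumerate(profile):
        for d in strategies:
            if d != s:
                yield i, profile[:i] + (d,) + profile[i + 1:]

def eps_bound(profile, sampled, strategies):
    """Bound on eps from the partial matrix: the maximum observed
    benefit of deviating to any profile already in the data set."""
    gains = [sampled[q][i] - sampled[profile][i]
             for i, q in deviations(profile, strategies) if q in sampled]
    return max(gains, default=0.0)

def best_first_step(sampled, strategies, simulate):
    """Sample one unexplored deviation from the current best candidate."""
    candidate = min(sampled, key=lambda p: eps_bound(p, sampled, strategies))
    for _, q in deviations(candidate, strategies):
        if q not in sampled:
            sampled[q] = simulate(q)   # one more TAC/SCM simulation
            return candidate, q
    return candidate, None  # every deviation sampled: bound is exact
```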
We gathered data for storage costs in the set {0, 50, 100, 150, 200}. An initial data set was generated by sampling 10 randomly generated profiles for each storage cost.
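A rough illustration of how such an initial data set might be drawn (hypothetical code; profiles are stored as sorted tuples to canonicalize the symmetric game):

```python
import random

STORAGE_COSTS = [0, 50, 100, 150, 200]
MULTIPLIERS = [0.0, 0.3, 0.6, 0.9, 1.2, 1.5]
N_PLAYERS = 6

random.seed(0)  # reproducible illustration
initial_profiles = {
    theta: [tuple(sorted(random.choice(MULTIPLIERS) for _ in range(N_PLAYERS)))
            for _ in range(10)]          # 10 random profiles per setting
    for theta in STORAGE_COSTS
}
```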
4 Deep Maize requested a total of 11800 components for each combination of supplier and
product, spread out over different due dates.