possibility for infinite games), let s(θ) be the set of strategy profiles with the lowest benefit to deviation for any agent, and define

W(s(θ)) = inf { W(s, θ) : s ∈ s(θ) }.

The designer's optimization problem is then

max_{θ∈Θ} W(s(θ)).
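For concreteness, here is a minimal sketch of this optimization (hypothetical names; it assumes a finite design space Θ, a routine stable_profiles(theta) that returns the stable set s(θ), and a welfare function welfare(s, theta) for W(s, θ)):

```python
def designer_choice(thetas, stable_profiles, welfare):
    """Sketch (not the paper's actual code): solve max over theta of
    W(s(theta)), the worst-case welfare over the stable profiles."""
    best_theta, best_value = None, float("-inf")
    for theta in thetas:
        # W(s(theta)) = inf { W(s, theta) : s in s(theta) }
        value = min(welfare(s, theta) for s in stable_profiles(theta))
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta, best_value
```

The inner min implements the inf over the stable set; the outer loop is the designer's max over Θ.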
To analyze the effect of storage costs in the TAC/SCM game we consider the correspondence between storage costs and equilibrium outcomes. The aggregate quantity of day-0 procurement in these stable profiles yields an estimate of the behavior we would expect to see in actual games. In the TAC/SCM domain, the designer's problem is to minimize aggregate day-0 procurement. In the notation above, the designer maximizes W(s(θ)), defined by I{ sup φ(s(θ)) ≤ α }, where the aggregation function φ(s) = Σ_i s_i, s_i is each agent's day-0 procurement choice, and α is a desired cap on aggregate day-0 procurement.
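A sketch of this particular objective, under the same assumptions (a stable set represented as a collection of profiles, each profile a tuple of per-agent day-0 quantities; alpha is the hypothetical name for the cap α):

```python
def aggregate_day0(profile):
    """Aggregation function phi(s): total day-0 procurement in a profile."""
    return sum(profile)

def designer_welfare(stable_set, alpha):
    """Sketch of W(s(theta)) = I{ sup phi(s(theta)) <= alpha }: 1 if even
    the worst-case stable profile keeps aggregate day-0 procurement
    under the cap, 0 otherwise."""
    return 1 if max(aggregate_day0(s) for s in stable_set) <= alpha else 0
```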
The full strategy space in TAC/SCM is very complex, but for the purposes of this analysis we define a restricted space that allows each agent to select only a day-0 purchase quantity multiplier. We implemented this strategy space by parameterizing our tournament agent, Deep Maize, with a multiplier on its day-0 requests.⁴ Players select a multiplier from the set {0, 0.3, 0.6, ..., 1.5}. This strategy space defines the induced game Γ_θ.
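To give a feel for the size of the induced game, the following sketch enumerates its distinct profiles. The names MULTIPLIERS and N_PLAYERS are ours; it reads the ellipsis in the multiplier set as steps of 0.3 and treats the game as symmetric (every player runs the same parameterized Deep Maize), so a profile is an unordered multiset of multipliers; a TAC/SCM game has six agents.

```python
from itertools import combinations_with_replacement

MULTIPLIERS = [0.0, 0.3, 0.6, 0.9, 1.2, 1.5]  # restricted day-0 strategy space
N_PLAYERS = 6                                 # agents in a TAC/SCM game

# In a symmetric game, profiles that differ only by permuting players
# are equivalent, so one representative per multiset suffices.
profiles = list(combinations_with_replacement(MULTIPLIERS, N_PLAYERS))
print(len(profiles))  # 462 profiles of Gamma_theta, per storage-cost setting
```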
The payoffs for this game are not directly known, but we can obtain estimates by simulating games on the TAC server. We must do this for each setting of storage costs we wish to investigate, and collecting the samples is very time-consuming. Fortunately, the questions we would like to answer are high-level, and we can gather evidence about them using approximate methods. Instead of requiring exact equilibrium solutions, we aim to find regions of the profile space that are likely to be stable using the notion of ε-Nash equilibrium, where no agent can gain more than a small benefit ε by deviating to a different strategy. We also use two different techniques to approximate sets of stable profiles without sampling the full profile space.
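The regret calculation behind this notion is straightforward; here is a minimal sketch, assuming a fully sampled payoff table keyed by ordered strategy profiles (a hypothetical structure, not the paper's implementation):

```python
def profile_regret(profile, payoffs, strategies):
    """Smallest eps for which profile is an eps-Nash equilibrium: the
    largest gain any single agent gets from a unilateral deviation.
    payoffs maps a profile (tuple) to a tuple of per-agent payoffs."""
    eps = 0.0
    for i, s_i in enumerate(profile):
        base = payoffs[profile][i]
        for dev in strategies:
            if dev != s_i:
                deviant = profile[:i] + (dev,) + profile[i + 1:]
                eps = max(eps, payoffs[deviant][i] - base)
    return eps
```

A profile is then deemed (approximately) stable when its regret falls below the chosen ε threshold.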
The first method approximates the payoff functions of the game using supervised learning. We tried three different learning techniques from those introduced in [8]: quadratic regression (QR), locally weighted average (LWA), and locally weighted linear regression (LWLR). For quadratic regression, it is possible to compute equilibria of the learned game analytically. For the other methods, we applied replicator dynamics [9] to a discrete approximation of the learned game.
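As an illustration of that second step, here is a minimal replicator-dynamics sketch. For brevity it uses a two-player symmetric payoff matrix learned from the data (our simplification; the actual discretized game has six players, but the update rule is the same in spirit):

```python
import numpy as np

def replicator_dynamics(payoff_matrix, steps=10_000, dt=0.01):
    """payoff_matrix[i, j]: learned payoff to pure strategy i when the
    opponent plays j. Returns an approximate equilibrium mixture."""
    n = payoff_matrix.shape[0]
    x = np.full(n, 1.0 / n)                # start from the uniform mixture
    for _ in range(steps):
        fitness = payoff_matrix @ x        # expected payoff of each strategy
        average = x @ fitness              # population-average payoff
        x += dt * x * (fitness - average)  # above-average strategies grow
        x = np.clip(x, 0.0, None)
        x /= x.sum()                       # renormalize to a distribution
    return x
```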
The second method uses directed search to find stable profiles. Given a partial game matrix, we can compute a bound on ε for each profile for which we have sample data; this bound is the maximum benefit of deviating to any profile in the data set. The current set of profiles with the best ε-bounds is the set of candidate equilibria. We employed a “best-first” search that always samples unexplored deviations from a candidate equilibrium. The idea is to confirm or refute the stability of promising individual profiles without requiring the full game matrix to be sampled. A limitation of this approach is that it cannot rule out the existence of additional equilibria among the profiles that have not been sampled.
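A sketch of one step of this search, under the same hypothetical payoff-table representation (sampled maps each explored profile to its per-agent payoffs; simulate stands in for running a game on the server):

```python
def deviations(profile, strategies):
    """Profiles reachable when a single agent switches strategy."""
    for i, s in enumerate(profile):
        for d in strategies:
            if d != s:
                yield i, profile[:i] + (d,) + profile[i + 1:]

def eps_bound(profile, sampled, strategies):
    """Bound on eps from the partial matrix: the maximum observed
    benefit of deviating to any profile already in the data set."""
    gains = [sampled[q][i] - sampled[profile][i]
             for i, q in deviations(profile, strategies) if q in sampled]
    return max(gains, default=0.0)

def best_first_step(sampled, strategies, simulate):
    """Sample one unexplored deviation from the current best candidate."""
    candidate = min(sampled, key=lambda p: eps_bound(p, sampled, strategies))
    for _, q in deviations(candidate, strategies):
        if q not in sampled:
            sampled[q] = simulate(q)   # one more TAC/SCM simulation
            return candidate, q
    return candidate, None  # every deviation sampled: bound is exact
```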
We gathered data for storage costs in the set {0, 50, 100, 150, 200}. An initial data set was generated by sampling 10 randomly generated profiles for each storage cost.
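A rough illustration of how such an initial data set might be drawn (hypothetical code; profiles are stored as sorted tuples to canonicalize the symmetric game):

```python
import random

STORAGE_COSTS = [0, 50, 100, 150, 200]
MULTIPLIERS = [0.0, 0.3, 0.6, 0.9, 1.2, 1.5]
N_PLAYERS = 6

random.seed(0)  # reproducible illustration
initial_profiles = {
    theta: [tuple(sorted(random.choice(MULTIPLIERS) for _ in range(N_PLAYERS)))
            for _ in range(10)]          # 10 random profiles per setting
    for theta in STORAGE_COSTS
}
```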
4 Deep Maize requested a total of 11800 components for each combination of supplier and
product, spread out over different due dates.