Information Technology Reference
In-Depth Information
bidding strategy that can increase its chances of exploiting the complementarity
gain in utility for the second and following loads won from the same fruitful
region? Furthermore, in Section 4, we randomize the number of available loads
per fruitful region for each epoch to also quantify the impact of a more stochastic
environment.
The focus on complementary values is more fine grained than the seminal work
of [5] and the continuous extension in [4]. In these works, an agent is faced with
the challenge of participating in a number of sequential auctions, with a known,
fixed order of items. An agent only achieves a positive utility if it is able to acquire
one of several bundles completely. Our domain is smoother in the sense that any
bundle of acquired goods is potentially profitable. Furthermore, the agents in
[5] have a limited budget where we measure performance only in terms of mean
profits. Finally, the agents in [5] model the prices experienced in the previous
rounds of auctions. This is less beneficial in our approach due to larger number
of agents and the randomness of the order of the auctions. The agents in [5] also
assume no correlation in the prices between the auctions, an assumption which
we invalidate. However, modeling of the experienced prices can become beneficial
for learning a more sophisticated bidding strategy than the approach we propose
in Section 3. To apply the dynamic programming techniques of [5,12,8] and
approximations of [4] to our domain with an exponential number of possible
bundles and budget considerations is a challenging venue of research.
3
Strategic Bidding Behavior
In Section 2, we have introduced the model of the agents and auctions along
with the complementary properties of the loads that can be won in the auctions.
Here, we discuss how an agent might exploit this property to increase its expected
profits and present a possible policy for a strategic bidder.
We stress that we do not aim to devise the best possible strategy or even to
analyze all counter strategies. In general, formulating the best possible bidding
strategy for this setting is di cult to analyse (for example as Bayes-Nash), let
alone learn. It is however sucient for our purposes that being smarter than
other agents gives an added advantage. We can then show that this leads to
an arms race. We present a more game-theoretical analysis of the best-response
strategy in equilibrium in Section 5. Additionally, in Section 6, we show that
more sophisticated learning algorithms can however have a temporary advantage
before equilibrium outcomes are reached due to first-mover principle.
We restrict our discussion to a setting with at least a moderate number of
agents, i.e
10. We hereby largely preclude the modeling of specific
opponents and detailed analysis of opponent strategies. We focus on relatively
large-scale settings typical for logistics and strongly competitive environments.
This also makes many machine learning approaches intractable due to the large
state space and the exponentially growing number of possible future states that
can be reached. For example, in [7], 4 . 5 billion decision nodes are needed for a five
agent, four item sequential auction with 5 bid choices and random tie breaking
|
Agents
|≥
Search WWH ::




Custom Search