the greedy action (i.e., the action with the highest Q(s, a) value) with probability
1 − ε, and one of the other possible actions with probability ε.
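As an illustration, a minimal Python sketch of such an ε-greedy selection rule is given below; the dictionary-based Q-table and the value of epsilon are assumptions made for the example, not details taken from the paper.

    import random

    def epsilon_greedy(q_values, epsilon=0.1):
        # q_values maps each available action to its current Q(s, a) estimate.
        # With probability 1 - epsilon pick the greedy action, otherwise pick
        # uniformly among the other possible actions.
        greedy = max(q_values, key=q_values.get)
        if random.random() < 1 - epsilon or len(q_values) == 1:
            return greedy
        return random.choice([a for a in q_values if a != greedy])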
As the reward function r(s, a) we use:

r(s, a) = K − ω    (19)

where ω is the absolute difference between price and cost, defined in Eq. 16, and K is
a constant such that K ≥ ω.
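A sketch of this reward computation is shown below; the function and variable names are illustrative, and the default value of K is an assumption (any K ≥ ω works).

    def reward(price, cost, K=1.0):
        # Eq. 19: r(s, a) = K - omega, with omega = |price - cost| (Eq. 16).
        # K is a constant chosen so that K >= omega, keeping the reward non-negative.
        omega = abs(price - cost)
        return K - omega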
Figure 2(a) plots the average reward each agent receives after every learning
iteration². It can be seen that each agent converges to a reward quite close
to the highest possible reward after 2000 iterations.
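For reference, one such update could look like the standard one-step Q-learning rule sketched below; the learning rate, discount factor and table layout are assumptions, since the paper's exact update rule is not reproduced in this excerpt.

    def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
        # Q is a dict of dicts: Q[state][action] -> value estimate.
        # A single learning iteration updates one Q(s, a) entry towards the
        # observed reward plus the discounted best value of the next state.
        best_next = max(Q[s_next].values()) if Q.get(s_next) else 0.0
        Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])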
Figure 2(b) plots the evolution of the price of the two links. After roughly 2000
iterations, the prices of links 1 and 2 settle around 0.15€ and 0.1€, respectively. The
dynamics of the prices affect the road users' route choice. At the unpriced equilibrium,
roughly 70 road users select link 1, and the remaining 30 select link 2. The evolution of
prices forces a new equilibrium, where roughly 60 road users select link 1 and 40 road
users select link 2.
This new equilibrium of course affects the total travel time cost paid by the whole
population of road users (see Figure 2(c)). Again, it can be seen that the social
transportation cost at the competitive market equilibrium very closely approaches the
minimum social cost.
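To make the mechanism concrete, the sketch below computes such a two-link equilibrium under Wardrop's first principle by equalising the generalised cost (travel time plus price) on both links via bisection; the latency functions, the demand of 100 road users and the price values in the usage example are illustrative assumptions, so the resulting splits are not meant to reproduce the figures reported above.

    def wardrop_two_link(latency1, latency2, price1, price2, demand=100.0):
        # Find the flow x on link 1 (demand - x goes to link 2) at which the
        # generalised costs latency_i(flow) + price_i are equal, i.e. no road
        # user can reduce their cost by switching links (Wardrop's first principle).
        def gap(x):
            return (latency1(x) + price1) - (latency2(demand - x) + price2)
        lo, hi = 0.0, demand
        if gap(lo) >= 0:    # link 2 is cheaper even when it carries all demand
            return lo
        if gap(hi) <= 0:    # link 1 is cheaper even when it carries all demand
            return hi
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if gap(mid) < 0 else (lo, mid)
        return 0.5 * (lo + hi)

    # Illustrative latency functions and prices (not the ones used in the paper):
    l1 = lambda x: 1.0 + 0.02 * x
    l2 = lambda x: 2.0 + 0.01 * x
    unpriced_split = wardrop_two_link(l1, l2, 0.0, 0.0)
    priced_split = wardrop_two_link(l1, l2, 0.15, 0.10)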
5 Conclusions
In this paper, we proposed the application of an artificial market to the efficient
allocation of a road transport network. We modelled each network portion as a competitive
market agent that produces mobility. With appropriately defined production price selection
strategies, the outcome of the market turned out to be aligned with the minimisation
of the social transportation cost. This fact has been demonstrated both analytically and
experimentally: the distributed and independent price selection was equivalent, from a
social welfare point of view, to the optimal pricing performed by an omniscient,
centralised regulator.
For tractability reasons, we evaluated the artificial market with a simple two-link
problem, in order to compare the solution reached by the two learning-based market
agents with the optimal solution that we analytically derived. As future work, we plan
to model a more complex scenario, with several possible routes to choose from, and a
traffic demand that varies with the time of day.
Furthermore, this work could be evaluated using a traffic simulator to compute the
resulting traffic assignment, rather than computing the equilibrium assignment according
to Wardrop's first principle. In this way, the route choice of each individual road user
could be modelled more precisely.
² A learning iteration is a single update of the Q(s, a) function.