technique for grouping states (Soft State Aggregation, or SSA) which allows them
to represent an infinite space of continuous values, with D dimensions, in a
D-dimensional finite space (clusters) [17]. For the study case, the environment
state space is seen as a D-dimensional matrix, grouping the states ($\mathbb{R}^D$) in
an exponential way. The discrete position (or index of the cluster) $i$ for the
continuous variable $x$ is computed as:
$i = \min(\lfloor \log_2(x + 1) \rfloor,\ \omega - 1)$
The index $i$ is bounded by a number $\omega$, chosen near the maximum expected
for that variable, so that each of the D dimensions taken into account for the
environment representation has $\omega$ clusters (indices 0 to $\omega - 1$).
Therefore the total set of states which every agent must represent is
$\omega^D \times |A|$. The number of actions $|A|$ is known by every agent
(although it may change with time).
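The aggregation described above can be sketched in Python. This is a minimal illustration rather than the authors' implementation: the names `cluster_index` and `table_size` are hypothetical, and the zero-based cap at ω − 1 is an assumption chosen so that each variable has exactly ω clusters.

```python
def cluster_index(x, omega):
    """Zero-based cluster index of a non-negative value x (assumed convention)."""
    if x < 0:
        raise ValueError("x must be non-negative")
    # floor(log2(x + 1)) computed with integer arithmetic to avoid
    # floating-point rounding near powers of two.
    raw = int(x + 1).bit_length() - 1
    # Cap the index so that only omega clusters exist per variable.
    return min(raw, omega - 1)


def table_size(omega, dims, n_actions):
    """Number of entries omega^D * |A| that each agent must represent."""
    return omega ** dims * n_actions


print(cluster_index(6, omega=5))      # 2, since floor(log2(7)) = 2
print(table_size(omega=5, dims=5, n_actions=4))  # 12500
```

The exponential grouping keeps fine resolution for small values, where a difference of one unit matters most, and merges all large values into the last cluster.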
Two different types of agents are distinguished: those that can receive and
understand social opinions (executing Social-Welfare RL) and those that cannot
(using standard Q-learning).
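For reference, the standard tabular Q-learning used by the non-social agents can be sketched as follows. This is the textbook update rule, not code from the study: the learning rate `alpha`, the discount factor `gamma`, and the function names are assumptions introduced only for illustration.

```python
import random


def choose_action(q_row, explore_prob, rng):
    """Pick a random action with probability explore_prob, else the greedy one."""
    if rng.random() < explore_prob:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_update(q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward reward + gamma * max Q(s_next, .)."""
    target = reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


# Tiny illustrative table: 3 states, 4 actions, all values start at zero.
q = [[0.0] * 4 for _ in range(3)]
rng = random.Random(0)
action = choose_action(q[0], explore_prob=1.0, rng=rng)  # always explores
q_update(q, s=0, a=action, reward=1.0, s_next=2)
```

With an exploration probability of 1 every action is chosen at random, so the table is filled by pure exploration.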
Learning of the Q-table was done individually for both the social-aware agents
and the non-social ones. Parameter ω was fixed at 5, which means that the last
state for each dimension represents values of $x \in [15, +\infty)$. The
dimensions (or variables) taken into account to represent the environment are
the number of products of each kind in the market plus the agent's life and
balance.
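With ω = 5 and zero-based indices (an assumed convention), the five clusters of each variable would cover [0, 1), [1, 3), [3, 7), [7, 15) and [15, +∞). The sketch below maps one observation of the five variables to its position in the D-dimensional matrix. The function names, the variable order and the sample values are hypothetical.

```python
OMEGA = 5  # clusters per variable, as fixed in the text


def cluster_index(x, omega=OMEGA):
    """Zero-based cluster index of a non-negative value (assumed convention)."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return min(int(x + 1).bit_length() - 1, omega - 1)


def encode_state(values, omega=OMEGA):
    """Return the per-dimension indices and a single flat index for the state."""
    indices = tuple(cluster_index(v, omega) for v in values)
    flat = 0
    for i in indices:
        # Treat the indices as digits of a base-omega number.
        flat = flat * omega + i
    return indices, flat


# Hypothetical observation: wheat, flour, bread in the market, then life, balance.
indices, flat = encode_state([0, 2, 6, 14, 40])
print(indices)  # (0, 1, 2, 3, 4)
print(flat)     # 194
```

The flat index ranges from 0 to 5^5 − 1 = 3124, so it can address a row of the Q-table directly.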
Three types of products were loaded: Wheat, with no dependencies and needing
1 cycle; Flour, requiring 2 units of Wheat and 2 cycles; and Bread, edible,
providing 10 units of life and requiring 2 units of Flour and 2 cycles.
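The production rules can be written down as a small data structure. This is only a sketch: the `Product` class, its field names and the `can_produce` helper are hypothetical, and it assumes that Flour is made from Wheat, as the scenario rules indicate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Product:
    name: str
    cycles: int                       # production time in cycles
    input_name: Optional[str] = None  # product consumed, if any
    input_units: int = 0              # units of the input consumed
    life: int = 0                     # life gained when eaten (0 = not edible)


WHEAT = Product("Wheat", cycles=1)
FLOUR = Product("Flour", cycles=2, input_name="Wheat", input_units=2)
BREAD = Product("Bread", cycles=2, input_name="Flour", input_units=2, life=10)


def can_produce(product, stock):
    """True if the stock holds enough of the required input product."""
    if product.input_name is None:
        return True
    return stock.get(product.input_name, 0) >= product.input_units


print(can_produce(FLOUR, {"Wheat": 1}))  # False, two units of Wheat are needed
```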
The actions available to the agents are the creation of any kind of product,
plus another one called eating. The total space of representation needed by each
agent is only $5^{(3+2)} \times 4 = 12500$ states. The learning stage took
1,000,000 cycles, with a probability of exploration k = 1, hoping to explore as
many states as possible, as many times as possible. Every agent starts with
exactly the same Q-matrix at the beginning of the simulation. The experiment
includes three different scenarios, and thus two significant changes. The
scenarios are:
1. No changes in the environment.
2. Relaxing the production rules: Flour needs 1 unit of time and 1 unit of
Wheat ; Bread needs 1 unit of time and 1 unit of Flour .
3. Hardening the production rules: Wheat needs 2 units of time; Flour needs
2 units of time and 3 units of Wheat ; Bread stays as originally, needing 2
units of time and 2 units of Flour .
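To see how much the rule changes matter, the three scenarios can be encoded as recipe tables and compared by the number of cycles needed to obtain one Bread from nothing. The sketch rests on assumptions that are not in the text: a single agent working sequentially, one unit produced per action, no trading, and Flour requiring 2 units of Wheat in the unchanged scenario. All names are hypothetical.

```python
# Each recipe: product -> (cycles, input product or None, input units).
SCENARIOS = {
    1: {"Wheat": (1, None, 0), "Flour": (2, "Wheat", 2), "Bread": (2, "Flour", 2)},
    2: {"Wheat": (1, None, 0), "Flour": (1, "Wheat", 1), "Bread": (1, "Flour", 1)},
    3: {"Wheat": (2, None, 0), "Flour": (2, "Wheat", 3), "Bread": (2, "Flour", 2)},
}


def cycles_from_scratch(product, recipes):
    """Cycles to make one unit, producing every required input first."""
    cycles, input_name, units = recipes[product]
    if input_name is None:
        return cycles
    return cycles + units * cycles_from_scratch(input_name, recipes)


for number, recipes in SCENARIOS.items():
    print(number, cycles_from_scratch("Bread", recipes))
# 1 10
# 2 3
# 3 18
```

Under these assumptions the relaxed rules cut the cost of one Bread from 10 cycles to 3, while the hardened rules raise it to 18.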
As a remark, it is important to control the value of κ so that it never reaches
0. For that matter, the computation of
is corrected by a
representable by a machine so that ≤ κ ≤
It is expected to observe a better adaptation of the social-aware agents
compared to the non-social ones, which use traditional reinforcement learning (SSA