Information Technology Reference
In-Depth Information
remainder of this article. First, we will show that the engineered reward-based
approach has the same computational complexity and expressive power as the
state of the art approach in fully observable games, i.e. in MDPs and their multi-
agent counterparts named stochastic games (SG). Having this in mind, we then
will consider partially observable domains. Engineered reward signals in such
domains augment the expressive power compared to ordinary partially observ-
able models. However, time complexity remains and constructing an engineered
reward signal is not forbidden in state of the art models. From this it follows that
one could easily—accidentally or intentionally—supply agents with knowledge
they are not supposed to obtain, e.g. on states or joint actions. Based on this
observation, we then propose to add an additional requirement on reward sig-
nals in existing models that formally prevents such exploitation. The engineered
reward-based model in this work, thus, should mainly be understood as a means
to investigate the power of global rewards. However, as described in the context
of an example in Sect. 6, it is also usefulinproblemswithaspecial structure.
A Polynomial Time Coding Approach. A general encoding function enc
is presented in Alg. 1. Its basic idea is to create a string code ( s )# code ( r )that
combines codes of state s and reward r separated by a special character (#).
This string is then converted to a bit string, whereas each character is encoded
with the same number of bits b (e.g. b = 8 bits as in ASCII encoding). A single '1'
is added at the head of the resulting bit string for decoding reasons as explained
below. The resulting bit string is then interpreted as a number. Note that the
constructed bit strings are bounded, since rewards and state representations
are bounded, too. This approach requires time linear in the length of the bit
string. Note that states may also contain information about executed (joint)
actions. A corresponding decoding function dec is straightforward. The idea is
to interpret a received reward as bit string. By construction, the string starts
with a '1', which is removed. The remaining bit string, that might start with
zeros now, is split into substrings, each of length b ,where b equals the value
of b in the encoding procedure (e.g. b = 8 for ASCII encoding, cf. Alg. 1, line
7). Then, these substrings can be interpreted as characters, from which we can
Algorithm 1. (Encoding algorithm)
Input: state s ∈S ,reward r ∈ R
Output: engineered reward r ∈ R
1: procedure Encoding (
s, r
)
2:
cs
convert state
s
to character array (
cs 1 ,cs 2 ,...,cs n )
3:
cr convert reward r to character array ( cr 1 ,cr 2 ,...,cr m )
4:
codeword Concatenate ( cs , # , cr )
create code word code ( s )# code ( r )
5:
bitstream 1
initialize new bit stream with 1
6:
for i =1 ,...,| codeword | do
7:
bitstream Concatenate ( bitstream , bitcode ( codeword [ i ] ,b ))
use b bits
to encode codeword [ i ]
8:
return ToNumber ( bitstream )
interpret bitstream as number and return it
Search WWH ::




Custom Search