Learning Nash Equilibria in Non-Cooperative Games (Artificial Intelligence)

INTRODUCTION

Game Theory (Von Neumann & Morgenstern, 1944) is a branch of applied mathematics and economics that studies situations (games) where self-interested interacting players act to maximize their returns; therefore, the return of each player depends on his behaviour and on the behaviours of the other players. Game Theory, which plays an important role in the social and political sciences, has recently drawn attention in new academic fields ranging from algorithmic mechanism design to cybernetics. However, a fundamental problem to solve for effectively applying Game Theory in real-world applications is the definition of well-founded solution concepts of a game and the design of efficient algorithms for their computation.

A widely accepted solution concept for a game in which any cooperation among the players must be self-enforcing (non-cooperative game) is the Nash Equilibrium. In particular, a Nash Equilibrium is a set of strategies, one for each player of the game, such that no player can benefit by changing his strategy unilaterally, i.e. while the other players keep their strategies unchanged (Nash, 1951). The problem of computing Nash Equilibria in non-cooperative games is considered one of the most important open problems in Complexity Theory (Papadimitriou, 2001). Daskalakis, Goldberg, and Papadimitriou (2005) showed that the problem of computing a Nash Equilibrium in a game with four or more players is complete for the complexity class PPAD (Polynomial Parity Argument, Directed version) (Papadimitriou, 1994); moreover, Chen and Deng extended this result to 2-player games (Chen & Deng, 2005). However, even in the two-player case, the best algorithm known has an exponential worst-case running time (Savani & von Stengel, 2004); furthermore, if the computation of equilibria with simple additional properties is required, the problem immediately becomes NP-hard (Bonifaci, Di Iorio, & Laura, 2005; Conitzer & Sandholm, 2003; Gilboa & Zemel, 1989; Gottlob, Greco, & Scarcello, 2003).


Motivated by these results, recent studies have dealt with the problem of efficiently computing Nash Equilibria by exploiting approaches based on the concepts of learning and evolution (Fudenberg & Levine, 1998; Maynard Smith, 1982). In these approaches the Nash Equilibria of a game are not statically computed but are the result of the evolution of a system composed of agents playing the game. In particular, after several rounds each agent will learn to play a strategy that, under the hypothesis of agents' rationality, will be one of the Nash Equilibria of the game (Benaim & Hirsch, 1999; Carmel & Markovitch, 1996).

This article presents SALENE, a Multi-Agent System (MAS) for learning Nash Equilibria in non-cooperative games, which is based on the above-mentioned concepts.


BACKGROUND

An n-person strategic game G can be defined as a tuple G = (N; (A^i)_{i∈N}; (r^i)_{i∈N}), where N = {1, 2, …, n} is the set of players, A^i is a finite set of actions for player i ∈ N, and r^i : A^1 × … × A^n → ℝ is the payoff function of player i. The set A^i is also called the set of pure strategies of player i. The Cartesian product ×_{i∈N} A^i = A^1 × … × A^n can be denoted by A, and r : A → ℝ^n can denote the vector-valued function whose i-th component is r^i, i.e., r(a) = (r^1(a), …, r^n(a)), so it is possible to write (N, A, r) for short for (N; (A^i)_{i∈N}; (r^i)_{i∈N}).

For any finite set A^i, the set of all probability distributions on A^i can be denoted by Δ(A^i). An element σ^i ∈ Δ(A^i) is a mixed strategy for player i.

A (Nash) equilibrium of a strategic game G = (N, A, r) is an N-tuple of (mixed) strategies σ = (σ^i)_{i∈N}, σ^i ∈ Δ(A^i), such that for every i ∈ N and any other strategy τ^i ∈ Δ(A^i) of player i, r^i(τ^i, σ^{-i}) ≤ r^i(σ^i, σ^{-i}), where r^i also denotes the expected payoff to player i in the mixed extension of the game and σ^{-i} represents the mixed strategies in σ of all the other players. Basically, supposing that all the other players do not change their strategies, no player i can play a different strategy τ^i able to gain a better payoff than that gained by playing σ^i. σ^i is called a Nash equilibrium strategy for player i.
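For pure strategies, the equilibrium condition above can be checked mechanically by testing every unilateral deviation. The following illustrative sketch (not part of SALENE) does this for a 2-player game given as an explicit payoff table; the Prisoner's Dilemma, with its hypothetical payoff values, is used as the example:

```python
# Checking the Nash condition for pure strategies in a 2-player game.
# Payoff tables are indexed by action profiles (a1, a2); actions:
# 0 = Cooperate, 1 = Defect (illustrative Prisoner's Dilemma values).
PAYOFFS = {
    (0, 0): (3, 3), (0, 1): (0, 5),
    (1, 0): (5, 0), (1, 1): (1, 1),
}

def is_pure_nash(profile, payoffs, n_actions=2):
    """True if no player gains by a unilateral deviation from `profile`."""
    for i in range(2):                    # for every player i
        for dev in range(n_actions):      # for every alternative action
            deviated = list(profile)
            deviated[i] = dev
            if payoffs[tuple(deviated)][i] > payoffs[profile][i]:
                return False              # profitable deviation found
    return True

print(is_pure_nash((1, 1), PAYOFFS))  # (Defect, Defect): no profitable deviation
print(is_pure_nash((0, 0), PAYOFFS))  # (Cooperate, Cooperate): deviating to Defect pays
```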

In 1951 J. F. Nash proved that a strategic (non-cooperative) game G = (N, A, r) has at least one (Nash) equilibrium σ (Nash, 1951); in his honour, the computational problem of finding such equilibria is known as NASH (Papadimitriou, 1994).

SOFTWARE AGENTS FOR LEARNING NASH EQUILIBRIA

SALENE was conceived as a system for learning at least one Nash Equilibrium of a non-cooperative game given in the form G = (N; (A^i)_{i∈N}; (r^i)_{i∈N}). In particular, the system asks the user for:

• the number n of players, which defines the set of players N = {1, 2, …, n};

• for each player i ∈ N, the related finite set of pure strategies A^i and his payoff function r^i : A^1 × … × A^n → ℝ;

• the number k of times the players will play the game.

Figure 1. The class diagram of SALENE


Then, the system creates n agents, one associated with each player, and a referee. The agents will play the game G k times; after each match, each agent will decide the strategy to play in the next match so as to maximise his expected utility on the basis of his beliefs about the strategies that the other agents are adopting. By analyzing the behaviour of each agent in all the k matches of the game, SALENE presents to the user an estimate of a Nash Equilibrium of the game. The agent paradigm has represented a "natural" way of modelling and implementing the proposed solution, as the problem is characterized by several interacting autonomous entities (players) which try to achieve their goals (consisting in maximising their returns).

The class diagram of SALENE is shown in Figure 1.

The Manager Agent interacts with the user and is responsible for the global behaviour of the system. In particular, after having obtained from the user the input parameters G and k, the Manager Agent creates both the n Player Agents and a Referee Agent that coordinates and monitors the behaviours of the players. The Manager Agent sends to all the agents the definition G of the game, then asks the Referee Agent to orchestrate k matches of the game G. In each match, the Referee Agent asks each Player Agent which pure strategy he has decided to play; then, after having acquired the strategies from all players, the Referee Agent communicates to each Player Agent both the strategies played and the payoffs gained by all players. After the k matches of the game G have been played, the Referee Agent communicates all the data about the played matches to the Manager Agent, which analyses it and properly presents the obtained results to the user.

A Player Agent is a rational player that, given the game definition G, acts to maximise his expected utility in each single match of G without considering the overall utility that he could obtain in a set of matches. In particular, the behaviour of the Player Agent i can be described by the following main steps:

1. In the first match the Player Agent i chooses to play a pure strategy randomly generated by considering all the pure strategies playable with the same probability: if |A^i| = m, the probability of choosing a pure strategy s ∈ A^i is 1/m;

2. The Player Agent i waits for the Referee Agent to ask him which strategy he wants to play; then he communicates to the Referee Agent the chosen pure strategy, as computed in step 1 if he is playing his first match or in step 4 otherwise;

3. The Player Agent waits for the Referee Agent to communicate to him both the pure strategies played and the payoffs gained by all players;

4. The Player Agent decides the mixed strategy to play in the next match. In particular, the Player Agent updates his beliefs about the mixed strategies currently adopted by the other players and consequently recalculates the strategy able to maximise his expected utility. Basically, the Player Agent i tries to find the strategy σ^i ∈ Δ(A^i) such that for any other strategy τ^i ∈ Δ(A^i), r^i(τ^i, σ^{-i}) ≤ r^i(σ^i, σ^{-i}), where r^i denotes his expected payoff and σ^{-i} = (σ^j)_{j∈N, j≠i} represents his beliefs about the mixed strategies currently adopted by all the other players. In order to evaluate σ^{-i}, for each other player j ≠ i the Player Agent i considers the pure strategies played by player j in all the previous matches and computes the frequency of each pure strategy; this frequency distribution will be the estimate for σ^j. If at least one element of the newly computed set σ^{-i} differs from the set σ^{-i} computed in the previous match, the Player Agent i solves the inequality r^i(τ^i, σ^{-i}) ≤ r^i(σ^i, σ^{-i}), which is equivalent to solving the optimization problem P = {max r^i(σ^i, σ^{-i}) : σ^i ∈ Δ(A^i)}. It is worth noting that P is a linear optimization problem: given the set σ^{-i}, r^i(σ^i, σ^{-i}) is a linear objective function in σ^i (see the game definition reported in the Background section), and, with |A^i| = m, σ^i ∈ Δ(A^i) is a vector x ∈ ℝ^m such that x_1 + … + x_m = 1 and x_s ≥ 0 for every s ∈ {1, …, m}, so the constraint σ^i ∈ Δ(A^i) is a set of m+1 linear constraints. P is solved by the Player Agent by using an efficient method for solving linear programming problems, in particular the predictor-corrector method of Mehrotra (1992), whose complexity is polynomial for both the average and the worst case. The obtained solution for σ^i is a pure strategy because it is one of the vertices of the polytope which defines the feasible region of P. The obtained strategy σ^i will be played by the Player Agent i in the next match; r^i(σ^i, σ^{-i}) represents the expected payoff to player i in the next match;

5. back to step 2.
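The steps above amount to fictitious play with best responses against empirical frequencies. Since, as noted in step 4, a best response to fixed beliefs is always achieved at a pure strategy (a vertex of the simplex), the LP can be replaced in a sketch by a simple argmax over pure strategies. The following minimal two-player sketch is illustrative only (it is not the JADE implementation, and the coordination game and function names are assumptions):

```python
import random
from collections import Counter

# Sketch of steps 1-5 for two players on a 2x2 coordination game:
# each player gets 1 when both choose the same action, 0 otherwise.
ACTIONS = [0, 1]

def payoff(i, a1, a2):
    """Player i's payoff; both players prefer to coordinate."""
    return 1 if a1 == a2 else 0

def best_response(i, history_opponent):
    """Best pure strategy against the opponent's empirical frequencies."""
    counts = Counter(history_opponent)
    total = sum(counts.values())
    freq = {a: counts[a] / total for a in ACTIONS}
    def expected(a):  # expected payoff of pure strategy a against beliefs
        return sum(freq[b] * (payoff(i, a, b) if i == 0 else payoff(i, b, a))
                   for b in ACTIONS)
    return max(ACTIONS, key=expected)

random.seed(0)
h1, h2 = [random.choice(ACTIONS)], [random.choice(ACTIONS)]  # step 1
for _ in range(100):                                          # steps 2-5
    a1, a2 = best_response(0, h2), best_response(1, h1)
    h1.append(a1)
    h2.append(a2)
print(h1[-1], h2[-1])  # the agents end up coordinating on one action
```

After a few matches the empirical frequencies agree on a majority action and both agents keep playing it, i.e. the play settles on one of the pure equilibria of the coordination game.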

It is worth noting that, for choosing the mixed strategy to play in each match of G, a Player Agent does not need to know the payoff functions of the other players: for solving the optimization problem P he only needs to consider the strategies which have been played by the other players in all the previous matches.

The Manager Agent receives from the Referee Agent all the data about the k matches of the game G and computes an estimate of a Nash Equilibrium of G, i.e. an N-tuple σ = (σ^i)_{i∈N}, σ^i ∈ Δ(A^i). In particular, in order to estimate σ^i (the Nash equilibrium strategy of player i), the Manager Agent computes, on the basis of the pure strategies played by player i in each of the k matches, the frequency of each pure strategy: this frequency distribution will be the estimate for σ^i. The so-computed set σ = (σ^i)_{i∈N}, σ^i ∈ Δ(A^i), is then properly presented to the user together with the data exploited for its estimation.
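The frequency estimate computed by the Manager Agent is a plain empirical distribution over a player's match history; a minimal sketch (the function name and the example history are illustrative):

```python
from collections import Counter

def estimate_strategy(history):
    """Empirical frequency of each pure strategy in a player's history."""
    counts = Counter(history)
    k = len(history)
    return {action: n / k for action, n in counts.items()}

# e.g. a player who played 'a' 3 times and 'b' once over k = 4 matches
print(estimate_strategy(['a', 'b', 'a', 'a']))  # {'a': 0.75, 'b': 0.25}
```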

SALENE has been implemented using JADE (Bellifemine, Poggi, & Rimassa, 2001), a software framework allowing for the development of multi-agent systems and applications conforming to FIPA standards (FIPA, 2006), and tested on different games that differ from each other both in the number and in the kind of Nash Equilibria. The experiments have demonstrated that:

• if the game has p ≥ 1 Pure Nash Equilibria and s ≥ 0 Mixed Nash Equilibria, the agents converge to playing one of the p Pure Nash Equilibria; in these cases, as the behaviour of each Player Agent converges with probability one to a Nash Equilibrium of the game, the learning process converges in behaviours to equilibrium (Foster & Young, 2003);

• if the game has only Mixed Nash Equilibria, while the behaviour of the Player Agents does not converge to an equilibrium, the time-average behaviour, i.e. the empirical frequency with which each player chooses his strategy, may converge to one of the Mixed Nash Equilibria of the game; that is, the learning process may converge in time average to equilibrium (Foster & Young, 2003).
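The second behaviour can be reproduced with Matching Pennies, whose only Nash Equilibrium is mixed (each player plays each action with probability 1/2): under fictitious-play dynamics the actions keep oscillating, yet the empirical frequencies drift toward 1/2. A compact illustrative sketch (not the SALENE implementation; deterministic first moves are an assumption made for reproducibility):

```python
from collections import Counter

# Matching Pennies: player 0 wins on a match, player 1 on a mismatch;
# the only Nash Equilibrium is the mixed strategy (1/2, 1/2).
def payoff0(a, b):
    return 1 if a == b else -1  # player 1's payoff is the negation

def best_response(beliefs, matcher):
    """Best pure action (0/1) against opponent frequencies `beliefs`."""
    sign = 1 if matcher else -1
    def expected(a):
        return sum(p * sign * payoff0(a, b) for b, p in beliefs.items())
    return max((0, 1), key=expected)

h0, h1 = [0], [0]  # deterministic first moves for reproducibility
for _ in range(5000):
    f0 = {b: n / len(h0) for b, n in Counter(h0).items()}
    f1 = {b: n / len(h1) for b, n in Counter(h1).items()}
    h0.append(best_response(f1, matcher=True))
    h1.append(best_response(f0, matcher=False))

freq0 = h0.count(0) / len(h0)
freq1 = h1.count(0) / len(h1)
print(round(freq0, 2), round(freq1, 2))  # both close to 0.5
```

The last actions played keep alternating (no convergence in behaviours), while the empirical frequencies approach the mixed equilibrium (convergence in time average).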

The next section discusses the main aspects related to the convergence properties of the algorithm exploited by the SALENE agents for learning Nash Equilibria, within a more general discussion of current and future research efforts.

FUTURE TRENDS

Innovative approaches such as SALENE, based on the concepts of learning and evolution, have shown great potential for modelling and efficiently solving non-cooperative games. However, as the solutions of the games (e.g. Nash Equilibria) are not statically computed but are the result of the evolution of a system composed of interacting agents, there are several open problems, mainly related to the accuracy of the provided solution, that need to be tackled to allow these approaches to be widely exploited in concrete business applications.

The approach exploited in SALENE, which derives from the Fictitious Play approach (Robinson, 1951), efficiently solves the problem of learning a Nash Equilibrium in non-cooperative games which have at least one Pure Nash Equilibrium: in such a case the behaviour of the players exactly converges to one of the Pure Nash Equilibria of the game (convergence in behaviours to equilibrium). On the contrary, if the game has only Mixed Nash Equilibria, the convergence of the learning algorithm is not ensured. Computing ex ante when this case happens is quite costly, as it requires solving the following problem: "Determining whether a strategic game has only Mixed Nash Equilibria", which is equivalent to: "Determining whether a strategic game does not have any Pure Nash Equilibria". This problem is co-NP-complete as its complement, "Determining whether a strategic game has a Pure Nash Equilibrium", is NP-complete (Gottlob, Greco, & Scarcello, 2003). As witnessed by the conducted experiments, when a game has only Mixed Nash Equilibria there are still some cases in which, while the behaviour of the players does not converge to an equilibrium, the time-average behaviour, i.e. the empirical frequency with which each player chooses his strategy, converges to one of the Mixed Nash Equilibria of the game (convergence in time average to equilibrium).
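Note that the hardness result cited above concerns succinctly represented games; for a small game given explicitly in normal form, the existence of a Pure Nash Equilibrium can be decided by simple enumeration of all action profiles. An illustrative sketch (the function name and example games are assumptions):

```python
from itertools import product

def has_pure_nash(n_actions, payoffs):
    """Enumerate all action profiles of a 2-player normal-form game and
    test the Nash condition; payoffs[(a1, a2)] = (r1, r2)."""
    for profile in product(range(n_actions), repeat=2):
        stable = True
        for i in range(2):                # for every player i
            for dev in range(n_actions):  # for every unilateral deviation
                alt = list(profile)
                alt[i] = dev
                if payoffs[tuple(alt)][i] > payoffs[profile][i]:
                    stable = False
        if stable:
            return True
    return False

# Matching Pennies has no pure equilibrium; a coordination game has two.
pennies = {(a, b): (1, -1) if a == b else (-1, 1)
           for a in (0, 1) for b in (0, 1)}
coordination = {(a, b): (1, 1) if a == b else (0, 0)
                for a in (0, 1) for b in (0, 1)}
print(has_pure_nash(2, pennies))       # False
print(has_pure_nash(2, coordination))  # True
```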

Nevertheless, there are some cases in which there is neither convergence in behaviours nor convergence in time average to equilibrium; an example of such a case is the fashion game of Shapley (1964). An important open problem is then represented by the characterization of the classes of games for which the learning algorithm adopted in SALENE converges; more specifically, the classes of games for which the algorithm: (a) converges in behaviours to equilibrium (which implies the convergence in time average to equilibrium); (b) only converges in time average to equilibrium; (c) converges neither in behaviours nor in time average. Currently, it has been demonstrated that the algorithm converges in behaviours or in time average to equilibrium for the following classes of games:

• zero-sum games (Robinson, 1951);

• games which are solvable by iterated elimination of strictly dominated strategies (Nachbar, 1990);

• potential games (Monderer & Shapley, 1996);

• 2×N games, i.e. games with 2 players, 2 strategies for one player and N strategies for the other player (Berger, 2005).

Future efforts will be geared towards: (i) completing the characterization of the classes of games for which the learning algorithm adopted in SALENE converges, and evaluating the complexity of solving the membership problem for such classes; (ii) evaluating different learning algorithms and their convergence properties; (iii) letting the user ask for the computation of Nash Equilibria with simple additional properties.

More in general, a wide adoption of the emerging agent-based approaches for solving games which model concrete business applications will depend on the accuracy and the convergence properties of the provided solutions; both aspects still need to be fully investigated.

CONCLUSION

The complexity of NASH, the problem consisting in computing Nash Equilibria in non-cooperative games, is still debated, but even in the two-player case the best known algorithm has an exponential worst-case running time. SALENE, the proposed MAS for learning Nash Equilibria in non-cooperative games, can be conceived as a heuristic and efficient method for computing at least one Nash Equilibrium of a non-cooperative game represented in its normal form; in fact, the learning algorithm adopted by the Player Agents has a polynomial running time for both the average and the worst case. SALENE can then be fruitfully exploited for efficiently solving non-cooperative games which model interesting concrete problems, ranging from classical economic and finance problems to the emerging ones related to the economic aspects of the Internet, such as TCP/IP congestion, selfish routing, and algorithmic mechanism design.

KEY TERMS

Computational Complexity Theory: A branch of the theory of computation in computer science which studies how the running time and the memory requirements of an algorithm increase as the size of the input to the algorithm increases.

Game Theory: A branch of applied mathematics and economics that studies situations (games) where self-interested interacting players act for maximizing their returns.

Heuristic: In computer science, a technique designed to solve a problem which allows for gaining computational performance or conceptual simplicity potentially at the cost of accuracy and/or precision of the provided solutions to the problem itself.

Nash Equilibrium: A solution concept of a game where no player can benefit by changing his strategy unilaterally, i.e. while the other players keep theirs unchanged; this set of strategies and the corresponding payoffs constitute a Nash Equilibrium of the game.

NP-Hard Problems: Problems that are intrinsically at least as hard as the hardest problems that can be solved by a nondeterministic Turing machine in polynomial time.

Non-Cooperative Games: A game in which any cooperation among the players must be self-enforcing.

Payoffs: Numeric representations of the utility obtainable by a player in the different outcomes of a game.
