Kagan Tumer, kagan.tumer@oregonstate.edu
ME 538 Fall 07: Agents and Multiagent Systems
Week 5, Lecture 1: Game theory

Utilities and Preferences
• Assume there are two agents, i and j.
• Assume agents are self-interested: they have their own preferences over the possible outcomes.
• Assume Z is the set of those outcomes, or system states: $Z = \{z_1, z_2, \ldots\}$
• Preferences of agents are captured by utility functions: $u_i : Z \to \mathbb{R}$ and $u_j : Z \to \mathbb{R}$
• Preference ordering: state z is weakly preferred by i over z' if $u_i(z) \ge u_i(z')$
• Preference ordering: state z is strongly preferred by i over z' if $u_i(z) > u_i(z')$

Utilities and Preferences
What is utility?
Is utility equal to money?
[Figure: plot with axes labeled utility and money]

Player encounters
• The “system” where the agents interact:
– Agents simultaneously choose an action to perform, and as a result of those actions an outcome in Z results.
– The actual outcome depends on the combination of actions.
– Assume each agent has just two possible actions: C (“cooperate”) and D (“defect”).

Dominant Strategies
• Given any particular strategy s (either C or D) for agent i, there will be a number of possible outcomes.
• We say s1 dominates s2 if every outcome possible by i playing s1 is preferred over every outcome possible by i playing s2.
• A rational agent will never play a dominated strategy.
• So in deciding what to do, we can delete dominated strategies.
• Unfortunately, there isn’t always a unique undominated strategy.
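The dominance test above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the three utility tables are hypothetical and do not come from the lecture. The first function follows the slide's wording (every outcome of s1 beats every outcome of s2). The second is the per-opponent-action comparison that many textbooks use instead, included only to show that the two tests can disagree.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# payoff[(own_action, other_action)] is agent i's utility for that joint action.
Payoff = Dict[Tuple[str, str], float]
DominanceTest = Callable[[Payoff, str, str, Sequence[str]], bool]


def dominates_all_outcomes(payoff: Payoff, s1: str, s2: str,
                           others: Sequence[str]) -> bool:
    """Slide wording: every outcome of s1 is preferred to every outcome of s2."""
    worst_s1 = min(payoff[(s1, a)] for a in others)
    best_s2 = max(payoff[(s2, a)] for a in others)
    return worst_s1 > best_s2


def dominates_per_opponent_action(payoff: Payoff, s1: str, s2: str,
                                  others: Sequence[str]) -> bool:
    """Assumed alternative: s1 beats s2 against each fixed action of the other agent."""
    return all(payoff[(s1, a)] > payoff[(s2, a)] for a in others)


def undominated(payoff: Payoff, own: Sequence[str], others: Sequence[str],
                dominates: DominanceTest) -> List[str]:
    """Delete every strategy that some other strategy dominates."""
    return [s for s in own
            if not any(dominates(payoff, t, s, others) for t in own if t != s)]


ACTIONS = ["C", "D"]

# Hypothetical utilities for agent i, chosen only to exercise the functions.
clear_cut = {("C", "C"): 3, ("C", "D"): 1, ("D", "C"): 5, ("D", "D"): 4}
coordination = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 0, ("D", "D"): 1}
mixed_case = {("C", "C"): 3, ("C", "D"): 1, ("D", "C"): 5, ("D", "D"): 2}

print(undominated(clear_cut, ACTIONS, ACTIONS, dominates_all_outcomes))
print(undominated(coordination, ACTIONS, ACTIONS, dominates_all_outcomes))
print(undominated(mixed_case, ACTIONS, ACTIONS, dominates_all_outcomes))
print(undominated(mixed_case, ACTIONS, ACTIONS, dominates_per_opponent_action))
```

In the `coordination` table neither action dominates the other, which is the situation the last bullet warns about: deleting dominated strategies leaves more than one candidate.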
Nash Equilibrium
• Two strategies s1 and s2 are in Nash Equilibrium if:
– Given that agent i plays s1, agent j cannot do better than s2
– Given that agent j plays s2, agent i cannot do better than s1
• Neither agent has an incentive to deviate from a Nash Equilibrium.
• However:
– Not every game has a pure strategy Nash Equilibrium
– Some games have more than one Nash Equilibrium

Pareto Optimality
• An outcome is said to be Pareto optimal (or Pareto efficient) if there is no other outcome that makes one agent better off without making another agent worse off.
• If an outcome is Pareto optimal, then at least one agent will be reluctant to move away from it (because this agent will be worse off).
• If an outcome is not Pareto optimal, then there is another outcome that makes everyone as happy, if not happier. “Reasonable” agents would agree to move to that outcome in this case. (Even if I don’t directly benefit from the move, you can benefit without me suffering.)

Social Welfare
• The social welfare of an outcome z is the sum of the utilities that each agent gets from outcome z: $G = \sum_i u_i(z)$
• It captures the “total amount of money in the system”.
• As a solution concept, it may be appropriate when the whole system (all agents) has a single owner (then the overall benefit of the system is important, not that of individuals).
– Robot coordination?
– Traffic?
• Not so much

Competitive and zero-sum interactions
• Where preferences of agents are in opposition we have strictly competitive scenarios.
• Zero-sum encounters are those where utilities sum to zero: $\sum_i u_i(z) = 0 \quad \forall z \in Z$ (the key feature is that the sum is constant; setting it to zero is a normalization).
• Zero sum implies strictly competitive.
• Zero-sum encounters in real life are very rare . . . but people tend to act in many scenarios as if they were zero sum.

The Prisoner’s Dilemma
• Two criminals are held in separate cells (no communication).
• They are told that:
(1) If one confesses and the other does not, the confessor is freed and the other gets 3 years
(2) If both confess, each gets 2 years
(3) If neither confesses, both get 1 year

Can we Recover Cooperation?
• Conclusions that some have drawn from this analysis:
– the game theory notion of rational action is wrong!
– somehow the dilemma is being formulated wrongly
• Arguments to recover cooperation:
– We are not all Machiavelli!
– The other prisoner is my twin!
– The shadow of the future. . .

The Iterated Prisoner’s Dilemma
• One solution: play the game more than once
– If you know you will be meeting your opponent again, then the incentive to defect disappears
– Cooperation is the rational choice in the infinitely repeated prisoner’s dilemma
• Problems arise in n-step games where n is known: by backward induction, defect in the final round, but then why not in the round before it, etc.
• Fortunately, if n is unknown, this effect goes away.

Axelrod’s Tournament
• Suppose you play iterated prisoner’s dilemma against a range of opponents . . .
• What strategy should you choose, so as to maximize your overall payoff?
• Axelrod (1984) investigated this problem, with a computer tournament for programs playing the prisoner’s dilemma.

Strategies in Axelrod’s Tournament
• ALLD: “Always defect” (the hawk strategy).
• TIT-FOR-TAT:
1. On round t=1, cooperate.
2. On subsequent rounds t, do what the opponent did on round t-1.
• TESTER: On the 1st round, defect. If the opponent retaliated, then play TIT-FOR-TAT. Otherwise intersperse cooperation and defection.
• JOSS: As TIT-FOR-TAT, except periodically defect.

Results of Axelrod’s tournament
• Overall winner was TIT-FOR-TAT.
• Why?
– The overall score of a strategy is computed as the average of its performance against all other strategies.
– TIT-FOR-TAT was defeated by ALLD.
– However, ALLD did not do well against all opponents.
• TIT-FOR-TAT scored well against “cooperative strategies”
– Again (as in the single PD): being not too cooperative pays off.

Recipes for Success
• Axelrod suggests the following rules for succeeding in his tournament:
– Don’t be envious: Don’t play as if it were zero sum!
– Be nice: Start by cooperating, and reciprocate cooperation.
– Retaliate appropriately: Always punish defection immediately, but use “measured” force; don’t overdo it.
– Don’t hold grudges: Always reciprocate cooperation immediately.