Understanding Agents & Multiagent Systems in Game Theory: Utility, Rationality, & Strategies, Study notes of Mechanical Engineering

An introduction to game theory, a fundamental concept in agents and multiagent systems. It covers the basics of utility functions, preferences, and rational behavior, as well as strategic-form games, payoff matrices, and solution concepts such as dominant strategies, Nash equilibrium, Pareto optimality, and social welfare. The document also discusses competitive and zero-sum interactions and examples such as the Prisoner's Dilemma.

Typology: Study notes

Pre 2010

Uploaded on 08/31/2009

koofers-user-9ys-1
Kagan Tumer, kagan.tumer@oregonstate.edu
ME 538 Fall 07: Agents and Multiagent Systems
Week 5, Lecture 1: Game theory

Utilities and Preferences
• Assume there are two agents, i and j.
• Assume agents are self-interested: they have their own preferences over the possible outcomes.
• Assume Z is the set of those outcomes, or system states: $Z = \{z_1, z_2, \dots\}$
• Preferences of agents are captured by utility functions: $u_i : Z \to \mathbb{R}$ and $u_j : Z \to \mathbb{R}$
• Preference ordering: state z is weakly preferred by i over z' if $u_i(z) \ge u_i(z')$
• Preference ordering: state z is strongly preferred by i over z' if $u_i(z) > u_i(z')$

Utilities and Preferences (continued)
• What is utility? Is utility equal to money?
[Figure: utility plotted against money]

Player Encounters
• The “system” where the agents interact:
  – Agents simultaneously choose an action to perform, and as a result of their actions an outcome in Z will result.
  – The actual outcome depends on the combination of actions.
  – Assume each agent has just two possible actions: C (“cooperate”) and D (“defect”).

Dominant Strategies
• Given any particular strategy s (either C or D) for agent i, there will be a number of possible outcomes.
• We say s1 dominates s2 if every outcome possible by i playing s1 is preferred over every outcome possible by i playing s2.
• A rational agent will never play a dominated strategy.
• So in deciding what to do, we can delete dominated strategies.
• Unfortunately, there isn’t always a unique undominated strategy.
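Deleting dominated strategies can be done mechanically. The sketch below is a minimal illustration, not taken from the lecture. It assumes a two-agent game stored as a dictionary from action pairs to utility pairs and borrows the payoffs of the Prisoner’s Dilemma described later in these notes, taking utility as minus the years in prison and reading C as “stay silent” and D as “confess” (both assumptions). It uses the common pointwise reading of dominance: s1 is strictly better than s2 against each remaining action of the other agent. The slide’s wording, which compares every outcome of s1 with every outcome of s2, is stricter. Under that reading, D would not dominate C here, because confessing can cost 2 years while staying silent can cost only 1. The names `GAME`, `dominates`, and `iterated_elimination` are illustrative.

```python
from typing import Dict, List, Tuple

C, D = "C", "D"
Game = Dict[Tuple[str, str], Tuple[int, int]]

# Hypothetical encoding: GAME[(action of i, action of j)] = (u_i, u_j).
# Utilities are minus years in prison, reading C as "stay silent" and D as "confess".
GAME: Game = {
    (C, C): (-1, -1),  # neither confesses: 1 year each
    (C, D): (-3, 0),   # only j confesses: i gets 3 years, j is freed
    (D, C): (0, -3),   # only i confesses: i is freed, j gets 3 years
    (D, D): (-2, -2),  # both confess: 2 years each
}


def utility(game: Game, player: int, own: str, other: str) -> int:
    """Utility to `player` (0 = agent i, 1 = agent j) when it plays `own` and the other plays `other`."""
    profile = (own, other) if player == 0 else (other, own)
    return game[profile][player]


def dominates(game: Game, player: int, s1: str, s2: str, other_actions: List[str]) -> bool:
    """Pointwise strict dominance: s1 beats s2 against every remaining action of the other agent."""
    return all(
        utility(game, player, s1, o) > utility(game, player, s2, o)
        for o in other_actions
    )


def iterated_elimination(game: Game, actions_i: List[str], actions_j: List[str]):
    """Repeatedly delete strictly dominated strategies until no agent has one left."""
    remaining = [list(actions_i), list(actions_j)]
    changed = True
    while changed:
        changed = False
        for player in (0, 1):
            own, other = remaining[player], remaining[1 - player]
            dominated = [
                s2 for s2 in own
                if any(dominates(game, player, s1, s2, other) for s1 in own if s1 != s2)
            ]
            if dominated:
                remaining[player] = [s for s in own if s not in dominated]
                changed = True
    return remaining[0], remaining[1]


if __name__ == "__main__":
    print(iterated_elimination(GAME, [C, D], [C, D]))  # (['D'], ['D'])
```

Both agents are left with D alone, even though (C, C) would give each of them a shorter sentence. This is the tension the Prisoner’s Dilemma slides explore.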

Nash Equilibrium
• Two strategies s1 and s2 are in Nash equilibrium if:
  – given that agent i plays s1, agent j cannot do better than s2, and
  – given that agent j plays s2, agent i cannot do better than s1.
• Neither agent has an incentive to deviate from a Nash equilibrium.
• However:
  – Not every game has a pure-strategy Nash equilibrium.
  – Some games have more than one Nash equilibrium.

Pareto Optimality
• An outcome is said to be Pareto optimal (or Pareto efficient) if there is no other outcome that makes one agent better off without making another agent worse off.
• If an outcome is Pareto optimal, then at least one agent will be reluctant to move away from it (because this agent will be worse off).
• If an outcome is not Pareto optimal, then there is another outcome that makes every agent at least as happy and at least one agent strictly happier. “Reasonable” agents would agree to move to it in this case. (Even if I don’t directly benefit from it, you can benefit without my suffering.)

Social Welfare
• The social welfare of an outcome z is the sum of the utilities that each agent gets from outcome z: $G(z) = \sum_i u_i(z)$
• It captures the “total amount of money in the system”.
• As a solution concept, it may be appropriate when the whole system (all agents) has a single owner (then the overall benefit of the system is important, not the benefit of individuals).
  – Robot coordination?
  – Traffic?
• Not so much

Competitive and Zero-Sum Interactions
• Where the preferences of agents are in opposition, we have strictly competitive scenarios.
• Zero-sum encounters are those where utilities sum to zero: $\sum_i u_i(z) = 0 \quad \forall z \in Z$ (the key feature is that the sum is constant; setting it to zero is only a normalization).
• Zero sum implies strictly competitive.
• Zero-sum encounters in real life are very rare . . . but people tend to act in many scenarios as if they were zero sum.

The Prisoner’s Dilemma
• Two criminals are held in separate cells (no communication).
• They are told that:
  (1) If one confesses and the other does not, the confessor is freed and the other gets 3 years.
  (2) If both confess, each gets 2 years.
  (3) If neither confesses, both get 1 year.

Can We Recover Cooperation?
• Conclusions that some have drawn from this analysis:
  – the game theory notion of rational action is wrong!
  – somehow the dilemma is being formulated wrongly.
• Arguments to recover cooperation:
  – We are not all Machiavelli!
  – The other prisoner is my twin!
  – The shadow of the future . . .

The Iterated Prisoner’s Dilemma
• One solution: play the game more than once.
  – If you know you will be meeting your opponent again, then the incentive to defect disappears.
  – Cooperation is the rational choice in the infinitely repeated prisoner’s dilemma.
• Problems arise in n-step games: when n is known, defecting in the last round is rational, but then why not also in the round before it, and so on back to the first round?
• Fortunately, if n is unknown, this effect goes away.

Axelrod’s Tournament
• Suppose you play the iterated prisoner’s dilemma against a range of opponents . . .
• What strategy should you choose, so as to maximize your overall payoff?
• Axelrod (1984) investigated this problem with a computer tournament for programs playing the prisoner’s dilemma.

Strategies in Axelrod’s Tournament
• ALLD: “Always defect”, the hawk strategy.
• TIT-FOR-TAT:
  1. On round t = 1, cooperate.
  2. On each subsequent round t, do what the opponent did on round t-1.
• TESTER: On the 1st round, defect. If the opponent retaliated, then play TIT-FOR-TAT. Otherwise, intersperse cooperation and defection.
• JOSS: As TIT-FOR-TAT, except periodically defect.

Results of Axelrod’s Tournament
• The overall winner was TIT-FOR-TAT.
• Why?
• The overall score of a strategy was computed as its average performance against all other strategies.
  – TIT-FOR-TAT was defeated by ALLD.
  – However, ALLD did not do well against all opponents.
• TIT-FOR-TAT won against “cooperative” strategies.
  – Again (as in the single-shot PD): not being too cooperative pays off.

Recipes for Success
• Axelrod suggests the following rules for succeeding in his tournament:
  – Don’t be envious: don’t play as if it were zero sum!
  – Be nice: start by cooperating, and reciprocate cooperation.
  – Retaliate appropriately: always punish defection immediately, but use “measured” force; don’t overdo it.
  – Don’t hold grudges: always reciprocate cooperation immediately.
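The single-shot solution concepts from these notes (pure-strategy Nash equilibrium, Pareto optimality, social welfare, and the zero-sum condition) can be checked mechanically on the Prisoner’s Dilemma. The sketch below is illustrative, not from the lecture. It assumes a dictionary encoding of the game, with utility as minus the years in prison and C/D read as “stay silent”/“confess”, and it considers pure strategies only. The names `GAME`, `is_nash`, `is_pareto_optimal`, `social_welfare`, and `is_constant_sum` are hypothetical.

```python
from typing import Dict, Tuple

C, D = "C", "D"
ACTIONS = (C, D)
Profile = Tuple[str, str]

# Hypothetical encoding: GAME[(action of i, action of j)] = (u_i, u_j),
# utilities = minus years in prison (C = stay silent, D = confess).
GAME: Dict[Profile, Tuple[int, int]] = {
    (C, C): (-1, -1),
    (C, D): (-3, 0),
    (D, C): (0, -3),
    (D, D): (-2, -2),
}


def is_nash(game: Dict[Profile, Tuple[int, int]], profile: Profile) -> bool:
    """Pure-strategy Nash equilibrium: neither agent gains by deviating on its own."""
    a_i, a_j = profile
    u_i, u_j = game[profile]
    i_cannot_improve = all(game[(b, a_j)][0] <= u_i for b in ACTIONS)
    j_cannot_improve = all(game[(a_i, b)][1] <= u_j for b in ACTIONS)
    return i_cannot_improve and j_cannot_improve


def is_pareto_optimal(game: Dict[Profile, Tuple[int, int]], profile: Profile) -> bool:
    """No other outcome leaves everyone at least as well off and someone strictly better off."""
    u = game[profile]
    for v in game.values():
        if all(x >= y for x, y in zip(v, u)) and any(x > y for x, y in zip(v, u)):
            return False
    return True


def social_welfare(game: Dict[Profile, Tuple[int, int]], profile: Profile) -> int:
    """G(z): the sum of the agents' utilities at outcome z."""
    return sum(game[profile])


def is_constant_sum(game: Dict[Profile, Tuple[int, int]]) -> bool:
    """Zero-sum up to normalization: utilities sum to the same constant at every outcome."""
    return len({sum(u) for u in game.values()}) == 1


if __name__ == "__main__":
    for z in GAME:
        print(z, "Nash:", is_nash(GAME, z), "Pareto:", is_pareto_optimal(GAME, z),
              "welfare:", social_welfare(GAME, z))
    print("constant-sum:", is_constant_sum(GAME))
```

On these payoffs, (D, D) is the only pure Nash equilibrium and also the only outcome that is not Pareto optimal. (C, C) maximizes social welfare, and the game is not constant-sum, so it is not strictly competitive in the zero-sum sense.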
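A round-robin tournament in the style of Axelrod’s can be sketched in a few dozen lines. This is a toy under stated assumptions, not Axelrod’s code. Per-round payoffs are minus the years in prison from the Prisoner’s Dilemma slide (C = stay silent, D = confess). ALLC (always cooperate) is added as a hypothetical cooperative opponent. TESTER is omitted because its “intersperse” rule is underspecified. JOSS is approximated by a deterministic rule that defects every fifth round, where the period of 5 is an arbitrary assumption. As on the Results slide, each strategy’s score is its average match total against all other strategies.

```python
from typing import Callable, Dict, List, Tuple

C, D = "C", "D"

# PAYOFF[(my move, their move)] = my utility for one round
# (minus years in prison; C = stay silent, D = confess).
PAYOFF: Dict[Tuple[str, str], int] = {
    (C, C): -1,
    (C, D): -3,
    (D, C): 0,
    (D, D): -2,
}

# A strategy maps (my past moves, opponent's past moves) to its next move.
Strategy = Callable[[List[str], List[str]], str]


def all_d(mine: List[str], theirs: List[str]) -> str:
    return D  # ALLD: always defect


def all_c(mine: List[str], theirs: List[str]) -> str:
    return C  # hypothetical ALLC: always cooperate (not on the slide)


def tit_for_tat(mine: List[str], theirs: List[str]) -> str:
    # Cooperate on round 1, then repeat the opponent's previous move.
    return C if not theirs else theirs[-1]


def joss(mine: List[str], theirs: List[str], period: int = 5) -> str:
    # Deterministic stand-in for JOSS: TIT-FOR-TAT, but defect every `period`-th round.
    if (len(mine) + 1) % period == 0:
        return D
    return tit_for_tat(mine, theirs)


def play_match(s1: Strategy, s2: Strategy, rounds: int) -> Tuple[int, int]:
    """Play the iterated game with simultaneous moves; return each player's total utility."""
    h1: List[str] = []
    h2: List[str] = []
    total1 = total2 = 0
    for _ in range(rounds):
        a1, a2 = s1(h1, h2), s2(h2, h1)  # both decide before either history is updated
        total1 += PAYOFF[(a1, a2)]
        total2 += PAYOFF[(a2, a1)]
        h1.append(a1)
        h2.append(a2)
    return total1, total2


def round_robin(strategies: Dict[str, Strategy], rounds: int = 200) -> Dict[str, float]:
    """Average match total of each strategy against every other strategy (no self-play)."""
    results: Dict[str, List[int]] = {name: [] for name in strategies}
    names = list(strategies)
    for k, a in enumerate(names):
        for b in names[k + 1:]:
            score_a, score_b = play_match(strategies[a], strategies[b], rounds)
            results[a].append(score_a)
            results[b].append(score_b)
    return {name: sum(s) / len(s) for name, s in results.items()}


if __name__ == "__main__":
    field = {"ALLD": all_d, "ALLC": all_c, "TIT-FOR-TAT": tit_for_tat, "JOSS": joss}
    for name, score in sorted(round_robin(field).items(), key=lambda kv: -kv[1]):
        print(f"{name:12s} {score:8.1f}")
```

In a field this small, ALLD can top the table because ALLC is free to exploit. For example, over 10 rounds with only ALLD, TIT-FOR-TAT, and ALLC, ALLD averages -9 while TIT-FOR-TAT averages -15.5. Axelrod’s tournament had a much larger field of submitted programs, so treat this only as a harness for experimenting with the recipes above.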