LECTURE 6: MULTIAGENT INTERACTIONS
An Introduction to Multiagent Systems
http://www.csc.liv.ac.uk/~mjw/pubs/imas/

1 What are Multiagent Systems?

[Figure: agents in an environment, showing each agent’s sphere of influence, the interactions between agents, and their organisational relationships.]

What is Utility?

Utility is not money (but it is a useful analogy).
Typical relationship between utility & money:

[Figure: plot of utility against money.]

3 Multiagent Encounters

We need a model of the environment in which these agents will act...
– agents simultaneously choose an action to perform, and as a result of the actions they select, an outcome in $\Omega$ (the set of possible outcomes) will result;
– the actual outcome depends on the combination of actions;
– assume each agent has just two possible actions that it can perform: C (“cooperate”) and D (“defect”).

Environment behaviour is given by a state transformer function:

$\tau : Ac \times Ac \to \Omega$ (first argument: agent i’s action; second argument: agent j’s action)

Here is a state transformer function:

$\tau(D,D) = \omega_1 \quad \tau(D,C) = \omega_2 \quad \tau(C,D) = \omega_3 \quad \tau(C,C) = \omega_4$

(This environment is sensitive to actions of both agents.)

Here is another:

$\tau(D,D) = \omega_1 \quad \tau(D,C) = \omega_1 \quad \tau(C,D) = \omega_1 \quad \tau(C,C) = \omega_1$

(Neither agent has any influence in this environment.)

And here is another:

$\tau(D,D) = \omega_1 \quad \tau(D,C) = \omega_2 \quad \tau(C,D) = \omega_1 \quad \tau(C,C) = \omega_2$

(This environment is controlled by j.)
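The three kinds of environment described above can be illustrated with a small Python sketch. It assumes that a state transformer function is stored as a dictionary keyed by (agent i's action, agent j's action); the outcome labels `w1` to `w4` and the helper names `i_has_influence` and `j_has_influence` are hypothetical, chosen only for illustration.

```python
from itertools import product

ACTIONS = ("C", "D")  # cooperate, defect

# Each table maps (i's action, j's action) to an outcome label.
# The labels w1..w4 are illustrative placeholders, not values from the slides.
sensitive = {("D", "D"): "w1", ("D", "C"): "w2", ("C", "D"): "w3", ("C", "C"): "w4"}
no_influence = {pair: "w1" for pair in product(ACTIONS, repeat=2)}
controlled_by_j = {("D", "D"): "w1", ("D", "C"): "w2", ("C", "D"): "w1", ("C", "C"): "w2"}


def i_has_influence(tau):
    """True if, for some fixed action of j, changing i's action changes the outcome."""
    return any(tau[("C", aj)] != tau[("D", aj)] for aj in ACTIONS)


def j_has_influence(tau):
    """True if, for some fixed action of i, changing j's action changes the outcome."""
    return any(tau[(ai, "C")] != tau[(ai, "D")] for ai in ACTIONS)


if __name__ == "__main__":
    for name, tau in [("sensitive", sensitive),
                      ("no_influence", no_influence),
                      ("controlled_by_j", controlled_by_j)]:
        print(name, "i:", i_has_influence(tau), "j:", j_has_influence(tau))
```

Under these assumptions, both checks succeed for the first table, both fail for the second, and only j's check succeeds for the third, which matches the three captions.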

Dominant Strategies

Given any particular strategy s (either C or D) for agent i, there will be a number of possible outcomes.
We say $s_1$ dominates $s_2$ if every outcome possible by i playing $s_1$ is preferred over every outcome possible by i playing $s_2$.
A rational agent will never play a dominated strategy.
So in deciding what to do, we can delete dominated strategies.
Unfortunately, there isn’t always a unique undominated strategy.

Nash Equilibrium

In general, we will say that two strategies $s_1$ and $s_2$ are in Nash equilibrium if:
1. under the assumption that agent i plays $s_1$, agent j can do no better than play $s_2$; and
2. under the assumption that agent j plays $s_2$, agent i can do no better than play $s_1$.
Neither agent has any incentive to deviate from a Nash equilibrium.
Unfortunately:
1. Not every interaction scenario has a Nash equilibrium (in pure strategies).
2. Some interaction scenarios have more than one Nash equilibrium.

Competitive and Zero-Sum Interactions

Where preferences of agents are diametrically opposed we have strictly competitive scenarios.
Zero-sum encounters are those where utilities sum to zero:

$u_i(\omega) + u_j(\omega) = 0$ for all $\omega \in \Omega$

Zero sum implies strictly competitive.
Zero sum encounters in real life are very rare... but people tend to act in many scenarios as if they were zero sum.

The individual rational action is defect.
This guarantees a payoff of no worse than 2, whereas cooperating guarantees only a payoff of 1.
So defection is the best response to all possible strategies: both agents defect, and get payoff = 2.
But intuition says this is not the best outcome: surely they should both cooperate and each get a payoff of 3!

This apparent paradox is the fundamental problem of multi-agent interactions.
It appears to imply that cooperation will not occur in societies of self-interested agents.
Real world examples:
– nuclear arms reduction (“why don’t I keep mine...”);
– free rider systems, such as public transport;
– in the UK, television licences.
The prisoner’s dilemma is ubiquitous.
Can we recover cooperation?

Arguments for Recovering Cooperation

Conclusions that some have drawn from this analysis:
– the game theory notion of rational action is wrong!
– somehow the dilemma is being formulated wrongly.
Arguments to recover cooperation:
– We are not all Machiavelli!
– The other prisoner is my twin!
– The shadow of the future...

4.3 Axelrod’s Tournament

Suppose you play iterated prisoner’s dilemma against a range of opponents...
What strategy should you choose, so as to maximise your overall payoff?
Axelrod (1984) investigated this problem, with a computer tournament for programs playing the prisoner’s dilemma.

Strategies in Axelrod’s Tournament

ALLD: “Always defect”, the hawk strategy;
TIT-FOR-TAT:
1. On round $u = 0$, cooperate.
2. On round $u > 0$, do what your opponent did on round $u - 1$.
TESTER: On 1st round, defect.
If the opponent retaliated, then play TIT-FOR-TAT. Otherwise intersperse cooperation & defection.
JOSS: As TIT-FOR-TAT, except periodically defect.

Recipes for Success in Axelrod’s Tournament

Axelrod suggests the following rules for succeeding in his tournament:
Don’t be envious: Don’t play as if it were zero sum!
Be nice: Start by cooperating, and reciprocate cooperation.
Retaliate appropriately: Always punish defection immediately, but use “measured” force; don’t overdo it.
Don’t hold grudges: Always reciprocate cooperation immediately.
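The ALLD and TIT-FOR-TAT strategies described earlier can be sketched as a tiny iterated prisoner's dilemma in Python. This is a minimal illustration, not Axelrod's tournament code. The payoffs 2 (mutual defection), 3 (mutual cooperation) and 1 (cooperating against a defector) follow the figures quoted in the lecture; the payoff of 4 for defecting against a cooperator is an assumption, as are the function names `all_d`, `tit_for_tat` and `play`.

```python
from typing import Callable, List, Tuple

Action = str                           # "C" (cooperate) or "D" (defect)
History = List[Tuple[Action, Action]]  # per round: (my action, opponent's action)
Strategy = Callable[[History], Action]

# PAYOFF[(mine, theirs)] is my payoff for one round.
# 2, 3 and 1 match the lecture text; 4 is an assumed temptation payoff.
PAYOFF = {("D", "D"): 2, ("C", "C"): 3, ("C", "D"): 1, ("D", "C"): 4}


def all_d(history: History) -> Action:
    """ALLD: always defect."""
    return "D"


def tit_for_tat(history: History) -> Action:
    """TIT-FOR-TAT: cooperate on round 0, then copy the opponent's previous move."""
    if not history:
        return "C"
    return history[-1][1]


def play(strategy_a: Strategy, strategy_b: Strategy, rounds: int) -> Tuple[int, int]:
    """Play the iterated game and return the total payoffs (a, b)."""
    history_a: History = []
    history_b: History = []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(history_a)
        b = strategy_b(history_b)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        history_a.append((a, b))
        history_b.append((b, a))
    return score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat, 10))  # (30, 30)
    print(play(all_d, all_d, 10))              # (20, 20)
    print(play(tit_for_tat, all_d, 10))        # (19, 22)
```

With these assumed payoffs, ALLD beats TIT-FOR-TAT head to head (22 against 19 over ten rounds), yet two TIT-FOR-TAT players earn 30 each while two ALLD players earn only 20 each. That is the point of "don't be envious": what matters is the total payoff across many opponents, not whether each individual opponent is beaten.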