Introduction to Artificial Intelligence

Set 4: Game-Playing
ICS 271, Fall 2016, Kalev Kask

Overview
• Computer programs that play 2-player games
  – game-playing as search
  – with the complication of an opponent
• General principles of game-playing and search
  – game tree
  – minimax principle: impractical, but the theoretical basis for analysis
  – evaluation functions: cutting off search and replacing the terminal-leaf utility function with an evaluation function
  – alpha-beta pruning
  – heuristic techniques
  – games with chance
• Status of game-playing systems
  – in chess, checkers, backgammon, Othello, etc., computers routinely defeat leading world players
• Motivation: multiagent competitive environments
  – think of "nature" as an opponent
  – economics, war-gaming, medical drug treatment

Solving 2-Player Games
• Two players; fully observable environments; deterministic, turn-taking, zero-sum games of perfect information
• Examples: chess, checkers, tic-tac-toe
• Configuration of the board = unique arrangement of "pieces"
• Statement of the game as a search problem:
  – States = board configurations
  – Operators = legal moves (the transition model)
  – Initial state = current configuration
  – Goal = a winning configuration
  – Payoff function (utility) = gives a numerical value for the outcome of the game
• Two players, MIN and MAX, taking turns. MIN/MAX will use the search tree to find the next move.
• A working example: Grundy's game
  – Given a set of coins, a player takes a set and divides it into two unequal sets. The player who cannot make an unequal split loses.
  – What is a state? Moves? Goal?

Grundy's game: a special case of nim
[Figure 4.14: Exhaustive minimax for the game of nim. Bold lines indicate a forced win for MAX; each node is marked with its derived value (0 or 1) under minimax.]

Game Trees: Tic-Tac-Toe
• How do we search this tree to find the optimal move?

Two-Ply Game Tree
[Figure sequence: a two-ply game tree with MAX and MIN levels, and the resulting minimax decision.]
• The minimax decision: minimax maximizes the utility for the worst-case outcome for MAX (a code sketch of the backup appears after the tic-tac-toe example below).
• A solution tree is highlighted in the figure.

Static (Heuristic) Evaluation Functions
• An evaluation function:
  – Estimates how good the current board configuration is for a player.
  – Typically, one figures how good it is for the player and how good it is for the opponent, and subtracts the opponent's score from the player's.
  – Othello: number of white pieces − number of black pieces
  – Chess: value of all white pieces − value of all black pieces
• Typical values range from −infinity (loss) to +infinity (win), or [−1, +1].
• If the board evaluation is X for a player, it is −X for the opponent.
• Examples:
  – Evaluating chess boards
  – Checkers
  – Tic-tac-toe

Applying MiniMax to Tic-Tac-Toe
• The static evaluation function heuristic
[Figure 4.17: Two-ply minimax applied to the opening move of tic-tac-toe; leaves are scored as (number of lines open for MAX) − (number of lines open for MIN), e.g. 6 − 5 = 1, 5 − 5 = 0, 4 − 6 = −2, and values are backed up to MAX's move at the start node.]
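To make the minimax backup above concrete, here is a minimal Python sketch. The GameState interface (is_terminal, successors, utility) is a hypothetical assumption for illustration, not something defined in these slides; utility is taken from MAX's point of view, as in the slides.

```python
# Minimal minimax sketch. The GameState interface (is_terminal, successors,
# utility) is hypothetical; any 2-player zero-sum game could be plugged in.

def minimax(state, maximizing):
    """Return (value, move) for the player to move, assuming optimal play."""
    if state.is_terminal():
        return state.utility(), None          # payoff from MAX's point of view

    best_move = None
    if maximizing:                            # MAX picks the child with the highest value
        best_value = float("-inf")
        for move, child in state.successors():
            value, _ = minimax(child, maximizing=False)
            if value > best_value:
                best_value, best_move = value, move
    else:                                     # MIN picks the child with the lowest value
        best_value = float("inf")
        for move, child in state.successors():
            value, _ = minimax(child, maximizing=True)
            if value < best_value:
                best_value, best_move = value, move
    return best_value, best_move
```

Replacing the terminal utility with the static evaluation function at a depth cutoff gives the depth-limited variant discussed in the alpha-beta sections that follow.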
Feature-Based Evaluation Functions
• Features of the state.
• Features taken together define categories (equivalence classes).
• Expected value for each equivalence class
  – Too hard to compute.
• Instead: evaluation function = weighted linear combination of feature values.

[Figure 5.8: Two chess positions, (a) and (b), White to move, that differ only in the position of the rook at lower right. In (a), Black has an advantage of a knight and two pawns, which should be enough to win the game. In (b), White will capture the queen, giving it an advantage that should be strong enough to win.]

For chess, typically a linear weighted sum of features:
  Eval(s) = w1·f1(s) + w2·f2(s) + ... + wn·fn(s)
e.g., w1 = 9 with f1(s) = (number of white queens) − (number of black queens), etc.

Digression: Exact Values Don't Matter
[Figure: two game trees whose leaf values differ by a monotonic transformation but yield the same minimax decision.]
• Behaviour is preserved under any monotonic transformation of EVAL.
• Only the order matters: the payoff in deterministic games acts as an ordinal utility function.

Alpha-Beta Procedure
• Idea:
  – Do depth-first search to generate a partial game tree,
  – apply the static evaluation function to the leaves,
  – compute bounds on the internal nodes.
• α, β bounds:
  – An α value at a MAX node means that MAX's real value is at least α.
  – A β value at a MIN node means that MIN can guarantee a value no more than β.
• Computation (a code sketch follows after the worked examples below):
  – Pass the current α/β down to children when expanding a node.
  – Update α (at MAX) / β (at MIN) when node values are updated.
  – α of a MAX node is the max of the children seen.
  – β of a MIN node is the min of the children seen.

Alpha-Beta Example
• Do depth-first search until the first leaf, keeping track of the range of possible values at each node.
[Figure sequence: the root and its first MIN child start at [−∞, +∞]; the first MIN child narrows to [−∞, 3] and then [3, 3], so the root becomes [3, +∞]; a second MIN child narrows to [−∞, 2], which is worse for MAX; the last MIN child narrows from [−∞, 14] to [−∞, 5].]

Tic-Tac-Toe Example with Alpha-Beta Pruning
[Figure: the two-ply tic-tac-toe search of Figure 4.17, revisited with alpha-beta pruning.]

Alpha-Beta Algorithm
• Depth-first search: only considers nodes along a single path from the root at any time.
• α = highest-value choice found at any choice point of the path for MAX (initially α = −infinity).
• β = lowest-value choice found at any choice point of the path for MIN (initially β = +infinity).
• Pass the current values of α and β down to child nodes during the search.
• Update the values of α and β during the search:
  – MAX updates α at MAX nodes.
  – MIN updates β at MIN nodes.

When to Prune
• Prune whenever α ≥ β.
  – Prune below a MAX node whose α value becomes greater than or equal to the β value of its ancestors (MAX nodes update α based on their children's returned values).
  – Prune below a MIN node whose β value becomes less than or equal to the α value of its ancestors (MIN nodes update β based on their children's returned values).

Alpha-Beta Example (continued)
[Figure sequence, annotated with α/β values:
  – MIN updates β based on its children (α = −∞, β = 3); no change at the root (α = −∞, β = +∞).
  – MAX updates α based on its children (α = 3, β = +∞); 3 is returned as the node value.
  – α and β are passed down to the children (α = 3, β = +∞).
  – 2 is returned as a node value; MAX updates α based on its children, with no change (α = 3, β = +∞).
  – α and β are passed down to the next MIN node's children (α = 3, β = +∞).
  – MIN updates β based on its children (α = 3, β = 14).
  – MAX calculates the same node value as plain minimax, and makes the same move!]
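Adding the α/β bookkeeping described above to the earlier minimax sketch takes only a few lines. The sketch below assumes the same hypothetical GameState interface, plus an evaluate() method used at the depth cutoff; both are assumptions for illustration. The pruning test alpha >= beta corresponds to the "When to Prune" rule.

```python
# Alpha-beta sketch on the same hypothetical GameState interface as the
# minimax sketch above. It returns the same value as minimax, usually after
# examining far fewer nodes.

def alphabeta(state, depth, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if state.is_terminal() or depth == 0:
        return state.evaluate()               # static evaluation at the cutoff

    if maximizing:
        value = float("-inf")
        for _, child in state.successors():
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)         # MAX updates alpha at MAX nodes
            if alpha >= beta:
                break                         # prune: MIN will never allow this branch
    else:
        value = float("inf")
        for _, child in state.successors():
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, value)           # MIN updates beta at MIN nodes
            if alpha >= beta:
                break                         # prune: MAX already has a better option
    return value
```

At the root, the program would also record which move produced the returned value. Ordering successors so that the most promising moves are examined first (captures first, or values remembered from a previous iterative-deepening pass, as discussed below) is what pushes performance toward the best case.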
Alpha-Beta Practical Implementation
• Idea:
  – Do depth-first search to generate a partial game tree.
  – Cutoff test:
    • depth limit
    • iterative deepening
    • cut off only when no big changes are expected (quiescence search)
  – When cut off, apply the static evaluation function to the leaves.
  – Compute bounds on the internal nodes.
  – Run α-β pruning using the estimated values.
  – IMPORTANT: use the node values of the previous iteration to order children during the next iteration.

Example
[Figure: a tree with leaf values 3 4 1 2 7 8 5 6. Which nodes can be pruned?]

Answer to Second Example (the exact mirror image of the first example)
[Figure: a three-level tree with MIN/MAX levels and leaf values 6 5 8 7 2 1 3 4. Which nodes can be pruned?]
Answer: LOTS! Because the most favourable nodes for both players are explored first (i.e., in the diagram, they are on the left-hand side).

Effectiveness of Alpha-Beta Search
• Worst case:
  – Branches are ordered so that no pruning takes place. In this case alpha-beta gives no improvement over exhaustive search.
• Best case:
  – Each player's best move is the left-most alternative (i.e., it is evaluated first).
  – In practice, performance is closer to the best case than to the worst case:
    • e.g., sort moves by the move values remembered from last time;
    • e.g., expand captures first, then threats, then forward moves, etc.;
    • e.g., run iterative deepening search and sort by the values from the last iteration.
• The alpha-beta best case is O(b^(d/2)) rather than O(b^d).
  – This is the same as having a branching factor of sqrt(b), since (sqrt(b))^d = b^(d/2); i.e., we have effectively gone from b to the square root of b.
  – In chess this goes from b ≈ 35 to b ≈ 6, permitting much deeper search in the same amount of time.
  – In practice it is often about b^(2d/3).

Final Comments about Alpha-Beta Pruning
• Pruning does not affect the final result: alpha-beta pruning returns the minimax value!
• Entire subtrees can be pruned.
• Good move ordering improves the effectiveness of pruning.
• Repeated states are again possible.
  – Store them in memory = a transposition table.
  – Even in depth-first search we can store the result of an evaluation in a hash table of previously seen positions, like the notion of the "explored" list in graph search.

Multiplayer Games
• Multiplayer games often involve alliances: if A and B are in a weak position they can collaborate and act against C.
• If games are not zero-sum, collaboration can also occur in two-player games: if (1000, 1000) is the best payoff for both, then they will cooperate towards getting there rather than towards the minimax value.

Games with Chance
• In real life there are many unpredictable external events.
• A game tree in backgammon must include chance nodes.
[Figure: schematic game tree for a backgammon position.]
• How do we evaluate a good move?
  – By expected utility, leading to expected minimax (a code sketch appears at the end of this section).
  – Utility for MAX is the highest expected value of its child nodes.
  – Utility for MIN is the lowest expected value of its child nodes.
  – Chance nodes take the EXPECTED value of their child nodes.

Pruning in Nondeterministic Game Trees
• A version of α-β pruning is possible.
[Figure sequence: step-by-step pruning of a tree with chance nodes.]
• An alternative: Monte Carlo simulations (see the playout sketch at the end of these notes):
  – Play thousands of games of the program against itself using random dice rolls. Record the percentage of wins from a position.
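The chance-node rule above (chance nodes take the expected value of their children) can be folded into the minimax sketch as follows. This is a sketch of expected minimax (expectiminimax) over the same hypothetical GameState interface, here extended with node_type() and chance_outcomes() methods that are assumptions for illustration.

```python
# Expectiminimax sketch for games with chance nodes (e.g. backgammon).
# GameState and its methods (is_terminal, utility, node_type, successors,
# chance_outcomes) are hypothetical names, not from the slides.

def expectiminimax(state):
    if state.is_terminal():
        return state.utility()

    kind = state.node_type()                  # "max", "min", or "chance"
    if kind == "max":
        return max(expectiminimax(child) for _, child in state.successors())
    if kind == "min":
        return min(expectiminimax(child) for _, child in state.successors())
    # chance node: probability-weighted average over the possible outcomes
    return sum(prob * expectiminimax(child)
               for prob, child in state.chance_outcomes())
```

Unlike deterministic minimax, the averaging at chance nodes means the magnitudes of the evaluation values matter, not just their order.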
AlphaGo
• MCTS (Monte Carlo tree search) simulation.
• Policy/value estimation computed by a deep (13-layer) neural network.
  – Learned from 30 million human game samples.
• Policy/value estimation alone (without MCTS) plays only at an average level.
• MCTS and the policy/value evaluation function are equally important.

Summary
• Game playing is best modeled as a search problem.
• Game trees represent alternating computer/opponent moves.
• Evaluation functions estimate the quality of a given board configuration for the MAX player.
• Minimax is a procedure that chooses moves by assuming that the opponent will always choose the move that is best for them.
• Alpha-beta is a procedure that can prune large parts of the search tree, allowing the search to go deeper.
• Human and computer (board) game playing are moving in separate directions: computers beat humans in most games and are getting better.

Deterministic Games in Practice
• Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions.
• Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses a very sophisticated evaluation function, and uses undisclosed methods for extending some lines of search up to 40 ply.
• Othello: human champions refuse to compete against computers, which are too good.
• Go: human champions refuse to compete against computers, which are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.
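As a rough illustration of the Monte Carlo simulation idea referenced above (and of the random-playout component inside MCTS engines such as AlphaGo), the sketch below estimates the value of a position as the win rate over random self-play games. The names random_playout and monte_carlo_value, and the GameState methods used, are assumptions for illustration.

```python
import random

# Monte Carlo playout evaluation: estimate the value of a position as the
# fraction of random self-play games won from it. The GameState methods
# (is_terminal, winner, successors) are hypothetical names.

def random_playout(state, player):
    """Play random moves to the end of the game; return 1 if `player` wins, else 0."""
    while not state.is_terminal():
        _, state = random.choice(list(state.successors()))
    return 1 if state.winner() == player else 0

def monte_carlo_value(state, player, n_playouts=1000):
    wins = sum(random_playout(state, player) for _ in range(n_playouts))
    return wins / n_playouts                  # estimated probability of winning
```

Full MCTS additionally grows a search tree and balances exploration against exploitation when deciding where to spend playouts, but the backed-up statistic is the same win rate.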