Automatic Heuristic Construction in a Complete General Game Player

Gregory Kuhlmann, Kurt Dresner and Peter Stone
Department of Computer Sciences, The University of Texas at Austin
1 University Station C0500, Austin, Texas 78712-1188
{kuhlmann,kdresner,pstone}@cs.utexas.edu

Abstract

Computer game players are typically designed to play a single game: today's best chess-playing programs cannot play checkers, or even tic-tac-toe. General Game Playing is the problem of designing an agent capable of playing many different previously unseen games. The first AAAI General Game Playing Competition was held at AAAI 2005 in order to promote research in this area. In this article, we survey some of the issues involved in creating a general game playing system and introduce our entry to that event. The main feature of our approach is a novel method for automatically constructing effective search heuristics based on the formal game description. Our agent is fully implemented and tested in a range of different games.

Introduction

Creating programs that can play games such as chess, checkers, and backgammon at a high level has long been a challenge and benchmark for AI. While several game-playing systems developed in the past, such as Deep Blue (Campbell, Jr., & Hsu 2002), Chinook (Schaeffer et al. 1992), and TD-gammon (Tesauro 1994), have demonstrated competitive play against human players, such systems are limited in that they play only one particular game and they typically must be supplied with game-specific knowledge. While their performance is impressive, it is difficult to determine if their success is due to the particular game-playing technique or due to the human game analysis.

A general game playing agent must be able to take as input a description of a game's rules and proceed to play without any human input. Doing so requires the integration of several AI components, including theorem proving, feature discovery, heuristic search, and potentially learning.
This paper presents a complete and fully autonomous general game playing agent designed to participate in the first AAAI General Game Playing (GGP) Competition, which was held at AAAI 2005 in Pittsburgh (Genesereth & Love 2005). The main contribution is a novel method for automatically constructing effective search heuristics based on the formal game description. Our agent is fully implemented and tested on several different games.

Supported in part by NSF CAREER award IIS-0237699.
Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

General Game Playing

The General Game Playing problem is the challenge of creating a system capable of playing games with which it has had no prior experience. The definition of a game can be quite broad, ranging from single-state matrix games such as the Prisoner's Dilemma to complex, dynamic tasks like robot soccer. A general game playing scenario is specified by three main components: (1) the class of games to be considered, (2) the domain knowledge prior to the start of the game, and (3) the performance measure.

First, in this paper, we restrict our attention to the class of games that were considered in the 2005 AAAI GGP competition, namely discrete state, deterministic, perfect information games. The games may be single or multi-player and they may be turn-taking or simultaneous decision. By deterministic, we mean that given a state of the game, and a joint set of actions taken by all players, the next state is uniquely determined. Go and Othello are examples of games from this class. However, Backgammon is not, because dice rolls are nondeterministic. In a perfect information game, the complete state of the game is known by all participants. Chess and Checkers qualify as perfect information games, because the state of the game is completely captured by the positions of the pieces on the board, which is in plain sight. In contrast, games such as Poker and Battleship do not qualify as perfect information games because players hide part of the game state from their opponents.
Second, in addition to the set of games to be considered, a general game playing scenario is parameterized by the amount and type of domain knowledge given to the players prior to the start of the game. For example, in the Learning Machine Challenge sponsored by Ai in January of 2002, the players were told nothing more than the set of legal moves. In the scenario adopted in this work, players are given a formal description of the rules of the game. For our purposes, the game description, at minimum, must contain sufficient information to allow the agent to simulate the game on its own. Our approach can take advantage of certain kinds of structure in the game description, as we will demonstrate.

Third, we must specify how agent performance is to be measured. In our setup, an agent is evaluated based on the number of points earned in a single shot game against competing agents. The identities of the opponents are hidden.

The key question that our work seeks to address is: In the general game playing scenario described above, how can we leverage the formal game description to improve agent performance? Before we answer this question, we will describe one concrete instance of this scenario, namely the AAAI GGP competition. The AAAI GGP scenario was the main motivating scenario for our agent development and is the scenario in which all of our empirical results are reported.

The AAAI GGP Competition

The first annual AAAI General Game Playing Competition was held at the 2005 AAAI conference in Pittsburgh (Genesereth & Love 2005). Nine teams participated in that event. In each of three main rounds, game playing agents were divided into groups to compete in both two- and three-player games. The games included a three player version of Chinese Checkers, a variant of Othello called Nothello, and in the final round, a simultaneous decision racing game called Racetrack Corridor. The complete details of the competition are available online (see footnote 1).

In the competition setup, each player runs as an independent server process. At the start of a game, a process called the Game Manager connects to each player and sends the game description along with time limits called the start clock and play clock. Players have the duration of the start clock to analyze the game description before the game begins. Each turn, players get the duration of the play clock to choose and announce their moves. After each turn, the Game Manager informs the players of the moves made by each player. The game continues until a terminal state is reached. No human intervention is permitted at any point: the general game player must be a complete and fully autonomous agent.

Game Description Language

The Game Description Language (GDL), used in the competition, is a first-order logical description language based on KIF (Genesereth 1991). In GDL, games are modeled as state machines in which a state is the set of true facts at a given time. Using a theorem prover, an agent can derive its legal moves, the next state given the moves of all players, and whether or not it has won. Each of these operations requires theorem proving; simulating games can be costly. One of the research challenges of GGP is to find efficient methods for reasoning about games described in first-order languages.

Part of the description for a game called Minichess is shown in Figure 1. A GGP agent must be able to play any game, given such a description. We illustrate the features of GDL through this example. First, GDL declares the game's roles (line 1).
The Game Manager assigns each game playing agent a role to play. Minichess has two roles, white and black, making it a two player game. Next, the initial state of the game is defined (2-7). Each functional term inside an init relation is true in the initial state. Besides init, none of the tokens in these lines are GDL keywords. The predicates cell, control and step are all game-specific. Even the numbers do not have any external meaning. If any of these tokens were to be replaced by a different token throughout the description, the meaning would not change. This lack of commitment to any lexical semantics will be important when we discuss our approach to feature discovery.

GDL also defines the set of legal moves available to each role through legal rules (8-15). The <= symbol is the reverse implication operator. Tokens beginning with a question mark are variables. The true relation is affirmative if its argument can be satisfied in the current state. Each player must have at least one legal move in each nonterminal state for the game to be valid.

     1. (role white) (role black)
     2. (init (cell a 1 b)) (init (cell a 2 b))
     3. (init (cell a 3 b)) (init (cell a 4 bk))
     4. (init (cell d 1 wr)) (init (cell d 2 b))
     5. (init (cell d 3 b)) (init (cell d 4 b))
     6. (init (control white))
     7. (init (step 1))
     8. (<= (legal white (move wk ?u ?v ?x ?y))
     9.     (true (control white))
    10.     (true (cell ?u ?v wk))
    11.     (kingmove ?u ?v ?x ?y)
    12.     (true (cell ?x ?y b))
    13.     (not (restricted ?x ?y)))
    14. (<= (legal white noop)
    15.     (true (control black)))
    16. (<= (next (cell ?x ?y ?p))
    17.     (does ?player (move ?p ?u ?v ?x ?y)))
    18. (<= (next (step ?y))
    19.     (true (step ?x))
    20.     (succ ?x ?y))
    21. (succ 1 2) (succ 2 3) ... (succ 9 10)
    22. (nextcol a b) (nextcol b c) (nextcol c d)
    23. (<= (goal white 100)
    24.     checkmate)
    25. (<= (goal black 100)
    26.     (not checkmate))
    27. (<= terminal
    28.     (true (step 10)))
    29. (<= terminal
    30.     stuck)

Figure 1: Partial game description for Minichess, GDL keywords shown in bold.

Footnote 1: http://games.stanford.edu/results.html
The second rule (14-15) demonstrates how turn-taking is simulated in GDL by requiring a player to execute a null action in certain states.

The state transition function is defined using the next keyword (16-20). Transition rules are used to find the next state, given the current state and the actions of all players. The does predicate is true if the given player selected the given action in the current state. Finally, GDL defines rules to determine when the game state is terminal (27-30). When the game ends, each player receives the reward defined by the game's goal rules (23-26).

A game description may define additional relations to simplify the conditions of other rules and support numerical relationships. For instance, the succ relation (21) defines how the game's step counter is incremented, and the nextcol relation (22) orders the columns of the chess board. As we will discuss, identifying these kinds of numerical relationships is extremely valuable, as they serve as a bridge between logical and numerical representations.
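To make the state-machine reading of GDL concrete, the following is a minimal sketch, not the authors' implementation. It hand-translates only the step counter rule (lines 18-20), the succ facts (line 21) and the first terminal rule (lines 27-28) of Figure 1 into Python. The names SUCC, next_step_facts and is_terminal are hypothetical, and the succ relation is assumed to run unbroken from 1 to 10. A real player would derive these results by theorem proving over the game description rather than from hard-coded functions.

```python
# Sketch only: a hand-translated fragment of the Minichess rules, not a GDL interpreter.
# A state is the set of facts that are currently true, written here as tuples.

# Assumed to cover (succ 1 2) through (succ 9 10) from line 21 of Figure 1.
SUCC = {i: i + 1 for i in range(1, 10)}


def next_step_facts(state):
    """Lines 18-20: (step ?y) holds next if (step ?x) holds now and (succ ?x ?y)."""
    return frozenset(
        ("step", SUCC[fact[1]])
        for fact in state
        if fact[0] == "step" and fact[1] in SUCC
    )


def is_terminal(state):
    """Lines 27-28: the game is terminal once (step 10) is true."""
    return ("step", 10) in state


# Only the step counter is simulated; board and control facts are omitted.
state = frozenset({("step", 1)})
transitions = 0
while not is_terminal(state):
    state = next_step_facts(state)
    transitions += 1
```

Starting from (step 1), the loop applies the transition rule until the terminal condition holds, which is the same simulate-until-terminal cycle the agent performs with its theorem prover.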

Approach

In our game playing scenario, in which an agent may look ahead by simulating moves, an obvious choice of approach is search. Most existing game playing systems for the types of game that we consider are based upon the Minimax search algorithm. Well-known examples include Chinook (Schaeffer et al. 1992) for Checkers and Deep Blue (Campbell, Jr., & Hsu 2002) for Chess. Even learning-based systems such as TD-gammon (Tesauro 1994) incorporate search for action selection. However, unless the state space is small enough to be searched exhaustively, the agent must use a heuristic evaluation function to evaluate non-terminal nodes and thus bound search depth. Such evaluation functions are highly game-specific, and much of the human effort in developing game playing systems is spent on manually tuning them. In a general game playing setting, the evaluation function cannot be provided by a human.

Because each lookahead requires costly theorem proving, exhaustive search is not a viable option. To make reasoning as efficient as we could, our agent uses a Prolog-style interpreter for theorem proving. In early testing, we found that this was significantly faster than other kinds of resolution. Even so, all but the smallest game trees are beyond the reach of exhaustive search.

To overcome these issues, we developed a method for generating heuristic evaluation functions automatically. We construct heuristics from features identified in the game description. Candidate heuristics are evaluated in parallel during action selection to find the best move. We discuss the details of our search algorithm in the next section before moving on to the heuristic construction algorithm.

Search Algorithm

The search algorithm employed by our player is based on alpha-beta pruning (Knuth & Moore 1975). Without a heuristic, the terminal nodes of the game tree are visited in a depth-first search order. For heuristic search, we use iterative deepening (Korf 1985) to bound the search depth.
We include two general-purpose enhancements: transposition tables, which cache the value of states encountered previously in the search; and the history heuristic, which reorders children based on their values found in lower-depth searches. Other general techniques exist, but the combination of these two enhancements accounts for most of the search space cutoff when combined with additional techniques (Schaeffer 1989).

We extended this basic minimax search algorithm to allow simultaneous decision games and games with more than two players. Although our implementation was designed to work for the broadest possible set of games, we made one important simplifying assumption. We assume that each player in the game can be placed into one of two sets: teammates and opponents. Thus, every player is either with us or against us. Our simple but strict definition of a teammate is a player that always receives the same reward that we do. We determine which team a player is on through internal simulation (see below). By treating somewhat cooperative players as opponents, we avoid the need to maintain a separate utility function for each team, which can be expensive since standard alpha-beta pruning can no longer be used.

Our player uses the same search procedure for turn-taking and simultaneous decision games. The procedure also applies to games with a mix of the two. At a turn-taking node, if the move is being made by a teammate, it is treated as a maximization node. If it is an opponent's turn to move, it is a minimization node. This type of search is an instance of the paranoid algorithm (Sturtevant & Korf 2000). If it is a simultaneous move node, because of the unpredictability of our opponents, we assume that our opponents choose their moves uniformly at random. We choose our action from the joint action of our team that maximizes our expected reward, based on that assumption. If we have prior knowledge of the opponent, either from past games or earlier moves in the current game, opponent modeling methods could be applied at this point, as could game theoretic reasoning.
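The node types described above can be sketched as a single recursive evaluation. This is an illustrative, depth-limited version under stated assumptions, not the authors' code: it omits alpha-beta pruning, iterative deepening, transposition tables and the history heuristic. The game interface (is_terminal, reward, team_moves, opponent_moves, next_state) and the TinyGame example are hypothetical. The sketch also assumes that a turn-taking node can be recognized by one side having a single joint move (such as noop).

```python
def evaluate(game, state, depth, heuristic):
    """Return (estimated reward for our team, our team's chosen joint move)."""
    if game.is_terminal(state):
        return game.reward(state), None
    if depth == 0:
        return heuristic(state), None

    ours = game.team_moves(state)        # joint moves available to our team
    theirs = game.opponent_moves(state)  # joint moves available to the opponents

    def value(our_move, their_move):
        child = game.next_state(state, our_move, their_move)
        return evaluate(game, child, depth - 1, heuristic)[0]

    if len(ours) == 1 and len(theirs) > 1:
        # Opponents' turn (we can only pass): minimization node.
        return min(value(ours[0], t) for t in theirs), ours[0]

    # Our turn or a simultaneous node: opponents are assumed to move uniformly
    # at random, so each of our moves is scored by its average outcome. With a
    # single opponent move the average reduces to a plain maximization.
    scored = [(sum(value(m, t) for t in theirs) / len(theirs), m) for m in ours]
    return max(scored, key=lambda pair: pair[0])


class TinyGame:
    """Hypothetical one-shot simultaneous game used only to exercise the sketch."""

    PAYOFF = {("a", "x"): 100, ("a", "y"): 40, ("b", "x"): 60, ("b", "y"): 60}

    def is_terminal(self, state):
        return state != "start"

    def reward(self, state):
        return self.PAYOFF[state]

    def team_moves(self, state):
        return ["a", "b"]

    def opponent_moves(self, state):
        return ["x", "y"]

    def next_state(self, state, ours, theirs):
        return (ours, theirs)


value, move = evaluate(TinyGame(), "start", 1, lambda s: 50.0)
```

In this toy game, move "a" averages 70 under the uniform-opponent assumption while "b" guarantees 60, so the sketch selects "a". A purely paranoid treatment of the same node would prefer "b", which shows how the two assumptions can disagree.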
Identifying Syntactic Structures

The first step toward constructing a heuristic evaluation function is to identify useful structures in the game description. Useful structures include time counters, game boards, movable pieces, commodities, etc. Our agent identifies these structures by syntactically matching game description clauses to a set of template patterns. Although the syntax of these templates is specific to GDL, they are conceptually general, and the template matching procedure could be applied to other languages. Still, we include the idiosyncrasies of GDL in our discussion both to make the procedure concrete and to aid future competition participants. Specifically, we describe our agent's identification of six structural game elements: successor relations, counters, boards, markers, pieces, and quantities.

As mentioned above, other than keywords, GDL tokens have no pre-defined meaning to the player. In fact, during the competition, the tokens were scrambled to prevent the use of lexical clues. Therefore, all structures are detected by their syntactic structure alone.

For example, one of the most basic structures our agent looks for is a successor relation. This type of relation induces a total ordering over a set of objects. We have already seen two examples in lines 21-22 of Figure 1. In a competition setting, the same successor relations may look something like the following:

    (tcsh pico cpio) (tcsh cpio grep) ... (tcsh echo ping)
    (quiet q w) (quiet w e) (quiet e r) (quiet r t)

Our system can still identify these relations as successors because order is a structural, rather than lexical, property. When a successor relation is detected, the objects in its domain can be compared to one another and, in some cases, incremented, decremented, added and subtracted. Identifying such structures is very important. If, for example, the agent could not identify them in Minichess, it would have no way to know that step 8 comes after step 7 or that column a is next to column b, both of which are useful.
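One way to test whether a binary relation is a successor relation, using structure alone, is to check that its facts form a single unbranched chain. The sketch below is an assumption about how such a check could be written, not a description of the authors' template matcher. The function name find_successor_orders and the tuple encoding of facts are hypothetical.

```python
from collections import defaultdict


def find_successor_orders(facts):
    """Map each chain-shaped binary relation to the ordered list of its objects.

    Facts are tuples such as ("quiet", "q", "w"). A relation qualifies only if
    every object has at most one successor and one predecessor, and the facts
    link all objects into one chain with a single starting object.
    """
    pairs_by_relation = defaultdict(list)
    for fact in facts:
        if len(fact) == 3:  # relation name plus two arguments
            pairs_by_relation[fact[0]].append((fact[1], fact[2]))

    orders = {}
    for relation, pairs in pairs_by_relation.items():
        nxt = dict(pairs)
        prev = {b: a for a, b in pairs}
        if len(nxt) != len(pairs) or len(prev) != len(pairs):
            continue  # some object has two successors or two predecessors
        starts = [a for a in nxt if a not in prev]
        if len(starts) != 1:
            continue  # no unique first element (for example, a cycle)
        chain = [starts[0]]
        while chain[-1] in nxt:
            chain.append(nxt[chain[-1]])
        if len(chain) == len(pairs) + 1:  # every fact lies on the one chain
            orders[relation] = chain
    return orders


facts = [
    ("quiet", "q", "w"), ("quiet", "w", "e"), ("quiet", "e", "r"), ("quiet", "r", "t"),
    ("likes", "x", "y"), ("likes", "x", "z"),   # branches, so not a successor
    ("loop", "p", "q"), ("loop", "q", "p"),     # cycle, so not a successor
    ("cell", "a", "1", "b"),                    # not binary, ignored
]
orders = find_successor_orders(facts)
index_of = {obj: i for i, obj in enumerate(orders["quiet"])}
```

Once the chain is recovered, each object's position in it gives a natural number (index_of above), which is what allows scrambled tokens to be compared, incremented or subtracted.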
A game description may include several successor relations, which the agent uses to identify additional structures such as a counter. A counter is a functional term in the game's state that increments each time step. Our agent identifies it by looking for a rule that matches the template:

    (<= (next (<counter> ?<var1>))
        (true (<counter> ?<var2>))
        (<successor> ?<var2> ?<var1>))

where <counter> is the identified function, and <successor> is some successor relation. The order of the antecedents is not important. The step function in Minichess is an example of a time step counter, as can be seen in lines 18-20 of Figure 1.

Our game player identifies counters for several reasons. First, if the counter terminates the game in a fixed number of steps, it may be removed during internal simulation to increase the chances of encountering a goal state. More importantly, in some games, leaving the counter in the state representation causes state aliasing, making search inefficient. Therefore, the function is removed before caching a state's value in the transposition table.

Another example of a structure that our player attempts to identify is a board. A board is a two dimensional grid of cells that change state, such as a chess or checkers board. When the agent receives the game description, it assumes every ternary function in the state is a board. However, one of the board's arguments must correspond to the cell's state, which can never have two different values simultaneously: two of its arguments are inputs and the third is an output. We verify this property using internal simulation. According to our definition of a board, a candidate is rejected if it ever has two objects in the same place. Although such a board may be valid, we choose to eliminate it to prevent false positives. Minichess has one board: the cell function. If a board's input arguments are ordered by some successor relation, then we say that the board is ordered.

Once a board is identified, our player looks for additional related structures such as markers, which are objects that occupy the cells of the board. If a marker is in only one cell at a time, we distinguish it as a piece. For example, in Minichess, the white rook, wr, and the black king, bk, are pieces. However, games like Othello where, for instance, a black object may be in multiple places, have only markers.

Our player also identifies additional structures that do not involve boards or pieces.
These structures are functions in the state that may quantify an amount of something, such as money in Monopoly. These are identified as those relations having ordered output arguments. We discuss the distinction between input and output arguments in the next section.

Internal Simulation

We have mentioned several situations in which we needed to prove an invariant about states of the game. For instance, we need to prove an invariant about terminal states to divide the game's roles into teams. We also need to prove invariants about the input and output modes of relations to identify boards, markers and pieces. Rather than proving these invariants formally, which would be time-consuming, our agent uses an approximate method to become reasonably certain that they hold. The agent uses its theorem prover to simulate a game of random players, during which the agent checks whether or not its currently hypothesized invariants hold.

To demonstrate, we trace the process of identifying the board and pieces in the Minichess example. The agent assumes that cell is a board because it is the only ternary function in the state. From just checking the initial state, the agent is able to determine that the only candidate mode is (cell + + -), where + signifies an input argument and - signifies an output argument. This turns out to be the right answer and no further refinement is necessary. Our agent also initially assumes that every object appearing in the output argument of the function is a piece. Therefore, in the example, wr, bk and b are all initially pieces. From checking the initial state, the agent eliminates b as a piece because it is in more than one place at a time. Further simulation of the game never rejects wr or bk as pieces.

As for the teams, our player assumes that black and white are on the same team. Once the first terminal state is reached during simulation, though, this hypothesis is rejected. Minichess is a zero sum game without any conditions for a tie, and therefore would never assign equal reward to the two roles. After the first simulated game, black and white are separated into two teams.
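The hypothesis checking traced above can be sketched as a pass over states observed during simulated play. The code below is a simplified illustration under stated assumptions, not the authors' procedure: it fixes the candidate mode to (cell + + -) instead of searching over modes, and takes the observed states as given instead of generating them with a theorem prover. The function name check_board_and_pieces and the two example states are hypothetical. The example states contain only the cells listed in Figure 1 and one invented rook move.

```python
from collections import Counter


def check_board_and_pieces(states, board="cell"):
    """Test the mode (board + + -) and collect markers that qualify as pieces.

    Returns (is_board, pieces). The board hypothesis fails if any state holds
    two values for the same pair of input arguments. A marker stays a piece
    only while it never occupies more than one cell in any observed state.
    """
    is_board = True
    pieces = None
    for state in states:
        cells = [f for f in state if f[0] == board and len(f) == 4]
        occupancy = Counter((x, y) for _, x, y, _ in cells)
        if any(count > 1 for count in occupancy.values()):
            is_board = False
        marker_counts = Counter(marker for _, _, _, marker in cells)
        if pieces is None:
            pieces = set(marker_counts)  # initially every marker is a candidate piece
        pieces = {p for p in pieces if marker_counts[p] <= 1}
    return is_board, pieces or set()


initial = {
    ("cell", "a", 1, "b"), ("cell", "a", 2, "b"), ("cell", "a", 3, "b"), ("cell", "a", 4, "bk"),
    ("cell", "d", 1, "wr"), ("cell", "d", 2, "b"), ("cell", "d", 3, "b"), ("cell", "d", 4, "b"),
    ("control", "white"), ("step", 1),
}
# Hypothetical follow-up state in which the rook has moved from d1 to d2.
after_move = (initial - {("cell", "d", 1, "wr"), ("cell", "d", 2, "b")}) | {
    ("cell", "d", 1, "b"), ("cell", "d", 2, "wr"),
}
is_board, pieces = check_board_and_pieces([initial, after_move])
```

As in the trace, the blank marker b is eliminated by the initial state alone because it appears in several cells, while wr and bk survive as pieces.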
Minichess is an easy case in that the correct hypotheses are reached very quickly. Other games may require more simulation. In the games that we have encountered thus far, however, teams were identified after at most 2 games and boards and pieces stabilized after roughly 3 time steps. In the competition, our agent ran the internal simulation for the first 10% of the start clock, with a minimum of 50 states visited. For a typical game, this amount of simulation was more than sufficient to establish a high degree of confidence in the agent's hypotheses.

From Syntactic Structures to Features

The structures identified in the previously described processes suggest several interesting features. We use feature to mean a numerical value, calculated from the game state, that has potential to be correlated with the outcome of the game. If the dimensions of the board are ordered, then our agent computes the x and y coordinates of each piece as the natural numbers associated with the input arguments' indices in their corresponding successor relations. For example, the coordinates of wr in the initial state of the Minichess example are (3, 0). From these coordinates, the agent can also calculate the Manhattan distances between pieces. If a board is not ordered, it may still produce useful features, including the number of occurrences of each marker. In addition, the agent generates features corresponding to the values of each quantifiable amount. A complete list of features generated by the player is shown in Table 1.

    Identified Structure       Generated Features
    -----------------------    ----------------------------------------------
    Ordered Board w/ Pieces    Each piece's X coordinate
                               Each piece's Y coordinate
                               Manhattan distance between each pair of pieces
                               Sum of pair-wise Manhattan distances
    Board w/o Pieces           Number of markers of each type
    Quantity                   Amount

Table 1: Generated features for identified structures.

From Features to Heuristics

From the set of generated features, our game player creates heuristics to guide search. In traditional single-game-playing systems, multiple features are manually weighted and combined to create a single evaluation function. Because the games to be played are not known to the designer, that option is not available to agents in the general game playing scenario. While it may be possible to learn the evaluation function, as is done by TD-gammon, doing this efficiently in the general case remains an open problem.

Instead of constructing a single heuristic function, our agent constructs a set of candidate heuristics, each being the maximization or minimization of a single feature. By including both, the agent can handle games with counterintuitive goal conditions. As it turned out, Nothello gave us a great example of such a game during the competition. In Nothello, the player with the fewest markers at the end of the game wins. Because the agent generates a heuristic to minimize the number of its own markers, it had a candidate heuristic that matched well with the game's goal.

We implement the candidate heuristics as board evaluation functions that linearly map the feature's value to an expected reward between $R^- + 1$ and $R^+ - 1$, where $R^-$ and $R^+$ are the minimum and maximum goal values achievable by the player, as described by the goal predicate. The values of the maximizing and minimizing heuristic functions are calculated respectively as follows:

\[ H(s) = 1 + R^- + (R^+ - R^- - 2) \cdot V(s) \]

\[ H(s) = 1 + R^- + (R^+ - R^- - 2) \cdot [1 - V(s)] \]

where $H(s)$ is the value of heuristic $H$ in state $s$ and $V(s)$ is the value of the feature, scaled from 0 to 1. We scale the heuristic function in this way so that a definite loss is always worse than any approximated value, and likewise, a definite win is always better than an unknown outcome.

Distributed Search

Not all of the heuristics that the player generates from the game description are particularly good for that game. Nor is the best choice of heuristic necessarily the same throughout the course of the game. Rather than attempting to extract hints about the goal from the goal conditions in the game description, which could be arbitrarily complex, we evaluate the candidate heuristics online using distributed search.
In addition to the main game player (GP) process, our player launches several remote slave (Slave) processes. In each play clock, the main process informs each slave of the current state, s, and assigns each one a heuristic, h, to use for search. Occasionally, the slave processes respond with their best action so far, $a_i$. Before the play clock expires, the game player evaluates the suggested actions, chooses the best one, and sends it to the game manager (GM). This procedure, with one slave, is illustrated in Figure 2.

The same heuristic may be assigned to multiple slaves if the number of heuristics is smaller than the number of slaves. This was typically the case during the competition, with roughly two dozen heuristics and almost 200 slaves. Although this redundancy typically leads to significant duplicated effort, ties between action scores are broken randomly during search and thus the different processes end up exploring different parts of the search space. Also, several slaves perform exhaustive search using no heuristic. This ensures optimal play near the end of the game.

Figure 2: Messages between the Game Manager, Game Player, and Slave process during a single play clock.

Choosing from among the suggested actions is a significant challenge, but one component of the strategy is clear: if exhaustive search returns an action, the agent should always choose it, since the action is guaranteed to be optimal. Beyond that, our agent prefers actions from deeper searches to those from shallower ones, as deeper searches allow the agent to better evaluate the long term effects of its actions. It is tricky, however, to choose between actions nominated by different heuristics. Doing so in a principled way, perhaps by taking into account the game's goal condition, remains an open challenge and is an important direction for future work.
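The heuristic scaling and the two stated selection preferences can be sketched as follows. This is an illustration under assumptions, not the competition code. scaled_heuristic implements the linear map to the range from the minimum goal value plus one to the maximum goal value minus one. choose_action encodes only the two preferences given in the text (exhaustive results first, then deeper searches). The tuple layout (action, depth, exhaustive) and the rule that remaining ties go to the first suggestion are hypothetical, since the text leaves the choice between heuristics open.

```python
def scaled_heuristic(v, r_min, r_max, maximize=True):
    """Map a feature value v in [0, 1] to a reward estimate in [r_min + 1, r_max - 1].

    r_min and r_max are the minimum and maximum goal values for the player.
    With maximize=False the feature is minimized, so v is replaced by 1 - v.
    """
    if not maximize:
        v = 1.0 - v
    return 1 + r_min + (r_max - r_min - 2) * v


def choose_action(suggestions):
    """Pick an action from (action, depth, exhaustive) tuples sent by search processes.

    An action from a completed exhaustive search always wins; otherwise the
    deepest search wins. Remaining ties fall to the first suggestion, which is
    an arbitrary placeholder for the open selection problem.
    """
    best = max(suggestions, key=lambda s: (s[2], s[1]))
    return best[0]


# With goal values from 0 to 100, estimates stay strictly between a loss and a win.
low = scaled_heuristic(0.0, 0, 100)
high = scaled_heuristic(1.0, 0, 100)
picked = choose_action([("a", 5, False), ("b", 3, True), ("c", 7, False)])
```

Because every heuristic estimate lies strictly inside the range of goal values, a proven loss (0) always ranks below any estimate and a proven win (100) always ranks above it.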
Experimental Results

After a successful start enabling our agent to reach the finals (3 teams out of the original 9), a networking glitch blocking connectivity between the competition site and our servers at home prevented our full participation in the finals. In this section, we present controlled experiments isolating the main contributions of this paper, namely the feature discovery and automatic heuristic construction methods.

Without the feature discovery and automatic heuristic construction techniques described above, a game player could only resort to performing exhaustive search for the entire game. While this strategy results in optimal play near the end of the game, the consequences of mistakes made early on would likely be irreparable. To measure the impact of our contributions on the player's performance, we conducted experiments in which a player with a generated heuristic competes with players performing exhaustive search.

To isolate the feature discovery and heuristic construction processes from the heuristic selection method, we do not use distributed search to evaluate heuristics. Instead, we choose a single heuristic for the player to use. Doing so emulates what we would expect from an agent with our heuristic construction method, but with a better heuristic evaluation method. In all cases, we choose the heuristic from preexisting options based on our knowledge of the game, but without revision or experimentation after the initial choice.

The exhaustive and heuristic players were pitted against each other in 3 different games. The first game, Nothello, which we introduced before, is a variant of Othello in which each of the two players tries to end the game with fewer markers than their opponent. A win earns a player 100 points and a draw is worth 50. The generated heuristic that we chose was the one that minimized the number of markers for our player. We hypothesized that minimizing a player's markers in the short term could be a win in the long run.

The second game, Hallway, is a two player game played on a chess board.
Each player controls a pawn that, starting on opposite sides, must reach the other side before the opponent to earn 100 points. During the game, a player may place up to four walls to hinder their opponent's progress. If neither player reaches the other side in 200 time steps, then the game is a draw and each player receives 50 points. The heuristic we chose for Hallway maximized the x-coordinate of our player's piece. This heuristic encourages the player to move closer to the opposite side of the board.

Lastly, Farmers is a three player commodities trading game. The three agents each start with equal amounts of money, and may use it to buy and sell cotton, cloth, wheat and flour. By saving up for a farm or factory, a player may produce additional commodities. Each time step, all players make their trading decisions simultaneously. When the game ends ten steps later, the player with the most money wins. If there is a tie, all players with the most money receive 100 points. The heuristic that was chosen for this game was the one that maximizes the player's own money. Both of the heuristic player's opponents used exhaustive search.

All three games were developed by the organizers of the AAAI GGP competition and were not created to match the constructed heuristics. For each game, we ran several matches with start and play clocks of 60 seconds. We recorded the number of games in which the agent using the generated heuristic scored 100 points. The results are shown in Table 2.

In Nothello, our agent won all 15 of its matches. The probability, $p$, that a random player would perform as well as our agent is roughly $10^{-5}$. We calculated this based on the prior probability of winning using random moves, which was not quite 50% due to a small chance of ties. We found this probability experimentally by running 500 matches with all players choosing random moves. The $p$ values for the remaining games were found similarly. In the Hallway game, our agent again won all 15 of its matches.

Game      Wins  Matches  p
Nothello  15    15       $10^{-5}$
Hallway   15    15       $10^{-11}$
Farmers   11    25       0.234

Table 2: Results for the agent using a generated heuristic versus one or more players using exhaustive search.
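The $p$ values in Table 2 are consistent with a one-sided binomial tail: the probability that a player winning each match independently at the chance-level rate would win at least as many matches as our agent did. The Python sketch below assumes this is the test that was used, which the text does not state explicitly; the function names are illustrative. It uses the 35.15% chance-level win rate reported for Farmers.

```python
from math import comb


def binomial_tail(wins, matches, prior):
    """P(X >= wins) for X ~ Binomial(matches, prior)."""
    return sum(
        comb(matches, k) * prior**k * (1 - prior) ** (matches - k)
        for k in range(wins, matches + 1)
    )


def min_wins_for_significance(matches, prior, alpha=0.05):
    """Smallest win count whose tail probability falls below alpha."""
    for wins in range(matches + 1):
        if binomial_tail(wins, matches, prior) < alpha:
            return wins
    return None  # no win count reaches significance


# Farmers: 11 wins in 25 matches against a chance-level rate of 35.15%.
farmers_p = binomial_tail(11, 25, 0.3515)
farmers_needed = min_wins_for_significance(25, 0.3515)

# Hallway: 15 wins in 15 matches, taking the chance-level rate as 20%.
hallway_p = binomial_tail(15, 15, 0.20)
```

Under these assumptions, `farmers_p` comes out close to the 0.234 in Table 2, `farmers_needed` is 14, and `hallway_p` is on the order of $10^{-11}$. The Nothello value is not recomputed here because the exact chance-level rate (slightly under 50%) is not given.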
For Hallway, the results are even more significant because the prior probability of winning by chance is only about 20%. Roughly 60% of random games end in a tie. Finally, in Farmers, our agent performed better than the prior expectation of 35.15%, but did not win enough games for the results to be statistically significant. The agent needed to win 14 of its 25 games for us to be at least 95% confident that its success was not by chance. The automatically generated heuristics were quite successful on the first two games, and on the third game, did not hurt performance.

Related Work

Prior research on the feature discovery problem includes Fawcett's (1993) thesis work. Although Fawcett does not construct a completely general game player, the feature discovery algorithm is applied in a game-playing setting. Features for the game of Othello are generated from a game description with syntax somewhat like GDL. Features are discovered by applying transformation operators on existing features, beginning with the goal condition itself, in a kind of search through feature space. It is unclear how dependent the method's success is on the STRIPS-style domain theory, but it may be possible to apply the same technique in GGP.

The most relevant work to general game playing is Barney Pell's Metagamer (Pell 1993). This work addresses the space of chess-like games, broad enough to include Checkers, Chinese Chess, Go and many more variants without common names. Again, because the domain representation was constructed as part of the work, it is not obvious that the techniques could directly apply in the GGP setting. This work also addresses the interesting problem of automatically generating novel games from a distribution of possibilities.

Finally, there is an interesting, commercially available general game playing system called Zillions of Games.² Games are described using a higher-level language than GDL. The system comes with several games and an agent opponent to play against.
As far as we are aware, though, this agent is not able to perform feature discovery or heuristic construction in a completely general way on its own.

Conclusion

We have introduced algorithms for discovering features in a formal game description and generating heuristic evaluation functions from them. These methods are integrated with theorem proving and heuristic search algorithms into a complete agent for general game playing. By building on these techniques, we can continue to make progress toward the ongoing general game playing challenge.

References

Campbell, M.; Hoane, A. J., Jr.; and Hsu, F. H. 2002. Deep Blue. Artificial Intelligence 134(1-2):57-83.

Fawcett, T. E. 1993. Feature discovery for problem solving systems. PhD thesis, University of Massachusetts, Amherst.

Genesereth, M., and Love, N. 2005. General game playing: Overview of the AAAI competition. AI Magazine 26(2).

Genesereth, M. 1991. Knowledge interchange format. In Principles of Knowledge Representation and Reasoning: Proceedings of the Second Intl. Conference (KR'91).

Knuth, D. E., and Moore, R. W. 1975. An analysis of alpha-beta pruning. Artificial Intelligence 6(4):293-326.

Korf, R. E. 1985. Iterative deepening: An optimal admissible tree search. Artificial Intelligence 27:97-109.

Pell, B. 1993. Strategy generation and evaluation for meta-game playing. PhD thesis, University of Cambridge.

Schaeffer, J.; Culberson, J. C.; Treloar, N.; Knight, B.; Lu, P.; and Szafron, D. 1992. A world championship caliber checkers program. Artificial Intelligence 53(2-3):273-289.

Schaeffer, J. 1989. The history heuristic and alpha-beta search enhancements in practice. IEEE Transactions on Pattern Analysis and Machine Intelligence 11:1203-1212.

Sturtevant, N. R., and Korf, R. E. 2000. On pruning techniques for multi-player games. In Procs. AAAI-00, 201-207.

Tesauro, G. 1994. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation 6:215-219.

² http://www.zillions-of-games.com/