Prof. Sameer Singh CS 175: PROJECTS IN AI (IN MINECRAFT) WINTER April 6, 2017



1 Prof. Sameer Singh CS 175: PROJECTS IN AI (IN MINECRAFT) WINTER 2017 April 6, 2017

2 Upcoming
Misc.
- Check out the course webpage and schedule
- Check out Canvas, especially for deadlines
- Do the survey by tomorrow, April 7, 2017
Homework
- Homework 1 will be up soon
- Meanwhile, install Malmo and get it working
- Due: April 14, 2017
Project
- Teams are due April 17, 2017; proposals are due April 21, 2017
- Start assembling teams now! (Use Piazza)
- Start thinking of project ideas
CS 175: PROJECTS IN AI (SPRING 2017)

3 Projects in AI in Minecraft
- Project Overview
- Some Project Ideas
- Introduction to Reinforcement Learning


5 What is AI?
"Artificial intelligence is anything computers can't do yet." - Douglas Hofstadter

6 What can a project be?
- Research: Do difficult things automatically; Minecraft is just a testbed
- Practical Tool: Help players do things that are otherwise time-consuming
- Art: Just cool! Use AI/ML to create stuff in the world

7 Technical Solution
Use Artificial Intelligence or Machine Learning algorithms, for example:
- Heuristic/Adversarial/Local Search
- Logic
- Planning
- Constraint Satisfaction
- Bayesian Networks
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
- Deep Learning
- Natural Language Processing
- Computer Vision
- Recommendation Systems
- Time Series Modeling

8 Evaluation
How would YOU define that your project was a success?
Quantitative Evaluation
- Numerical metrics: accuracy, F1, AUC, time to run, time to train
- Baselines: What would currently be used? What are reasonable simpler methods?
- By how much?
We hope to improve the METRIC by AMOUNT over BASELINE!
(I won't hold you to it, I just want you to think about it)
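To make the "METRIC by AMOUNT over BASELINE" statement concrete, here is a minimal Python sketch that computes accuracy and F1 for a trivial baseline and for a model, then reports the gap. The labels and predictions are made-up illustration data, and the "always predict positive" baseline is just one possible simple method, not something the course prescribes.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical data: 1 = "pig in the frame", 0 = "no pig".
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
baseline_pred = [1] * len(y_true)        # simple baseline: always say "pig"
model_pred = [1, 0, 1, 0, 0, 1, 0, 1]    # made-up model output

for name, metric in [("accuracy", accuracy), ("F1", f1_score)]:
    base = metric(y_true, baseline_pred)
    ours = metric(y_true, model_pred)
    print(f"{name}: baseline={base:.2f} model={ours:.2f} gain={ours - base:+.2f}")
```

Writing the comparison this way forces you to pick the metric and the baseline before you have results, which is the point of the slide.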

9 Evaluation
How would YOU define that your project was a success?
Qualitative Evaluation
- Simple example cases: What are examples that your idea will definitely work on? What is the expected output on these?
- Error analysis and introspection: Are there plots/figures to verify the behavior? If it doesn't work, how will you improve it?
- The super-impressive example: What is the best example, the one that would be awesome if it works? E.g. something that perfectly captures your idea!

10 You will have doubts!
- Is it too simple?
- Is it too ambitious?
- Is there data to train my classifier?
- Is there a different algorithm I should use?
- Is my evaluation inappropriate?
- Can I only use off-the-shelf code?
Every team has to meet me during Week 4.
Use Piazza!
Discussion will cover many simple situations.
Both the TA and I are available for appointments.


12 Projects in AI in Minecraft
- Course Information
- Some Project Ideas
- Introduction to Reinforcement Learning


13 Reinforcement Learning
Agent learns to do things by trying things, and succeeding/failing
Navigation
- Explore the map without dying
- Solve mazes
- Learn the best way home from anywhere
- Get to the highest hill in the map
Learn Recipes
- Figure out the best way to make items
- Without any knowledge of the recipes
Combat
- Learn to hide/find shelter
- Learn to fight, example paper


14 Reinforcement Learning
Agent learns to do things by trying things, and succeeding/failing
- Observation: What the agent sees
- Action: What the agent can do
- Reward: What the agent likes/dislikes
Example rewards
- New item: ++
- No item: -
- Goal: ++
- Died: ---
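The plus and minus marks above can be read as relative reward sizes. The Python sketch below turns them into numbers under that assumption. The event names and the exact values (+2, -1, +2, -3) are hypothetical; the slide gives only signs and rough magnitudes, and nothing here uses the Malmo API.

```python
# Hypothetical numeric rewards, one unit per "+" or "-" on the slide.
REWARDS = {
    "new_item": 2.0,   # New item: ++
    "no_item": -1.0,   # No item: -
    "goal": 2.0,       # Goal: ++
    "died": -3.0,      # Died: ---
}


def reward_for(events):
    """Sum the rewards of all events observed in one time step.

    Unknown events contribute nothing, so the agent is neither
    rewarded nor punished for things the designer did not list.
    """
    return sum(REWARDS.get(event, 0.0) for event in events)


print(reward_for(["new_item", "goal"]))  # picked up an item and reached the goal
print(reward_for(["no_item", "died"]))   # found nothing and died
```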

15 Reinforcement Learning
The next few lectures will go into details (and more ideas)
For now, let's look at non-RL ideas

16 Describe the Scene
Houses and a pig on a grassy field during the day.
Pig staring at me in a village.

17 Live Commentator
Hit a rabbit

18 How is this even possible?
[Figure: Malmo supplies the training signal (e.g. "3 blocks in a line", "Grass blocks as floor", "Daylight, clear weather"); a machine learning model (deep learning, CNN + LSTM) learns to produce descriptions such as "3 blocks in a line".]

19 Many Variations of These
[Figure: your code drives the agent/world in Malmo, which produces rendered frames and labels (x1000) that feed machine learning.]
Label -> task
- object -> object detection
- objects -> ~caption generation
- action -> ~action detection, commentary
- depth of pixel -> ~stereoscopy, depth/distance prediction

20 Captions to Speech
Why are you making me read?
Pig staring at me in a village.

21 Natural Language Navigation
Quite Difficult!
Commands, from trivial to hardest:
> Go forward till you hit a wall
> Go to the pig
> Go to the house on the right
> Go behind the house

22 Natural Language Interface
Quite Difficult!
Commands, from trivial to hardest:
> Choose steel pickaxe and dig
> Go and destroy that window
> Put the blue block on the closest wall
> Find a tree and chop it

23 SHRDLU (from 1970!)

24 Natural Speech to Commands
Why are you making me type?
- Off-the-shelf speech-to-text systems
- Online speech-to-text APIs

25 Photo to Minecraft Character
Photo of a person -> Your Project -> Minecraft Skin
- Need to label data?
- Can you use existing classifiers, like Visual QA?

26 Recipe Planners
Inventory + Need(s) -> Steps
> Get 2 wood planks
> Make a stick
> Get 2 diamonds
> Make diamond sword
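A recipe planner of this kind can be sketched as a recursive expansion of what is needed into raw materials and crafting steps. The Python below is a minimal illustration, assuming a hypothetical two-entry recipe table in which every recipe yields exactly one item (a simplification of the real game). The names `RECIPES`, `plan`, and `summarize` are invented for this example.

```python
from itertools import groupby

# Hypothetical recipe table: item -> {ingredient: count}. Assumes one output per craft.
RECIPES = {
    "stick": {"wood_plank": 2},
    "diamond_sword": {"stick": 1, "diamond": 2},
}


def _expand(item, inventory, steps):
    """Append the steps needed to obtain one `item`, using up inventory first."""
    if inventory.get(item, 0) > 0:
        inventory[item] -= 1          # already have it: consume, no step needed
        return
    if item not in RECIPES:
        steps.append(("get", item))   # raw material: must be gathered
        return
    for ingredient, count in RECIPES[item].items():
        for _ in range(count):
            _expand(ingredient, inventory, steps)
    steps.append(("make", item))


def plan(item, inventory):
    """Return an ordered list of (verb, item) steps; the inventory is not modified."""
    steps = []
    _expand(item, dict(inventory), steps)
    return steps


def summarize(steps):
    """Merge consecutive identical steps into 'verb count item' strings."""
    return [f"{verb} {len(list(group))} {item}" for (verb, item), group in groupby(steps)]


for line in summarize(plan("diamond_sword", {})):
    print(">", line)
```

With an empty inventory this reproduces the order of the steps listed above; starting with a stick in the inventory skips the plank and stick steps.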

27 Lots of other possibilities
Many other games in Minecraft
- Create AI for those?
- One AI that works for all of those?


29 Projects in AI in Minecraft
- Course Information
- Some Project Ideas
- Introduction to Reinforcement Learning (based on slides by David Silver)

30 Reinforcement Learning

31 What makes it different?
- No direct supervision, only rewards
- Feedback is delayed, not instantaneous
- Time really matters, i.e. data is sequential
- Agent's actions affect what data it will receive
Examples
- Fly stunt maneuvers in a helicopter
- Defeat the world champion at Backgammon
- Manage an investment portfolio
- Control a power station
- Make a humanoid robot walk
- Play many different Atari games better than humans
- Beat the world champion in Go

32 Agent-Environment Interface
Agent
- decides on an action
- receives the next observation
- receives the next reward
Environment
- executes the action
- computes the next observation
- computes the next reward
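The division of labor above maps directly onto a loop. The Python sketch below shows one possible shape of that loop, using a made-up one-dimensional corridor as the environment and an agent that acts at random. `CorridorEnv`, `RandomAgent`, `run_episode`, and the reward values are all hypothetical and are not part of Malmo.

```python
import random


class CorridorEnv:
    """Toy environment: the agent starts in cell 0 and the goal is the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # first observation

    def step(self, action):
        # Execute the action (-1 = left, +1 = right), clamped to the corridor.
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        # Compute the next reward: small penalty per step, bonus at the goal.
        reward = 10.0 if done else -1.0
        # The next observation is simply the new position.
        return self.position, reward, done


class RandomAgent:
    """Ignores what it receives and picks a direction at random."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, observation, reward):
        return self.rng.choice([-1, +1])


def run_episode(env, agent, max_steps=1000):
    """Alternate agent decisions and environment updates; return total reward."""
    observation, reward, total = env.reset(), 0.0, 0.0
    for _ in range(max_steps):
        action = agent.act(observation, reward)          # agent decides
        observation, reward, done = env.step(action)     # environment responds
        total += reward
        if done:
            break
    return total


print(run_episode(CorridorEnv(), RandomAgent(seed=0)))
```

The returned total is the cumulative reward for one episode; a learning agent would use the observations and rewards passed to `act` instead of ignoring them.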

33 Reward, $R_t$
- How well the agent is doing: + is positive (good), - is negative (bad)
- Says nothing about WHY the agent is doing well; could have little to do with $A_{t-1}$
- The agent is trying to maximize its cumulative reward

34 Examples of Rewards
- Fly stunt maneuvers in a helicopter: +ve reward for following desired trajectory, -ve reward for crashing
- Defeat the world champion at Backgammon: +/-ve reward for winning/losing a game
- Manage an investment portfolio: +ve reward for each $ in bank
- Control a power station: +ve reward for producing power, -ve reward for exceeding safety thresholds
- Make a humanoid robot walk: +ve reward for forward motion, -ve reward for falling over
- Play many different Atari games better than humans: +/-ve reward for increasing/decreasing score

35 Sequential Decision Making
- Actions have long-term consequences
- Rewards may be delayed
- It may be better to sacrifice short-term reward for long-term benefit
Examples
- A financial investment (may take months to mature)
- Refuelling a helicopter (might prevent a crash later)
- Blocking opponent moves (might eventually help win)
- Spend a lot of money and go to college (earn more later)
- Don't commit crimes (rewarded by not going to jail)
- Get started on Malmo/project soon (make it an easy quarter)
A key aspect of intelligence: how far ahead are you able to plan?

36 Reinforcement Learning
Given: an environment (produces observations and rewards)
Reinforcement Learning: an automated agent that selects actions to maximize total rewards in the environment

37 Let's look at the Agent
What does the choice of action depend on?
- Can you ignore $O_t$ completely?
- Is just $O_t$ enough? Or $(O_t, a_t)$?
- Is it the last few observations?
- Is it all observations so far?

38 Agent State, $S_t$
History: everything that has happened so far
$H_t = O_1 R_1 A_1 O_2 R_2 A_2 O_3 R_3 \ldots A_{t-1} O_t R_t$
The state $S_t$ can be:
- $O_t$
- $O_t R_t$
- $A_{t-1} O_t R_t$
- $O_{t-3} O_{t-2} O_{t-1} O_t$
In general, $S_t = f(H_t)$
You, as the AI designer, specify this function
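Each of the state choices above is just a different function of the history. The Python sketch below writes the four options as functions over a list of steps. The `Step` record and the function names are invented for illustration, and the last step has no action yet because $A_t$ has not been chosen.

```python
from typing import Any, NamedTuple, Optional


class Step(NamedTuple):
    """One time step of the history: O_i, R_i, and the action A_i taken after it."""
    observation: Any
    reward: float
    action: Optional[Any]  # None for the current step (action not chosen yet)


def state_last_observation(history):
    """S_t = O_t"""
    return history[-1].observation


def state_observation_reward(history):
    """S_t = (O_t, R_t)"""
    return history[-1].observation, history[-1].reward


def state_with_previous_action(history):
    """S_t = (A_{t-1}, O_t, R_t); the previous action is None at t = 1."""
    previous_action = history[-2].action if len(history) > 1 else None
    return previous_action, history[-1].observation, history[-1].reward


def state_last_k_observations(history, k=4):
    """S_t = (O_{t-k+1}, ..., O_t); shorter if fewer than k steps exist."""
    return tuple(step.observation for step in history[-k:])


# Made-up history with t = 3.
history = [Step("o1", 0.0, "a1"), Step("o2", 1.0, "a2"), Step("o3", -1.0, None)]
print(state_last_observation(history))
print(state_with_previous_action(history))
print(state_last_k_observations(history, k=2))
```

Richer states carry more information but make learning harder, which is why the choice of `f` is left to the designer.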

39 Agent Policy, $\pi$
Current state $S_t$ -> $\pi$ -> next action $A_t$
- Deterministic policy: $A_t = \pi(S_t)$
- Stochastic policy: $\pi(a \mid s) = P(A_t = a \mid S_t = s)$
- Good policy: leads to larger cumulative reward
- Bad policy: leads to smaller cumulative reward
(We will explore this more next week)
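The difference between the two kinds of policy is easy to see in code. In the Python sketch below, a deterministic policy is a lookup table from state to action, and a stochastic policy is a table from state to a probability distribution that is sampled. The states, actions, and probabilities are made up for illustration.

```python
import random

# Deterministic policy: A_t = pi(S_t). Hypothetical states and actions.
DETERMINISTIC_POLICY = {
    "near_lava": "move_back",
    "clear_path": "move_forward",
}

# Stochastic policy: pi(a | s) = P(A_t = a | S_t = s). Each row sums to 1.
STOCHASTIC_POLICY = {
    "near_lava": {"move_back": 0.9, "move_forward": 0.1},
    "clear_path": {"move_back": 0.2, "move_forward": 0.8},
}


def act_deterministic(state):
    """Always returns the same action for the same state."""
    return DETERMINISTIC_POLICY[state]


def act_stochastic(state, rng):
    """Samples an action according to pi(a | s)."""
    actions, probabilities = zip(*STOCHASTIC_POLICY[state].items())
    return rng.choices(actions, weights=probabilities, k=1)[0]


rng = random.Random(0)
print(act_deterministic("near_lava"))
print([act_stochastic("near_lava", rng) for _ in range(5)])
```

A stochastic policy occasionally tries the less likely action, which is one simple way for an agent to keep exploring.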

40 Example: Atari
- Rules are unknown: What makes the score increase?
- Dynamics are unknown: How do actions change pixels?

41 Video Time!
https://www.youtube.com/watch?

42 Example: Robotic Soccer
https://www.youtube.com/watch?

43 AlphaGo

44 Projects in AI in Minecraft
- Course Information
- Some Project Ideas
- Introduction to Reinforcement Learning
