Learning via Delayed Knowledge: A Case of Jamming
SaiDhiraj Amuru and R. Michael Buehrer


2 Why do we need an Intelligent Jammer?
- Dynamic environment conditions in electronic warfare scenarios: failure of predict-then-adapt approaches
- Lack of knowledge of the optimal jamming strategy
- A naïve always-jamming strategy is sub-optimal
  - Energy is wasted
  - Easy to detect
  - Easy to neutralize
- Cognitive capabilities are necessary to survive in harsh environments
- We explore the learning capabilities of a jammer with delayed environment knowledge
  - Case study: a network that uses the RTS-CTS protocol

3 State of the art
- Attacks at various protocol layers: PHY / MAC / Network layers
- Naive jamming strategies: continuous jamming, periodic jamming, partial-band jamming, single/multi-tone jamming
- Sensing-based jamming: deceptive jamming, reactive jamming
- Perfect and instantaneous knowledge of the adversary: jamming control packets, jamming synchronization signals

4 Common Techniques to address Jamming
- Optimization framework: assumes knowledge of certain parameters; maximize BER/SER/PER
- Game theory: one-shot zero-sum games, repeated games, minimax formulation, mutual information games
- Information theory: channel capacity under jamming, saddle point solutions, DoF, mutual information
- Problem: a lot of knowledge is required
- Solution: employ learning techniques

5 What is Learning
- Adaptation
  - See the data and change the strategy
  - Adapt in order to survive in the environment
  - No memory in the system
- Learning
  - More than an adaptive system
  - Ability to detect patterns in the data
  - Understand what's happening in the data and adapt
  - Remember the strategies used and relate them to the data
  - Evaluate the outcome of the decisions taken, and gather knowledge to be exploited in the future

6 Formal Definition of Learning
- A system is said to learn from experience/feedback E with respect to some class of actions A and performance measure C (similar to a cost function) if its performance at tasks in A, as measured by C, improves with experience E
- Learned hypothesis: model of problem/task T
- Model quality: performance measured by C

7 Different types of Learning
- Supervised Learning: teacher-student type learning
- Unsupervised Learning: the student is left on his own
- Semi-supervised Learning: mixture of the above two learning techniques
- Reinforcement Learning / Online Learning: learn by experimenting; experience is the only teacher
Image courtesy: blog.bigml.com

8 Intro to RL
- Reinforcement Learning: a radio/agent learns the optimal strategy (for example, a survival strategy) by repeatedly interacting with the environment
- The agent receives feedback indicating whether the actions performed were good or bad
- It learns to take actions which yield higher rewards
[Figure: agent-environment loop with labels Prior information, Goals/Metrics, Agent, Past Experience, Observations, Environment, Actions]

9 Framework for RL
[Figure: Sequential Decision Model; at time t the present state and an action produce a reward and the next state at time t+1]
- Decision rule: at each time, the system state is used to choose an action
- Policy: set of rules mapping states to actions
- The sequence of decision rules generates rewards
- Commonly modeled as a Markov Decision Process

10 Markov Decision Process (MDP)
- Something more than a Markov chain; think of a controlled Markov chain
- MDP = {States, Actions, Transition Probability, Rewards} = {S, A, P, R}
- e.g., from a jammer's perspective, the environment states could be Tx/No Tx, and the actions of the jammer could be Jam/Don't Jam
- P is the $|S| \times |A| \times |S|$ state transition probability matrix; it governs the dynamics of the environment, $p(s' \mid s, a)$
- R is the $|S| \times |A|$ reward matrix; $r(s, a)$ = reward obtained in state $s$ when action $a$ is executed
- $\pi$ = policy, a mapping between states and actions
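A minimal sketch of the {S, A, P, R} tuple for the Tx/No Tx jammer example, assuming NumPy arrays as the storage format. Every probability and reward below is a made-up placeholder, not a value from the talk, and `is_valid_mdp` is a hypothetical helper.

```python
import numpy as np

# Hypothetical two-state, two-action jammer MDP (all numbers are illustrative).
STATES = ["Tx", "NoTx"]
ACTIONS = ["Jam", "DontJam"]

# P[s, a, s_next] = p(s_next | s, a): an |S| x |A| x |S| array.
P = np.array([
    [[0.6, 0.4], [0.9, 0.1]],   # from Tx, under Jam and DontJam
    [[0.5, 0.5], [0.5, 0.5]],   # from NoTx, under Jam and DontJam
])

# R[s, a] = r(s, a): an |S| x |A| array.
R = np.array([
    [1.0, -1.0],    # Tx
    [-0.5, 0.0],    # NoTx
])

# A deterministic policy maps each state index to an action index.
policy = np.array([0, 1])  # Jam when Tx, DontJam when NoTx


def is_valid_mdp(P, R):
    """Check array shapes and that each p(. | s, a) is a probability distribution."""
    n_states, n_actions = R.shape
    shapes_ok = P.shape == (n_states, n_actions, n_states)
    rows_ok = np.allclose(P.sum(axis=2), 1.0) and bool((P >= 0).all())
    return bool(shapes_ok and rows_ok)
```

Indexing `P[s, policy[s]]` for each state gives the Markov chain induced by the policy, which is the sense in which an MDP is a controlled Markov chain.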

11 Goals of RL
- Maximize the cumulative discounted reward $\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)$
- $0 \le \gamma \le 1$ is the discount factor: how much do you value the future?
- For a finite time horizon, $\gamma = 1$ is used (un-discounted MDP)
- The goal of the decision-maker is to choose a behavior that maximizes the expected return, irrespective of how the process started (initial state)
- A decision process that achieves the optimal values in all states is optimal
- For a given policy $\pi$, the value function is $V^{\pi}(s) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, \pi(s_t)) \mid s_0 = s\right]$
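As a small worked example of the cumulative discounted reward, the sketch below sums a finite reward sequence with weights γ^t. The function name and the sample rewards are illustrative assumptions.

```python
def discounted_return(rewards, gamma):
    """Return the sum over t of gamma**t * rewards[t] for a finite sequence."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total


# With gamma = 0.5 later rewards count for less: 1 + 0.5 + 0.25.
example_discounted = discounted_return([1.0, 1.0, 1.0], gamma=0.5)

# With gamma = 1 (finite horizon, un-discounted) it is a plain sum.
example_undiscounted = discounted_return([1.0, 2.0, 3.0], gamma=1.0)
```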

12 Finding the optimal policy
- The value function satisfies $V^{\pi}(s) = r(s, \pi(s)) + \gamma \sum_{s'} p(s' \mid s, \pi(s))\, V^{\pi}(s')$
- Notice the similarity to Dynamic Programming; this equation is known as the Bellman equation
- Bellman operator $T^{\pi} V = R^{\pi} + \gamma P^{\pi} V$
  - Affine linear operator
  - For $\gamma < 1$, $T^{\pi}$ is a contraction mapping
  - By the Banach fixed-point theorem, a unique solution exists
1) A function $T: X \to X$ is a contraction mapping if $d(T(x), T(y)) \le q\, d(x, y)$ for some $0 \le q < 1$, where $d$ is a distance measure
2) Banach fixed-point theorem: $T$ admits a unique fixed point $x^*$ with $T(x^*) = x^*$. It can be found by starting with any $x_0$ and defining the sequence $x_n = T(x_{n-1})$; then $x_n$ converges to $x^*$
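The fixed-point iteration on this slide can be sketched for the policy-evaluation operator V -> R_pi + gamma * P_pi V. This is a minimal illustration, assuming a made-up two-state chain induced by some fixed policy; the function name, tolerance and matrices are not from the talk.

```python
import numpy as np


def evaluate_policy(P_pi, R_pi, gamma, tol=1e-10, max_iter=10_000):
    """Iterate V <- R_pi + gamma * P_pi @ V until successive iterates agree."""
    V = np.zeros(len(R_pi))
    for _ in range(max_iter):
        V_next = R_pi + gamma * P_pi @ V
        if np.max(np.abs(V_next - V)) < tol:
            return V_next
        V = V_next
    return V


# Hypothetical transition matrix and reward vector under a fixed policy.
P_pi = np.array([[0.9, 0.1],
                 [0.5, 0.5]])
R_pi = np.array([1.0, 0.0])
gamma = 0.9

V_iter = evaluate_policy(P_pi, R_pi, gamma)

# The same fixed point in closed form: V = (I - gamma * P_pi)^{-1} R_pi.
V_exact = np.linalg.solve(np.eye(2) - gamma * P_pi, R_pi)
```

Because the operator is a contraction with modulus γ in the max norm, the iterates converge to the same vector from any starting point, which is why the zero initialization is harmless.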

13 More information about MDP
- If P is known a priori (known as indirect learning/planning), one can evaluate various policies and find the best one
  - Note: works for small MDPs only
- MDPs in general work well for small sizes of S and A (more about this at the end of the talk: Multi-Armed Bandits)
- Online learning techniques are used when P is not known
  - Exploration versus exploitation dilemma
  - Common algorithm: ε-greedy
  - Q-Learning and SARSA are other online learning techniques
Images courtesy: Microsoft Research
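The exploration versus exploitation trade-off can be sketched with ε-greedy action selection paired with a tabular Q-learning update. This is a generic textbook sketch, not the authors' implementation; the function names, the step size `alpha` and the table layout are assumptions.

```python
import numpy as np


def epsilon_greedy(q_row, epsilon, rng):
    """Explore uniformly with probability epsilon, otherwise pick the argmax."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))


def q_learning_update(Q, s, a, reward, s_next, alpha, gamma):
    """Move Q[s, a] toward reward + gamma * max over a' of Q[s_next, a']."""
    target = reward + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


rng = np.random.default_rng(0)
Q = np.zeros((2, 2))                      # 2 states x 2 actions, all zeros
action = epsilon_greedy(Q[0], epsilon=0.1, rng=rng)
Q = q_learning_update(Q, s=0, a=1, reward=1.0, s_next=1, alpha=0.5, gamma=0.9)
```

Setting `epsilon=0` gives a purely greedy agent and `epsilon=1` a purely random one; values in between trade off the two.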

14 Can we have instantaneous knowledge?
- As tasks and environments grow more complex, an agent's observations of its environment are more often than not delayed (Littman 2009)
- e.g., direct control of the Mars rover from Earth is limited by the communication latency
- The delay may not be limited to a single time step
- When a jammer disrupts a DATA packet, it does not know whether the jamming was successful until an ACK packet is sent by the receiver
- A 'wait' agent is sub-optimal in such scenarios; it is better to use the time by taking actions

15 How do we handle delayed state observations?
- A new MDP framework was developed to handle delayed learning scenarios (Altman 1992)
- $\{S, A, P, R, k\}$, where $k$ is the observation delay
- $\{I_k, A, p, r\}$ is the equivalent augmented MDP
- $I_k$ is the augmented state space, of size $|S| \cdot |A|^{k}$
- Since the state $s_{t-k+1}$ is not known perfectly, [equation not preserved]
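The size |S| · |A|^k of the augmented state space can be checked by enumerating it directly. The sketch below assumes an augmented state is the pair (last observed state, the k actions taken since then), which is the usual construction for constant observation delay; the function name and the example labels are illustrative.

```python
from itertools import product


def augmented_states(states, actions, k):
    """Enumerate (last observed state, tuple of the k most recent actions)."""
    return [(s, acts) for s in states for acts in product(actions, repeat=k)]


# Two states, two actions, delay k = 2 gives 2 * 2**2 = 8 augmented states.
aug = augmented_states(["Tx", "NoTx"], ["Jam", "DontJam"], k=2)
```

The exponential growth in k is one reason the delayed formulation stays practical only for small delays and small action sets.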

16 Transition-based rewards
- But again, these frameworks assume state-based rewards
- What if there are transition-based rewards? We developed a new framework to handle this: the Delayed Learning Framework with Transition-based Rewards
- Bellman's optimality rules still hold true
- $P^{\pi}$ and $R^{\pi}$ are now $|I_k| \times |I_k|$ matrices and can handle transition-based rewards (a jamming example will be shown soon)

17 Jamming via Delayed Learning
- We consider a wireless network with one user
- A MAC-layer jamming attack is studied
- RTS = Request to Send, CTS = Clear to Send, ACK = Acknowledgement
[Figure: Basic Protocol]
[Figure: Model for Victim]

18 Jammer's Model
- Assumptions
  - The MAC protocol is known to the jammer
  - The jammer can identify the ACK/NACK packets
  - The jamming success probability ρ is unknown
- The packets form the MDP states
- The jammer can jam any of them, so find the optimal among 16 policies
- Feedback = energy expended and throughput allowed
  - To jam RTS, CTS and ACK = -E
  - To jam DATA = -10E
  - Throughput allowed = -T (WAIT followed by ACK indicates this)
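One way to read the count of 16 policies is as a jam/don't-jam choice for each of the four packet types (2^4 = 16); that reading is an assumption, as are the dictionary layout and the helper name below. The energy unit `E` is a placeholder magnitude.

```python
from itertools import product

PACKETS = ["RTS", "CTS", "DATA", "ACK"]
E = 1.0  # placeholder energy unit (assumption)

# Energy feedback per jammed packet type, following the slide: -E or -10E.
JAM_COST = {"RTS": -E, "CTS": -E, "ACK": -E, "DATA": -10 * E}

# Each policy maps a packet type to True (jam) or False (don't jam).
policies = [dict(zip(PACKETS, choice))
            for choice in product([False, True], repeat=len(PACKETS))]


def energy_reward(policy, packet):
    """Energy feedback for the given packet type under a jam/don't-jam policy."""
    return JAM_COST[packet] if policy[packet] else 0.0
```

With only 16 candidates, each policy can be evaluated exhaustively and the best one selected.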

19 So what is delayed? Just to make things clear
- The jammer cannot identify the packet before transmission happens
- The packet type is known perfectly after 1 time slot
- The energy cost is known instantaneously, based on the actions taken
- The throughput cost is known only when the ACK to WAIT transition happens
- Notice that the reward is based on transitions and not on the states themselves
- Objective: minimize costs and deny any communication exchange
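A transition-based reward depends on the pair (state, next state) rather than on the state alone. The sketch below is one hypothetical encoding of the costs described in this slide: the magnitudes 10 and 100 are taken from the benchmark values E = -10 and T = -100 quoted later in the talk and are interpreted here as cost magnitudes, which is an assumption, as is the function signature.

```python
E_UNIT = 10.0    # energy cost magnitude (assumed reading of E = -10)
T_UNIT = 100.0   # throughput cost magnitude (assumed reading of T = -100)


def transition_reward(state, jammed, next_state):
    """Reward for moving from state to next_state, given whether state was jammed."""
    reward = 0.0
    if jammed:
        # Jamming DATA costs ten times as much energy as jamming other packets.
        reward -= 10 * E_UNIT if state == "DATA" else E_UNIT
    if state == "ACK" and next_state == "WAIT":
        # The exchange completed, so the jammer is charged the throughput cost.
        reward -= T_UNIT
    return reward
```

The energy term depends only on the action, so it is available immediately; the throughput term cannot be evaluated until the next state is observed, which is where the delay enters.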

20 Optimal Performance: benchmark result
- Assume ρ is known
- E = -10, T = -100
- The optimal theoretical policy follows from the novel delayed learning framework
- Why is jamming a CTS packet better than jamming RTS or ACK packets?

21 Which policy to use?
[Figure: jamming policy as a function of energy and throughput costs, ρ = 0.5]

22 What effect does delay have?
- True ρ = 0.3
- Learn ρ by jamming all states and observing the environment
- 1 episode = 1000 time slots
- One policy is evaluated per episode
- ε-greedy is used for exploration vs exploitation
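Learning ρ from observed outcomes can be illustrated with a plain empirical frequency. This is a simplified sketch that ignores the observation delay and simulates outcomes as independent Bernoulli draws; the estimator, the seed and the simulation are assumptions rather than the procedure used in the talk.

```python
import numpy as np


def estimate_rho(successes, attempts):
    """Empirical jamming success frequency; None if nothing has been observed."""
    if attempts == 0:
        return None
    return successes / attempts


rng = np.random.default_rng(0)
true_rho = 0.3                           # value quoted on the slide
outcomes = rng.random(1000) < true_rho   # one simulated episode of 1000 slots
rho_hat = estimate_rho(int(outcomes.sum()), outcomes.size)
```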

23 What effect does delay have? (unknown model)
- True ρ = 0.5
- Unknown model = learn the ReTx limit and average CW sizes by jamming all states
- 1 episode = 1000 time slots
- One policy is evaluated per episode
- ε-greedy is used for exploration vs exploitation

24 So in this work
- We explored whether a jammer can learn its surroundings or not
- Instantaneous knowledge is not readily available in most practical systems
- Need to deal with delay (recall that states are known with a delay)
- A Delayed Reinforcement Learning framework was developed to address such delayed cognitive learning scenarios
- An example framework was considered and optimal jamming policies against this network were obtained
- The optimal policies match intuition
- To be done: varying ρ, error in the feedback

25 What did we learn from this problem?
- Small time delays can be modeled easily using the MDP framework
- MDPs work well for small sizes of S and A
- Finite-time guarantees can be given
- MDPs model single-user scenarios very well
- Our experience with multi-user MDPs: not so good
  - Especially when the MDPs are coupled (as in framework)
  - Alternative learning algorithms are being explored

26 Multi-armed bandits
- Another widely explored learning algorithm
- Can be related to MDP theory and the creation of bandit processes: Gittins indices
- An alternative definition is based on the regret formulation
- Learn to intelligently explore and exploit, and choose the best arm
- Widely used algorithm: Upper Confidence Bound (UCB1)
Image courtesy: Daniel Jakubisin
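A minimal sketch of UCB1 on Bernoulli arms, using the standard index (empirical mean plus sqrt(2 ln t / n)). The arm success probabilities, the seed and the function names are illustrative assumptions; this is the textbook algorithm, not the jamming-specific variant from the talk.

```python
import math
import random


def ucb1_select(counts, means, t):
    """Play each untried arm once, then maximize mean + sqrt(2 ln t / n)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [m + math.sqrt(2.0 * math.log(t) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def run_ucb1(success_probs, horizon, seed=0):
    """Simulate UCB1 on Bernoulli arms and return the pull count of each arm."""
    rng = random.Random(seed)
    k = len(success_probs)
    counts, means = [0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        arm = ucb1_select(counts, means, t)
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts


pull_counts = run_ucb1([0.2, 0.8], horizon=2000)
```

The confidence term shrinks as an arm is pulled more often, so under-explored arms are revisited occasionally while most pulls go to the arm with the best empirical mean.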

27 What did we do with MAB?
- Learn the optimal physical layer jamming strategies
- Actions = {signaling scheme, $P_J$, ON-OFF duration}
- Only needs ACK/NACK as feedback
- Can give theoretical guarantees for the jamming performance: cumulative and one-step regret

28 Convergence to optimal strategy

29 Tracking adaptive users


More information

Lecture Notes on Game Theory (QTM)

Lecture Notes on Game Theory (QTM) Theory of games: Introduction and basic terminology, pure strategy games (including identification of saddle point and value of the game), Principle of dominance, mixed strategy games (only arithmetic

More information

Q-Learning Algorithms for Constrained Markov Decision Processes with Randomized Monotone Policies: Application to MIMO Transmission Control

Q-Learning Algorithms for Constrained Markov Decision Processes with Randomized Monotone Policies: Application to MIMO Transmission Control Q-Learning Algorithms for Constrained Markov Decision Processes with Randomized Monotone Policies: Application to MIMO Transmission Control Dejan V. Djonin, Vikram Krishnamurthy, Fellow, IEEE Abstract

More information

ROBUST SATELLITE COMMUNICATIONS UNDER HOSTILE INTERFERENCE

ROBUST SATELLITE COMMUNICATIONS UNDER HOSTILE INTERFERENCE AFRL-RV-PS- TR-2014-0207 AFRL-RV-PS- TR-2014-0207 ROBUST SATELLITE COMMUNICATIONS UNDER HOSTILE INTERFERENCE Marc Lichtman and Jeffrey Reed Virginia Tech 1880 Pratt Drive, Ste. 2006 Blacksburg, VA 24060

More information

UAV-Aided 5G Communications with Deep Reinforcement Learning Against Jamming

UAV-Aided 5G Communications with Deep Reinforcement Learning Against Jamming 1 UAV-Aided 5G Communications with Deep Reinforcement Learning Against Jamming Xiaozhen Lu, Liang Xiao, Canhuang Dai Dept. of Communication Engineering, Xiamen Univ., Xiamen, China. Email: lxiao@xmu.edu.cn

More information

Fast Online Learning of Antijamming and Jamming Strategies

Fast Online Learning of Antijamming and Jamming Strategies Fast Online Learning of Antijamming and Jamming Strategies Youngjune Gwon MIT Lincoln Laboratory gyj@ll.mit.edu Siamak Dastangoo MIT Lincoln Laboratory sia@ll.mit.edu Carl Fossa MIT Lincoln Laboratory

More information

Cognitive Radio: Brain-Empowered Wireless Communcations

Cognitive Radio: Brain-Empowered Wireless Communcations Cognitive Radio: Brain-Empowered Wireless Communcations Simon Haykin, Life Fellow, IEEE Matt Yu, EE360 Presentation, February 15 th 2012 Overview Motivation Background Introduction Radio-scene analysis

More information

IN the last few years, Wireless Sensor Networks (WSNs)

IN the last few years, Wireless Sensor Networks (WSNs) Joint Retransmission, Compression and Channel Coding for Data Fidelity under Energy Constraints Chiara Pielli, Student Member, IEEE, Čedomir Stefanović, Senior Member, IEEE, Petar Popovski, Fellow, IEEE,

More information

Game-Playing & Adversarial Search

Game-Playing & Adversarial Search Game-Playing & Adversarial Search This lecture topic: Game-Playing & Adversarial Search (two lectures) Chapter 5.1-5.5 Next lecture topic: Constraint Satisfaction Problems (two lectures) Chapter 6.1-6.4,

More information

CMU-Q Lecture 20:

CMU-Q Lecture 20: CMU-Q 15-381 Lecture 20: Game Theory I Teacher: Gianni A. Di Caro ICE-CREAM WARS http://youtu.be/jilgxenbk_8 2 GAME THEORY Game theory is the formal study of conflict and cooperation in (rational) multi-agent

More information

CS188 Spring 2014 Section 3: Games

CS188 Spring 2014 Section 3: Games CS188 Spring 2014 Section 3: Games 1 Nearly Zero Sum Games The standard Minimax algorithm calculates worst-case values in a zero-sum two player game, i.e. a game in which for all terminal states s, the

More information

Performance Analysis of Multiuser MIMO Systems with Scheduling and Antenna Selection

Performance Analysis of Multiuser MIMO Systems with Scheduling and Antenna Selection Performance Analysis of Multiuser MIMO Systems with Scheduling and Antenna Selection Mohammad Torabi Wessam Ajib David Haccoun Dept. of Electrical Engineering Dept. of Computer Science Dept. of Electrical

More information

Distributed Learning and Stable Orthogonalization in Ad-Hoc Networks with Heterogeneous Channels

Distributed Learning and Stable Orthogonalization in Ad-Hoc Networks with Heterogeneous Channels 1 Distributed Learning and Stable Orthogonalization in Ad-Hoc Networks with Heterogeneous Channels Sumit J. Darak and Manjesh K. Hanawal arxiv:181.11651v1 [cs.ni] Dec 018 Abstract Next generation networks

More information

Deep Learning for Autonomous Driving

Deep Learning for Autonomous Driving Deep Learning for Autonomous Driving Shai Shalev-Shwartz Mobileye IMVC dimension, March, 2016 S. Shalev-Shwartz is also affiliated with The Hebrew University Shai Shalev-Shwartz (MobilEye) DL for Autonomous

More information

Alternation in the repeated Battle of the Sexes

Alternation in the repeated Battle of the Sexes Alternation in the repeated Battle of the Sexes Aaron Andalman & Charles Kemp 9.29, Spring 2004 MIT Abstract Traditional game-theoretic models consider only stage-game strategies. Alternation in the repeated

More information

Applying Modern Reinforcement Learning to Play Video Games. Computer Science & Engineering Leung Man Ho Supervisor: Prof. LYU Rung Tsong Michael

Applying Modern Reinforcement Learning to Play Video Games. Computer Science & Engineering Leung Man Ho Supervisor: Prof. LYU Rung Tsong Michael Applying Modern Reinforcement Learning to Play Video Games Computer Science & Engineering Leung Man Ho Supervisor: Prof. LYU Rung Tsong Michael Outline Term 1 Review Term 2 Objectives Experiments & Results

More information

Wireless Network Security Spring 2012

Wireless Network Security Spring 2012 Wireless Network Security 14-814 Spring 2012 Patrick Tague Class #8 Interference and Jamming Announcements Homework #1 is due today Questions? Not everyone has signed up for a Survey These are required,

More information

OPPORTUNISTIC spectrum access (OSA), first envisioned

OPPORTUNISTIC spectrum access (OSA), first envisioned IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 54, NO. 5, MAY 2008 2053 Joint Design and Separation Principle for Opportunistic Spectrum Access in the Presence of Sensing Errors Yunxia Chen, Student Member,

More information

More on games (Ch )

More on games (Ch ) More on games (Ch. 5.4-5.6) Alpha-beta pruning Previously on CSci 4511... We talked about how to modify the minimax algorithm to prune only bad searches (i.e. alpha-beta pruning) This rule of checking

More information

Downlink Scheduler Optimization in High-Speed Downlink Packet Access Networks

Downlink Scheduler Optimization in High-Speed Downlink Packet Access Networks Downlink Scheduler Optimization in High-Speed Downlink Packet Access Networks Hussein Al-Zubaidy SCE-Carleton University 1125 Colonel By Drive, Ottawa, ON, Canada Email: hussein@sce.carleton.ca 21 August

More information

Frequency-Hopped Spread-Spectrum

Frequency-Hopped Spread-Spectrum Chapter Frequency-Hopped Spread-Spectrum In this chapter we discuss frequency-hopped spread-spectrum. We first describe the antijam capability, then the multiple-access capability and finally the fading

More information

3432 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 53, NO. 10, OCTOBER 2007

3432 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 53, NO. 10, OCTOBER 2007 3432 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 53, NO 10, OCTOBER 2007 Resource Allocation for Wireless Fading Relay Channels: Max-Min Solution Yingbin Liang, Member, IEEE, Venugopal V Veeravalli, Fellow,

More information

arxiv: v1 [cs.it] 26 Jan 2016

arxiv: v1 [cs.it] 26 Jan 2016 Echo State Networks for Self-Organizing Resource Allocation in LTE-U with Uplink-Downlink Decoupling Mingzhe Chen, Walid Saad, and Changchuan Yin Beijing Key Laboratory of Network System Architecture and

More information

Bandit Algorithms Continued: UCB1

Bandit Algorithms Continued: UCB1 Bandit Algorithms Continued: UCB1 Noel Welsh 09 November 2010 Noel Welsh () Bandit Algorithms Continued: UCB1 09 November 2010 1 / 18 Annoucements Lab is busy Wednesday afternoon from 13:00 to 15:00 (Some)

More information

Fast Reinforcement Learning for Energy-Efficient Wireless Communication

Fast Reinforcement Learning for Energy-Efficient Wireless Communication 6262 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 59, NO. 12, DECEMBER 2011 Fast Reinforcement Learning for Energy-Efficient Wireless Communication Nicholas Mastronarde and Mihaela van der Schaar Abstract

More information

AN ABSTRACT OF THE THESIS OF. Pavithra Venkatraman for the degree of Master of Science in

AN ABSTRACT OF THE THESIS OF. Pavithra Venkatraman for the degree of Master of Science in AN ABSTRACT OF THE THESIS OF Pavithra Venkatraman for the degree of Master of Science in Electrical and Computer Engineering presented on November 04, 2010. Title: Opportunistic Bandwidth Sharing Through

More information

Adversarial Reasoning: Sampling-Based Search with the UCT algorithm. Joint work with Raghuram Ramanujan and Ashish Sabharwal

Adversarial Reasoning: Sampling-Based Search with the UCT algorithm. Joint work with Raghuram Ramanujan and Ashish Sabharwal Adversarial Reasoning: Sampling-Based Search with the UCT algorithm Joint work with Raghuram Ramanujan and Ashish Sabharwal Upper Confidence bounds for Trees (UCT) n The UCT algorithm (Kocsis and Szepesvari,

More information

Optimizing Media Access Strategy for Competing Cognitive Radio Networks

Optimizing Media Access Strategy for Competing Cognitive Radio Networks Optimizing Media Access Strategy for Competing Cognitive Radio Networks The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters. Citation

More information

Simple, Optimal, Fast, and Robust Wireless Random Medium Access Control

Simple, Optimal, Fast, and Robust Wireless Random Medium Access Control Simple, Optimal, Fast, and Robust Wireless Random Medium Access Control Jianwei Huang Department of Information Engineering The Chinese University of Hong Kong KAIST-CUHK Workshop July 2009 J. Huang (CUHK)

More information

The Necessity of Average Rewards in Cooperative Multirobot Learning

The Necessity of Average Rewards in Cooperative Multirobot Learning Carnegie Mellon University Research Showcase @ CMU Institute for Software Research School of Computer Science 2002 The Necessity of Average Rewards in Cooperative Multirobot Learning Poj Tangamchit Carnegie

More information

Multi-user Space Time Scheduling for Wireless Systems with Multiple Antenna

Multi-user Space Time Scheduling for Wireless Systems with Multiple Antenna Multi-user Space Time Scheduling for Wireless Systems with Multiple Antenna Vincent Lau Associate Prof., University of Hong Kong Senior Manager, ASTRI Agenda Bacground Lin Level vs System Level Performance

More information

Cooperative Multi-Agent Learning and Coordination for Cognitive Radio Networks

Cooperative Multi-Agent Learning and Coordination for Cognitive Radio Networks 1 Cooperative Multi-Agent Learning and Coordination for Cognitive Radio Networks William Zame, Jie Xu, and Mihaela van der Schaar Abstract The radio spectrum is a scarce resource. Cognitive radio stretches

More information

Spectrum Sharing in Cognitive Radio Networks

Spectrum Sharing in Cognitive Radio Networks Spectrum Sharing in Cognitive Radio Networks Fan Wang, Marwan Krunz, and Shuguang Cui Department of Electrical & Computer Engineering University of Arizona Tucson, AZ 85721 E-mail:{wangfan,krunz,cui}@ece.arizona.edu

More information

Reinforcement Learning in Games Autonomous Learning Systems Seminar

Reinforcement Learning in Games Autonomous Learning Systems Seminar Reinforcement Learning in Games Autonomous Learning Systems Seminar Matthias Zöllner Intelligent Autonomous Systems TU-Darmstadt zoellner@rbg.informatik.tu-darmstadt.de Betreuer: Gerhard Neumann Abstract

More information

Joint Adaptation of Frequency Hopping and Transmission Rate for Anti-jamming Wireless Systems

Joint Adaptation of Frequency Hopping and Transmission Rate for Anti-jamming Wireless Systems 1 Joint Adaptation of Frequency Hopping and Transmission Rate for Anti-jamming Wireless Systems Manjesh K. Hanawal, Mohammad J. Abdel-Rahman, Member, IEEE, and Marwan Krunz, Fellow, IEEE Abstract Wireless

More information

Traffic-Aware Transmission Mode Selection in D2D-enabled Cellular Networks with Token System

Traffic-Aware Transmission Mode Selection in D2D-enabled Cellular Networks with Token System 217 25th European Signal Processing Conference (EUSIPCO) Traffic-Aware Transmission Mode Selection in D2D-enabled Cellular Networks with Token System Yiling Yuan, Tao Yang, Hui Feng, Bo Hu, Jianqiu Zhang,

More information