Learning via Delayed Knowledge: A Case of Jamming. SaiDhiraj Amuru and R. Michael Buehrer
Slide 1: Learning via Delayed Knowledge: A Case of Jamming (SaiDhiraj Amuru and R. Michael Buehrer)
Slide 2: Why do we need an intelligent jammer?
- Dynamic environment conditions in electronic-warfare scenarios cause predict-then-adapt approaches to fail
- The optimal jamming strategy is not known a priori
- A naive always-on jamming strategy is sub-optimal: energy is wasted, and the jammer is easy to detect and easy to neutralize
- Cognitive capabilities are necessary to survive in harsh environments
- We explore the learning capabilities of a jammer with delayed environment knowledge, studied on an 802.11-type network that uses the RTS-CTS protocol
Slide 3: State of the art
- Attacks at various protocol layers: PHY, MAC, and network layers
- Naive jamming strategies: continuous, periodic, partial-band, and single/multi-tone jamming
- Sensing-based jamming: deceptive jamming, reactive jamming
- These assume perfect and instantaneous knowledge of the adversary: jamming control packets, jamming synchronization signals
Slide 4: Common techniques to address jamming
- Optimization frameworks, assuming knowledge of certain parameters: maximize BER/SER/PER
- Game theory: one-shot zero-sum games, repeated games; minimax formulations, mutual-information games
- Information theory: channel capacity under jamming, saddle-point solutions, degrees of freedom (DoF), mutual information
- Problem: a lot of prior knowledge is required. Solution: employ learning techniques
Slide 5: What is learning?
- Adaptation: see the data and change the strategy; adapt in order to survive in the environment; no memory in the system
- Learning: more than an adaptive system; the ability to detect patterns in the data; understand what is happening in the data and adapt; remember the strategies used and relate them to the data; evaluate the outcomes of the decisions taken, and gather knowledge to be exploited in the future
Slide 6: Formal definition of learning
- A system is said to learn from experience/feedback E with respect to some class of actions A and performance measure C (similar to a cost function) if its performance at tasks in A, as measured by C, improves with experience E
- Learned hypothesis: a model of the problem/task T
- Model quality: performance measured by C
Slide 7: Different types of learning
- Supervised learning: teacher-student type learning
- Unsupervised learning: the student is left on their own
- Semi-supervised learning: a mixture of the above two techniques
- Reinforcement learning / online learning: learn by experimenting; experience is the only teacher
(Image courtesy: Simon Dennis; blog.bigml.com; Francesco Escolana Ruiz, Introduction to Neural Networks)
Slide 8: Intro to RL
- Reinforcement learning: a radio/agent learns the optimal strategy (for example, a survival strategy) by repeatedly interacting with the environment
- The agent receives feedback indicating whether the actions performed were good or bad
- It learns to take actions which yield higher rewards
- Fig: agent-environment loop (prior information, goals/metrics, and past experience feed the agent; the agent sends actions to the environment and receives observations back)
Slide 9: Framework for RL
- Sequential decision model: at time t, the present state and the chosen action yield a reward and the next state at time t+1
- Decision rule: at each time, the system state is used to choose an action
- Policy: a set of decision rules mapping states to actions
- A sequence of decision rules generates rewards
- Commonly modeled as a Markov decision process
Slide 10: Markov decision process (MDP)
- Something more than a Markov chain; think of it as a controlled Markov chain
- MDP = {states, actions, transition probabilities, rewards} = {S, A, P, R}
- E.g., from a jammer's perspective, the environment states could be Tx / No Tx, and the actions of the jammer could be Jam / Don't Jam
- P is the |S| x |A| x |S| state-transition probability matrix; it governs the dynamics of the environment, p(s' | s, a)
- R is the |S| x |A| reward matrix; r(s, a) = reward obtained in state s when action a is executed
- π = policy, a mapping between states and actions
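The jammer MDP above can be written down concretely. A minimal sketch; the two states, two actions, and every number in P and R here are illustrative assumptions, not the paper's values:

```python
import numpy as np

# Illustrative MDP for the jamming example: S = {Tx, NoTx}, A = {Jam, Wait}.
S = ["Tx", "NoTx"]
A = ["Jam", "Wait"]

# P[a][s][s'] = p(s' | s, a): one |S| x |S| transition matrix per action.
P = {
    "Jam":  np.array([[0.2, 0.8],    # jamming a transmission likely forces a back-off
                      [0.5, 0.5]]),
    "Wait": np.array([[0.9, 0.1],
                      [0.5, 0.5]]),
}

# R[s][a]: reward for taking action a in state s (jamming costs energy,
# letting a transmission through costs throughput).
R = {"Tx":   {"Jam": -1.0, "Wait": -10.0},
     "NoTx": {"Jam": -1.0, "Wait":   0.0}}

# Sanity check: each row of each transition matrix is a probability distribution.
for a in A:
    assert np.allclose(P[a].sum(axis=1), 1.0)
```

Once P and R are tabulated like this, any planning routine (e.g. value iteration) can be run against them directly.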
Slide 11: Goals of RL
- Maximize the expected cumulative discounted reward: V^π(s) = E[ sum_{t>=0} γ^t r(s_t, a_t) | s_0 = s ]
- 0 <= γ <= 1 is the discount factor: how much you value the future; for a finite time horizon, γ = 1 is used (undiscounted MDP)
- The goal of the decision-maker is to choose a behavior that maximizes the expected return, irrespective of how the process started (the initial state)
- A policy that achieves the optimal value in all states is optimal
- For a given policy π, the value function is V^π
Slide 12: Finding the optimal policy
- Bellman equation: V*(s) = max_a [ r(s, a) + γ sum_{s'} p(s' | s, a) V*(s') ]; notice the similarity to dynamic programming
- The Bellman operator T is an affine linear operator for a fixed policy; for γ < 1, T is a contraction mapping, so by the Banach fixed-point theorem a unique solution exists
- 1) A function T: X -> X is a contraction mapping if d(T(x), T(y)) <= q d(x, y) for some 0 <= q < 1, where d is a distance measure
- 2) Banach fixed-point theorem: T admits a unique fixed point x* with T(x*) = x*; it can be found by starting from x_0 and defining the sequence x_n = T(x_{n-1}); then x_n converges to x*
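The fixed-point construction on this slide is exactly value iteration: repeatedly apply the Bellman operator until the iterates stop changing. A generic sketch; the toy MDP at the bottom uses made-up numbers purely for demonstration:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Apply the Bellman operator T until the fixed point V* is reached.

    P: array of shape (|A|, |S|, |S|) with P[a, s, s'] = p(s' | s, a)
    R: array of shape (|S|, |A|) with R[s, a] = r(s, a)
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)                       # x_0 in the slide's notation
    while True:
        # Q[s, a] = r(s, a) + gamma * sum_s' p(s'|s,a) V(s')
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)                    # x_n = T(x_{n-1})
        if np.max(np.abs(V_new - V)) < tol:      # contraction => convergence
            return V_new, Q.argmax(axis=1)       # V* and a greedy policy
        V = V_new

# Toy 2-state, 2-action MDP (numbers are illustrative only).
P = np.array([[[0.2, 0.8], [0.5, 0.5]],
              [[0.9, 0.1], [0.5, 0.5]]])
R = np.array([[-1.0, -10.0],
              [-1.0,   0.0]])
V_star, policy = value_iteration(P, R)
```

Because T is a γ-contraction, the loop is guaranteed to terminate, and the returned V_star satisfies the Bellman equation to within the tolerance.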
Slide 13: More information about MDPs
- If P is known a priori (known as indirect learning, or planning), one can evaluate various policies and find the best one; note this works for small MDPs only
- MDPs in general work well for small sizes of S and A (more on this at the end of the talk: multi-armed bandits)
- Online learning techniques are used when P is not known: the exploration-versus-exploitation dilemma
- A common algorithm is ε-greedy; Q-learning and SARSA are other online learning techniques
(Images courtesy: Microsoft Research)
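The ε-greedy rule mentioned above is simple to state in code: with probability ε take a random action (explore), otherwise take the action with the highest current estimate (exploit). A tabular Q-learning sketch built on it; the environment, rewards, and transition rule in the demo loop are stand-in assumptions, not the paper's 802.11 model:

```python
import random

def epsilon_greedy(Q, s, actions, eps):
    """With probability eps pick a random action (explore); else the greedy one."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update: move Q(s,a) toward the TD target."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Tiny demo with a made-up environment: jamming in state "Tx" pays off.
states, actions = ["Tx", "NoTx"], ["Jam", "Wait"]
Q = {(s, a): 0.0 for s in states for a in actions}
random.seed(0)
s = "Tx"
for _ in range(5000):
    a = epsilon_greedy(Q, s, actions, eps=0.1)
    r = 1.0 if (s, a) == ("Tx", "Jam") else 0.0   # assumed toy reward
    s_next = random.choice(states)                # assumed toy dynamics
    q_learning_step(Q, s, a, r, s_next, actions)
    s = s_next
```

Note that Q-learning needs no knowledge of P at all, which is exactly why it is the tool of choice when the transition model is unknown.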
Slide 14: Can we have instantaneous knowledge?
- "As tasks and environments grow more complex, an agent's observations of its environment are more often than not delayed" (Littman 2009)
- E.g., direct control of a Mars rover from Earth is limited by the communication latency; the delay may not be limited to a single time step
- When a jammer disrupts a DATA packet, it does not know whether the jamming was successful until an ACK packet is sent by the receiver
- A "wait" agent is sub-optimal in such scenarios; it is better to use the time by taking some actions
Slide 15: How do we handle delayed state observations?
- A new MDP framework was developed to handle delayed learning scenarios (Altman 1992)
- {S, A, P, R, k}, where k is the observation delay
- {I_k, A, P, R} is the equivalent augmented MDP: I_k is the augmented state space of size |S| x |A|^k, since the state s_{t-k+1} is not known perfectly
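The augmentation itself is mechanical: since the agent cannot see the current state, it carries the last observed state together with the k actions taken since, which is why the augmented space has size |S| x |A|^k. A bookkeeping sketch; the class name and its interface are mine, not the paper's notation:

```python
from collections import deque
from itertools import product

def augmented_state_space(S, A, k):
    """I_k: last observed state plus the k actions taken since; |S| * |A|^k elements."""
    return [(s,) + acts for s in S for acts in product(A, repeat=k)]

class DelayedObserver:
    """Tracks the augmented state (s_{t-k}, a_{t-k}, ..., a_{t-1})."""
    def __init__(self, s0, k, default_action):
        self.k = k
        self.last_seen = s0
        self.pending = deque([default_action] * k, maxlen=k)

    def step(self, action, delayed_obs):
        # delayed_obs is the true state from k steps ago, arriving only now.
        self.last_seen = delayed_obs
        self.pending.append(action)          # oldest pending action drops out
        return (self.last_seen,) + tuple(self.pending)

S, A, k = ["Tx", "NoTx"], ["Jam", "Wait"], 2
I_k = augmented_state_space(S, A, k)
assert len(I_k) == len(S) * len(A) ** k      # 2 * 2^2 = 8
```

The policy is then defined over I_k instead of S, and standard MDP machinery applies unchanged.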
Slide 16: Transition-based rewards
- These frameworks assume state-based rewards; what if there are transition-based rewards?
- We developed a new framework to handle this: a delayed learning framework with transition-based rewards
- Bellman's optimality conditions still hold; P^π and R^π are now |I_k| x |I_k| matrices and can handle transition-based rewards (a jamming example will be shown soon)
Slide 17: Jamming via delayed learning
- We consider an 802.11 wireless network with one user; a MAC-layer jamming attack is studied
- Fig: basic 802.11 protocol (RTS = Request to Send, CTS = Clear to Send, ACK = Acknowledgement)
- Fig: model for the victim
Slide 18: Jammer's model
- Assumptions: the MAC protocol is known to the jammer; it can identify ACK/NACK packets; the jamming success probability ρ is unknown
- The packet types form the MDP states; the jammer can jam any of them, so it must find the optimal policy among 16 candidates
- Feedback = energy expended and throughput allowed
- Costs: jamming RTS, CTS, or ACK = -E; jamming DATA = -10E; throughput allowed = -T (a WAIT followed by an ACK indicates this)
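The count of 16 follows directly: with four packet types (RTS, CTS, DATA, ACK) and a binary jam/don't-jam choice for each, there are 2^4 = 16 deterministic policies. A sketch of the enumeration; the cost constants follow the slide, while the helper name and E = 1 are illustrative:

```python
from itertools import product

packets = ["RTS", "CTS", "DATA", "ACK"]
E = 1.0
# Energy cost of jamming each packet type (DATA is 10x more expensive).
jam_cost = {"RTS": -E, "CTS": -E, "DATA": -10 * E, "ACK": -E}

# A deterministic policy is one jam/don't-jam bit per packet type.
policies = [dict(zip(packets, bits))
            for bits in product([False, True], repeat=len(packets))]
assert len(policies) == 16

def energy_per_cycle(policy):
    """Energy spent in one full RTS-CTS-DATA-ACK exchange under a policy."""
    return sum(jam_cost[p] for p in packets if policy[p])
```

Enumerating all 16 is cheap here, which is what makes the per-episode policy evaluation on the later slides feasible.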
Slide 19: So what is delayed?
- To be clear: the jammer cannot identify a packet before its transmission happens; the packet type is known perfectly after 1 time slot
- The energy cost is known instantaneously, based on the actions taken
- The throughput cost is known only when the ACK-to-WAIT transition happens
- Notice that the reward is based on transitions, not on the states themselves
- Objective: minimize costs and deny any communication exchange
Slide 20: Optimal performance (benchmark result)
- Assume ρ is known; E = -10, T = -100
- The optimal theoretical policy follows from the novel delayed learning framework
- Why is jamming a CTS packet better than jamming RTS or ACK packets?
Slide 21: Which policy to use?
- Jamming as a function of energy and throughput costs, ρ = 0.5
Slide 22: What effect does delay have?
- True ρ = 0.3; learn ρ by jamming all states and observing the environment
- 1 episode = 1000 time slots; one policy is evaluated per episode
- ε-greedy is used to balance exploration and exploitation
Slide 23: What effect does delay have? (unknown model)
- True ρ = 0.5; unknown model = learn the retransmission limit and average contention-window (CW) sizes by jamming all states
- 1 episode = 1000 time slots; one policy is evaluated per episode
- ε-greedy is used to balance exploration and exploitation
Slide 24: So, in this work
- We explored whether or not a jammer can learn its surroundings
- Instantaneous knowledge is not readily available in most practical systems; we need to deal with delay (recall that states are known with a delay)
- A delayed reinforcement learning framework was developed to address such delayed cognitive learning scenarios
- An example framework was considered, and optimal jamming policies against this network were obtained; the optimal policies match intuition
- To be done: varying ρ, errors in the feedback
Slide 25: What did we learn from this problem?
- Small time delays can be modeled easily using the MDP framework
- MDPs work well for small sizes of S and A, and finite-time guarantees can be given
- MDPs model single-user scenarios very well; our experience with multi-user MDPs: not so good, especially when the MDPs are coupled (as in the 802.11 framework)
- Alternative learning algorithms are being explored
Slide 26: Multi-armed bandits
- Another widely explored learning framework
- Can be related to MDP theory and the creation of bandit processes: Gittins indices
- An alternative definition is based on the regret formulation
- Learn to intelligently explore and exploit, and choose the best arm
- A widely used algorithm: Upper Confidence Bound (UCB1)
(Image courtesy: Daniel Jakubisin)
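UCB1 plays the arm that maximizes the empirical mean plus a confidence bonus sqrt(2 ln t / n_i), so under-explored arms are periodically revisited while good arms dominate in the long run. A generic sketch; the Bernoulli arms and their success probabilities are made-up stand-ins for candidate jamming strategies:

```python
import math
import random

def ucb1(n_arms, pull, horizon):
    """UCB1: play the arm maximizing mean_i + sqrt(2 ln t / n_i)."""
    counts = [0] * n_arms                    # n_i: times arm i was played
    sums = [0.0] * n_arms                    # running reward totals
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                      # play each arm once to initialize
        else:
            arm = max(range(n_arms),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts

# Arms = candidate jamming strategies; success probabilities are assumptions.
random.seed(1)
probs = [0.2, 0.5, 0.8]
counts = ucb1(len(probs),
              lambda i: 1.0 if random.random() < probs[i] else 0.0,
              horizon=5000)
```

Because the bonus shrinks as counts grow, the best arm ends up pulled far more often than the rest, which is what yields the logarithmic cumulative-regret guarantee.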
Slide 27: What did we do with MAB?
- Learn the optimal physical-layer jamming strategies
- Actions = {signaling scheme, jamming power P_J, ON-OFF durations}
- Only needs ACK/NACK as feedback
- Can give theoretical guarantees on jamming performance: cumulative and one-step regret
Slide 28: Convergence to the optimal strategy
Slide 29: Tracking adaptive users
Downlink Scheduler Optimization in High-Speed Downlink Packet Access Networks Hussein Al-Zubaidy SCE-Carleton University 1125 Colonel By Drive, Ottawa, ON, Canada Email: hussein@sce.carleton.ca 21 August
More informationFrequency-Hopped Spread-Spectrum
Chapter Frequency-Hopped Spread-Spectrum In this chapter we discuss frequency-hopped spread-spectrum. We first describe the antijam capability, then the multiple-access capability and finally the fading
More information3432 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 53, NO. 10, OCTOBER 2007
3432 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 53, NO 10, OCTOBER 2007 Resource Allocation for Wireless Fading Relay Channels: Max-Min Solution Yingbin Liang, Member, IEEE, Venugopal V Veeravalli, Fellow,
More informationarxiv: v1 [cs.it] 26 Jan 2016
Echo State Networks for Self-Organizing Resource Allocation in LTE-U with Uplink-Downlink Decoupling Mingzhe Chen, Walid Saad, and Changchuan Yin Beijing Key Laboratory of Network System Architecture and
More informationBandit Algorithms Continued: UCB1
Bandit Algorithms Continued: UCB1 Noel Welsh 09 November 2010 Noel Welsh () Bandit Algorithms Continued: UCB1 09 November 2010 1 / 18 Annoucements Lab is busy Wednesday afternoon from 13:00 to 15:00 (Some)
More informationFast Reinforcement Learning for Energy-Efficient Wireless Communication
6262 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 59, NO. 12, DECEMBER 2011 Fast Reinforcement Learning for Energy-Efficient Wireless Communication Nicholas Mastronarde and Mihaela van der Schaar Abstract
More informationAN ABSTRACT OF THE THESIS OF. Pavithra Venkatraman for the degree of Master of Science in
AN ABSTRACT OF THE THESIS OF Pavithra Venkatraman for the degree of Master of Science in Electrical and Computer Engineering presented on November 04, 2010. Title: Opportunistic Bandwidth Sharing Through
More informationAdversarial Reasoning: Sampling-Based Search with the UCT algorithm. Joint work with Raghuram Ramanujan and Ashish Sabharwal
Adversarial Reasoning: Sampling-Based Search with the UCT algorithm Joint work with Raghuram Ramanujan and Ashish Sabharwal Upper Confidence bounds for Trees (UCT) n The UCT algorithm (Kocsis and Szepesvari,
More informationOptimizing Media Access Strategy for Competing Cognitive Radio Networks
Optimizing Media Access Strategy for Competing Cognitive Radio Networks The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters. Citation
More informationSimple, Optimal, Fast, and Robust Wireless Random Medium Access Control
Simple, Optimal, Fast, and Robust Wireless Random Medium Access Control Jianwei Huang Department of Information Engineering The Chinese University of Hong Kong KAIST-CUHK Workshop July 2009 J. Huang (CUHK)
More informationThe Necessity of Average Rewards in Cooperative Multirobot Learning
Carnegie Mellon University Research Showcase @ CMU Institute for Software Research School of Computer Science 2002 The Necessity of Average Rewards in Cooperative Multirobot Learning Poj Tangamchit Carnegie
More informationMulti-user Space Time Scheduling for Wireless Systems with Multiple Antenna
Multi-user Space Time Scheduling for Wireless Systems with Multiple Antenna Vincent Lau Associate Prof., University of Hong Kong Senior Manager, ASTRI Agenda Bacground Lin Level vs System Level Performance
More informationCooperative Multi-Agent Learning and Coordination for Cognitive Radio Networks
1 Cooperative Multi-Agent Learning and Coordination for Cognitive Radio Networks William Zame, Jie Xu, and Mihaela van der Schaar Abstract The radio spectrum is a scarce resource. Cognitive radio stretches
More informationSpectrum Sharing in Cognitive Radio Networks
Spectrum Sharing in Cognitive Radio Networks Fan Wang, Marwan Krunz, and Shuguang Cui Department of Electrical & Computer Engineering University of Arizona Tucson, AZ 85721 E-mail:{wangfan,krunz,cui}@ece.arizona.edu
More informationReinforcement Learning in Games Autonomous Learning Systems Seminar
Reinforcement Learning in Games Autonomous Learning Systems Seminar Matthias Zöllner Intelligent Autonomous Systems TU-Darmstadt zoellner@rbg.informatik.tu-darmstadt.de Betreuer: Gerhard Neumann Abstract
More informationJoint Adaptation of Frequency Hopping and Transmission Rate for Anti-jamming Wireless Systems
1 Joint Adaptation of Frequency Hopping and Transmission Rate for Anti-jamming Wireless Systems Manjesh K. Hanawal, Mohammad J. Abdel-Rahman, Member, IEEE, and Marwan Krunz, Fellow, IEEE Abstract Wireless
More informationTraffic-Aware Transmission Mode Selection in D2D-enabled Cellular Networks with Token System
217 25th European Signal Processing Conference (EUSIPCO) Traffic-Aware Transmission Mode Selection in D2D-enabled Cellular Networks with Token System Yiling Yuan, Tao Yang, Hui Feng, Bo Hu, Jianqiu Zhang,
More information