Fast Online Learning of Antijamming and Jamming Strategies


Fast Online Learning of Antijamming and Jamming Strategies
Y. Gwon, S. Dastangoo, C. Fossa, H. T. Kung
December 9, 2015
Presented at the 58th IEEE Global Communications Conference, San Diego, CA

This work is sponsored by the Department of Defense under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government. DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited.

Outline
- Introduction
- Background: Competing Cognitive Radio Network
- Problem Model
- Solution approaches
- Evaluation
- Conclusion
GLOBECOM 2015 2

Introduction
- Competing Cognitive Radio Network (CCRN) models mobile networks under competition: Blue Force (BF, ally) vs. Red Force (RF, enemy) over a dynamic, open spectrum resource
- Nodes are cognitive radios: comm nodes and jammers
- Opportunistic data access vs. strategic jamming attacks
- Multi-channel open spectrum; intra-network cooperation; network-wide competition
(Figure: BF and RF networks contending channel by channel, with jamming and collisions)

Background: Competing Cognitive Radio Network

Formulation 1: Stochastic MAB <A_B, A_R, R>
- Blue-force (B) and Red-force (R) action sets: a_B = {a_BC, a_BJ} ∈ A_B, a_R = {a_RC, a_RJ} ∈ A_R
- Reward: R ~ PD(r | a_B, a_R)
- Regret: Γ = max_{a ∈ A_B} Σ_{t=1..T} r(a) − Σ_{t=1..T} r(a_B,t)
- Optimal regret bound in O(log T) [Lai & Robbins '85]

Formulation 2: Markov Game <A_B, A_R, S, R, T>
- Stateful model with states S and probabilistic transition function T
- Strategy π: S → PD(A) is a probability distribution over the action space
- Optimal strategy π* = argmax_π E[Σ_t γ^t R(s, a_B, a_R)] can be computed by Q-learning via linear programming
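The stochastic-MAB regret Γ can be made concrete with a small sketch. UCB1 is one standard index policy achieving an O(log T) regret bound of the kind cited above; the Bernoulli arm probabilities below are purely illustrative and stand in for the per-action channel reward distribution PD(r | a_B, a_R), not for any values from this work.

```python
import math
import random

def ucb1(arm_probs, T, seed=0):
    """UCB1 index policy on a toy stochastic multi-armed bandit.

    arm_probs: Bernoulli success probability per arm (a stand-in for the
    per-action channel reward distribution).
    Returns the sequence of arms played over T slots.
    """
    rng = random.Random(seed)
    K = len(arm_probs)
    counts, sums, choices = [0] * K, [0.0] * K, []
    for t in range(1, T + 1):
        if t <= K:
            a = t - 1  # initialization: play each arm once
        else:
            # empirical mean plus exploration bonus
            a = max(range(K), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        r = 1.0 if rng.random() < arm_probs[a] else 0.0
        counts[a] += 1
        sums[a] += r
        choices.append(a)
    return choices

def expected_regret(arm_probs, choices):
    """Gamma = max_a sum_t r(a) - sum_t r(a_{B,t}), taken in expectation."""
    best = max(arm_probs)
    return sum(best - arm_probs[a] for a in choices)
```

Because only O(log T) suboptimal pulls accumulate, Γ/T → 0, which is the sublinear-regret property the O(log T) bound expresses.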

New Problem Formulation
- Assume an intelligent adversary: the hostile Red-force can learn as efficiently as the Blue-force, and also applies cognitive sensing to compute strategies
- Consequences:
  - The well-behaved stochastic channel reward becomes invalid; time-varying channel rewards are more difficult to predict or model
  - Nonstationarity in Red-force actions: random, arbitrary changepoints introduce dynamic changes

Revised Regret Model
- Stochastic MAB problems model regret Γ using the reward function r(a):
  Γ = max_{a ∈ A_B} Σ_{t=1..T} r(a) − Σ_{t=1..T} r(a_B,t)
- Using a loss function l(a), we revise Γ to the regret Λ:
  Λ = Σ_{t=1..T} l_t(a_B,t) − min_{a ∈ A_B} Σ_{t=1..T} l_t(a)
- The loss version is equivalent to the reward version Γ, but provides an adversarial view: as if the Red-force alters the potential loss for the Blue-force over time, revealing only l_t(a_B,t) at time t
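The adversarial regret Λ compares the loss actually incurred against the best single action in hindsight. A minimal sketch (the two-channel loss tables and the changepoint at t = 3 are illustrative assumptions, not data from the paper):

```python
def adversarial_regret(losses, played):
    """Lambda = sum_t l_t(a_t) - min_a sum_t l_t(a).

    losses: list over time of dicts action -> loss l_t(a). (Online, the
    adversary reveals only l_t(a_t); full tables are used here only to
    measure regret in hindsight.)
    played: list of actions a_t chosen by the Blue-force.
    """
    incurred = sum(l[a] for l, a in zip(losses, played))
    best_fixed = min(sum(l[a] for l in losses) for a in losses[0])
    return incurred - best_fixed

# Toy example: the adversary flips which channel is bad at t = 3 (changepoint)
losses = [{"ch1": 0.0, "ch2": 1.0},
          {"ch1": 0.0, "ch2": 1.0},
          {"ch1": 1.0, "ch2": 0.0},
          {"ch1": 1.0, "ch2": 0.0}]
print(adversarial_regret(losses, ["ch2", "ch1", "ch1", "ch1"]))  # 3.0 - 2.0 = 1.0
```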

New Optimization Goals
- Find the best Blue-force actions that minimize Λ over time:
  a* = argmin_a [ Σ_{t=1..T} l_t(a_B,t) − min_{a ∈ A_B} Σ_{t=1..T} l_t(a) ]
- It is critical to estimate l_t(·) accurately for the new optimization: l(·) evolves over t, and an intelligent adversary makes it difficult to estimate

Our Approach: Online Convex Optimization
- If each l_t(·) is convex over a convex action set, the optimal regret bound can be achieved by online convex programming [Zinkevich '03]
- The underlying idea is gradient descent/ascent
- What is gradient descent? Find minima of a loss f(x) by tracing its estimated gradient (slope):
  x ← x0 (initial guess)
  repeat: x ← x − h ∇f(x), with step size h > 0
  stop when |∇f(x)| < ε
(Figure: descent on f(x) from the initial guess x0 to the stopping point xF)
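The gradient-descent loop on this slide can be written as a short runnable sketch; the quadratic below is just an illustrative convex loss, not one from the paper.

```python
def gradient_descent(grad, x0, h=0.1, eps=1e-8, max_iter=10_000):
    """Plain gradient descent: x <- x - h * grad(x); stop when |grad(x)| < eps."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < eps:
            break
        x = x - h * g
    return x

# f(x) = (x - 3)^2 is convex with gradient 2(x - 3); the minimum is at x = 3
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

For a convex f with a suitable step size h, the iterates contract toward the minimizer geometrically, which is why the stopping test on the gradient magnitude terminates quickly.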

Our New Algorithm: Fast Online Learning
Sketch of key ideas:
1. Estimate the expected loss function for the next time step
2. Take gradient steps that lead iteratively toward the minimum loss
3. Test whether the reached minimum is global or local
4. When stuck at an inefficiency (an undesirable local minimum), use an escape mechanism to get out
5. Go back and repeat until convergence

New Algorithm Explained (1)
(Figure: regret curve l_t(a_t) over actions; a gradient step a_{t+1} = a_t + Δa moves from a_t toward the minimum l*)

New Algorithm Explained (2)
(Figure: same curve; a perturbed step a_{t+1} = a_t + u Δa escapes an undesirable local minimum toward l*)

New Algorithm Explained (3)
(Figure: gradient update a_{t+1} = a_t − ∇l_t(a_t) converging to the minimum l*)
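One iteration of the sketched loop (descend on the estimated loss; jump away when apparently stuck at a local minimum) could look like the following. This is only an interpretive sketch: the slides do not specify the escape mechanism, so the random perturbation used here is an assumption, and `grad_est` stands for the estimated gradient of the next-step loss l_t.

```python
import random

def fast_online_step(a, grad_est, h=0.05, stuck_tol=1e-3,
                     escape_scale=1.0, rng=random):
    """One step of the loop sketched on the slides.

    a: current action (scalar, for illustration)
    grad_est: callable estimating the gradient of l_t at a
    If the gradient has (nearly) vanished, we may be at an undesirable
    local minimum, so take a random escape jump (ASSUMED mechanism);
    otherwise take the usual descent step a_{t+1} = a_t - h * grad l_t(a_t).
    """
    g = grad_est(a)
    if abs(g) < stuck_tol:
        return a + rng.uniform(-escape_scale, escape_scale)  # escape jump
    return a - h * g  # gradient descent step
```

Repeating this step over t, with a fresh loss estimate each round, gives the "estimate, descend, test, escape, repeat" loop described on the sketch slide.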

Evaluation
- Wrote a custom simulator in MATLAB
- Simulated spectrum with N = 10, 20, 30, 40, 50 channels
- Varied the number of nodes M = 10 to 50; the number of jammers among the M total nodes varied from 2 to 10
- Simulation duration = 5,000 time slots
- Algorithms evaluated:
  1. MAB (Blue-force) vs. random changepoint (Red-force)
  2. Minimax-Q (Blue-force) vs. random changepoint (Red-force)
  3. Proposed online (Blue-force) vs. random changepoint (Red-force)
- All algorithmic matchups under centralized control

Results: Convergence Time

Results: Average Reward Performance (N = 40, M = 20)
The new algorithm finds the optimal strategy much more rapidly than the MAB- and Q-learning-based algorithms

Summary
- Extended the Competing Cognitive Radio Network (CCRN) to a harder class of problems under nonstochastic assumptions: random changepoints for enemy channel access and jamming strategies, and time-varying channel reward
- Proposed a new algorithm based on online convex programming: simpler than MAB and Q-learning, and achieves a much better convergence property (finds the optimal strategy faster)
- Future work: better channel activity prediction can help estimate a more accurate loss function

Support Materials

Proposed Algorithm

Channel Activity Matrix, Outcome, Reward, State (1/2)
Example: there are two comm nodes and two jammers in each of the BF and RF networks
- BF uses channel 10 for control; RF uses channel 1
- At time t, the actions are:
  A_B(t) = {a_B,comm = [7 3], a_B,jam = [1 5]}, where a_B,comm = [7 3] means BF comm node 1 transmits on channel 7 and comm node 2 on channel 3
  A_R(t) = {a_R,comm = [3 5], a_R,jam = [10 9]}
- How to figure out channel outcomes, compute rewards, and determine the state? → the Channel Activity Matrix

Channel Activity Matrix, Outcome, Reward, State (2/2)

CH | BF Comm | BF Jammer | RF Comm | RF Jammer | Outcome               | BF reward | RF reward
 1 |         | Jam       |         |           | BF jamming success    | +1        | 0
 3 | Tx      |           | Tx      |           | BF & RF comms collide | 0         | 0
 5 |         | Jam       | Tx      |           | BF jamming success    | +1        | 0
 7 | Tx      |           |         |           | BF comm Tx success    | +1        | 0
 9 |         |           |         | Jam       | RF jamming fail       | 0         | 0
10 |         |           |         | Jam       | RF jamming success    | 0         | +1

GLOBECOM 2015 20
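The outcome logic in the table can be reproduced with a small sketch. The scoring rules below are inferred from this single example and are an assumption: a jammer scores +1 when it hits an enemy comm transmission or the enemy control channel, a comm node scores +1 when it transmits on a channel no adversary touches, and collisions or jammed/idle channels score 0.

```python
def channel_rewards(bf_comm, bf_jam, rf_comm, rf_jam, bf_ctrl, rf_ctrl):
    """Score one time slot per the channel activity matrix (inferred rules).

    bf_comm / rf_comm: channels used by each side's comm nodes
    bf_jam / rf_jam:   channels targeted by each side's jammers
    bf_ctrl / rf_ctrl: each side's control channel
    Returns (BF reward, RF reward) for the slot.
    """
    bf = rf = 0
    channels = (set(bf_comm) | set(bf_jam) | set(rf_comm) | set(rf_jam)
                | {bf_ctrl, rf_ctrl})
    for ch in channels:
        b_tx, r_tx = ch in bf_comm, ch in rf_comm
        b_j, r_j = ch in bf_jam, ch in rf_jam
        if b_j and (r_tx or ch == rf_ctrl):
            bf += 1          # BF jamming success (hit RF comm or control)
        elif r_j and (b_tx or ch == bf_ctrl):
            rf += 1          # RF jamming success (hit BF comm or control)
        elif b_tx and not (r_tx or r_j):
            bf += 1          # BF comm Tx success on a clean channel
        elif r_tx and not (b_tx or b_j):
            rf += 1          # RF comm Tx success on a clean channel
    return bf, rf
```

With the slide's example actions (BF comm [7 3], BF jam [1 5], RF comm [3 5], RF jam [10 9], control channels 10 and 1), this reproduces the table's totals: BF earns +1 on channels 1, 5, and 7, and RF earns +1 on channel 10.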