Reinforcement Learning Based Anti-jamming with Wideband Autonomous Cognitive Radios

Stephen Machuzak, Student Member, IEEE, and Sudharman K. Jayaweera, Senior Member, IEEE
Communications and Information Sciences Laboratory (CISL)
Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM 87131-0001, USA
Email: {machuzak9, jayaweera}@unm.edu

Abstract— This paper presents a design and an implementation of a wideband autonomous cognitive radio (WACR) for anti-jamming. The proposed anti-jamming scheme is aimed at evading a jammer that sweeps across the whole wideband spectrum range in which the WACR is expected to operate. The WACR makes use of its spectrum knowledge acquisition ability to detect and identify the location of the sweeping jammer. This information and reinforcement learning are then used to learn the optimal communications mode to avoid the jammer. In this paper, we discuss a specific reinforcement learning mechanism based on Q-learning to successfully learn such an anti-jamming operation over a several hundred MHz wide spectrum in real time. We describe a cognitive anti-jamming communications protocol that selects a spectrum position with enough contiguous idle spectrum, uninterfered by both deliberate jammers and inadvertent interferers, and transmits until the jammer catches up with it. When the jammer interferes with the cognitive radio's transmission, it switches to a new spectrum band that will lead to the longest possible uninterrupted transmission, as learned through Q-learning. We present results of an implementation of the proposed WACR for cognitive anti-jamming and discuss its effectiveness in learning the surrounding RF environment to avoid both deliberate jamming and unintentional interference.

Index Terms— Anti-jamming, Q-learning, reinforcement learning, sub-band selection problem, wideband autonomous cognitive radio, wideband spectrum knowledge acquisition, wideband spectrum sensing.

I. INTRODUCTION

Wideband autonomous cognitive radios (WACRs) are radios that have the ability to make their own operating decisions in response to the perceived state of the spectrum, the network and the radio itself [1]. The key to such autonomous operation is the radio's ability to sense and comprehend its operating environment. In general, it is desired that the radio be able to operate over a wide frequency range, which makes sensing all frequencies of interest to the radio in real time a challenging problem. Assuming that this is achieved, however, such WACRs provide an excellent technological option for achieving the cognitive communications desired in many application scenarios. A situation in which cognitive communications can be a great asset is when reliable communications are needed in the presence of unintentional interference and deliberate jammers.

In this paper, we present a design of a WACR architecture to achieve cognitive anti-jamming and interference avoidance. We present a general approach that may be used to scan and sense a wide spectrum range in order to achieve real-time spectrum awareness. A cognitive anti-jamming and interference avoidance communications protocol that uses this spectrum knowledge is then developed. There is a strong justification for basing cognitive communications protocols on machine learning, so that they can both be autonomous and responsive to dynamic channel and network conditions. In this paper, we employ reinforcement learning to aid our proposed anti-jamming and interference avoidance communications protocol. Reinforcement learning (RL) has the advantage of facilitating unsupervised learning of an optimal decision-making policy under reasonable spectrum dynamics.
There have been a few previous attempts at using machine learning techniques, in particular reinforcement learning, to achieve anti-jamming in cognitive radio networks (CRNs). For example, [2] has proposed a modified Q-learning technique for jammer avoidance in a CRN. This on-policy synchronous Q-learning algorithm was shown to converge faster than the standard Q-learning algorithm in learning the behavior of both a sweeping jammer and an intelligent jammer. Two other reinforcement learning approaches, namely the SARSA and QV-learning algorithms, were investigated in [3] to develop an anti-jamming policy against a smart jammer in a CRN. However, reinforcement learning has found many other applications in cognitive radios beyond anti-jamming operations [4]. In fact, there are many examples of the use of reinforcement learning in dynamic spectrum sharing (DSS) systems. For instance, in [5] so-called secondary users employed Q-learning to learn optimal transmission powers in channels with unknown parameters. Similarly, in [6] minimax-Q learning was used by secondary users in an anti-jamming stochastic game to learn the spectrum-efficient, throughput-optimal policy to avoid jammers.

Reinforcement learning is, of course, not the only machine learning tool that can be useful for modeling and implementing anti-jamming cognitive communications. Two promising alternatives are game-theoretic learning and artificial neural networks (ANNs). For example, in [7] anti-jamming and jamming strategies were modeled in a game-theoretic framework, allowing radios to learn good policies using a variant of the fictitious play learning algorithm. In another study [8], the friend-or-foe detection technique was used to detect intelligent malicious users, acting as jammers, in a CRN.

Reinforcement learning techniques can also be used in conjunction with game-theoretic models to help learn good policies. For example, Q-learning based strategies are used in [9] and [10] for anti-jamming and jamming games to find the optimal channel-access strategies. The authors in [9] have shown that Nash-Q and friend-or-foe Q-learning can be effective in aggressive jamming environments and in mobile ad-hoc networks, respectively. In [10], the authors presented a game-theoretic anti-jamming scheme (GTAS) that used a modified Q-learning algorithm to evade jammer attacks.

Most of the above referenced contributions, however, have been limited to either analysis or simulation. In this research, we have developed a comprehensive cognitive anti-jamming communications protocol and implemented it on a hardware-in-the-loop (HITL) simulation of a WACR prototype. We show results for a cognitive radio that operates over an approximately 200 MHz-wide spectrum in real time in the presence of common wireless interferers as well as a deliberate jammer. Importantly, we demonstrate that a simple reinforcement learning algorithm can indeed learn the behavior of the jammer to achieve effective cognitive anti-jamming and interference avoidance.

The remainder of this paper is organized as follows: Section II details our proposed WACR architecture and the wideband spectrum knowledge acquisition framework. Section III discusses a cognitive communications protocol for anti-jamming and interference avoidance and its implementation using a reinforcement learning algorithm. Section IV presents our hardware-in-the-loop WACR prototype implementation of anti-jamming and the results observed in the presence of both a deliberate jammer and unintentional interference. Finally, the paper is concluded in Section V by drawing a few final conclusions and discussing possible further work.

II. WIDEBAND SPECTRUM KNOWLEDGE ACQUISITION

The most unique aspect of a cognitive radio is its ability to be aware of its RF environment (spectrum state) [1]. In dynamic spectrum sharing applications, this is achieved by what is called spectrum sensing [1], [11]. In the case of wideband autonomous cognitive radios, on the other hand, spectrum sensing can be more involved than simply finding so-called spectrum white-space [1]. Indeed, the potential of WACRs lies in their ability to sense and fully comprehend the wide spectrum of interest to the radio. Such comprehension normally includes not just finding active signals, but also determining the characteristics of these signals so that they can be properly identified. Hence, we define a wideband spectrum knowledge acquisition framework consisting of three steps, as shown in Fig. 1 [1].

Figure 1. Spectrum knowledge acquisition consists of a planning stage and a processing stage [1].

The first step in the spectrum knowledge acquisition framework is wideband spectrum scanning. By definition, WACRs are wideband radios that may operate over a large frequency range. However, due to hardware constraints [1], at any given time a WACR may be able to observe and process only a portion, called a sub-band, of its operating spectrum range of interest. To gain knowledge of the complete spectrum range, a WACR thus needs to follow an efficient algorithm to determine which sub-band is to be sensed at any given time. Clearly, this choice will depend on the performance objectives of the radio. The wideband spectrum scanning step can, thus, be closely coupled with the communications protocol itself.
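As a simple illustration of this scanning-plan step, the following minimal Python sketch divides an operating range into equal-width sub-bands constrained by the radio's instantaneous bandwidth. The 2.0-2.2 GHz range and 40 MHz sub-band width mirror the prototype parameters reported in Section IV; the function name and layout are illustrative assumptions rather than the prototype's actual code.

```python
# Illustrative sketch (hypothetical helper, not the prototype's LabVIEW code):
# divide a WACR operating range into equal-width sub-bands, one of which is
# sensed at each scanning instant.

def sub_band_centers(f_low_hz: float, f_high_hz: float, sub_band_hz: float):
    """Return the center frequency of each sub-band covering [f_low_hz, f_high_hz]."""
    n_sub_bands = int(round((f_high_hz - f_low_hz) / sub_band_hz))
    return [f_low_hz + (i + 0.5) * sub_band_hz for i in range(n_sub_bands)]

if __name__ == "__main__":
    for i, fc in enumerate(sub_band_centers(2.0e9, 2.2e9, 40e6), start=1):
        print(f"sub-band {i}: centered at {fc / 1e9:.3f} GHz")
    # Which sub-band to sense next is decided by the cognitive engine;
    # Section III couples that choice to the anti-jamming objective.
```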
In the second step of the spectrum knowledge acquisition process, the WACR detects the active signals present in the sensed sub-band. For this, our proposed design uses Neyman-Pearson thresholding of an estimated power spectrum of the sub-band signal. Note that this is very different from spectrum sensing in a DSS cognitive radio, in which only a single channel is sensed at a time and a particular type of primary signal is to be detected. Instead, all active signals present in a sub-band are to be detected. This step, thus, allows the WACR to extract the carrier frequencies of the detected active bands, but not necessarily other specific information about the signals [1]. Thus, the wideband spectrum knowledge acquisition framework includes a third step of signal classification and identification. In this final step, detected signals are classified to identify their origin and, in particular, which systems they may belong to. Often, classification is better performed on certain features extracted from the detected signals [1].

Figure 2. Block diagram of the cognitive engine and its signal processing tasks.

Figure 2 shows a cognitive engine implementation of the above spectrum knowledge acquisition framework, detailing in particular the steps associated with the spectral activity detection step. First, the noise floor of each of the sub-bands is estimated. This is used to compute the Neyman-Pearson threshold for spectral activity detection subject to a given false-alarm probability.

Next, an estimate of the power spectral density (PSD) of the sensed sub-band signal is computed. In the absence of any a priori knowledge of possible signals in a sub-band, a possible spectrum estimator is the periodogram of the sensed signal, defined as

\hat{S}_y(F) = \frac{1}{N} \left| \sum_{n=0}^{N-1} y[n] \, e^{-j 2\pi F n} \right|^2,    (1)

where y[n] is the time-domain signal of the sensed sub-band and N is the number of signal samples [1]. The periodogram, however, can be very erratic and noisy even when a large number of samples, N, is used. To reduce the effect of such noisy fluctuations on spectral activity detection, in our approach we apply frequency-domain smoothing to the periodogram estimate of the sub-band spectrum, as shown below:

T(\mathbf{Y}) = \frac{1}{LN} \sum_{l=-(L-1)/2}^{(L-1)/2} \left| Y[k+l] \right|^2,    (2)

where L denotes the length of the rectangular smoothing window, \mathbf{Y} denotes the FFT of the sensed sub-band signal, k is the sample in the spectrum at which the rectangular window is centered and T(\mathbf{Y}) is the smoothed periodogram [1]. It is imperative to smooth the periodogram to reduce the possibility of noise causing the PSD estimate to exceed the detection threshold when it should not, and vice versa. The Neyman-Pearson threshold is then applied to the smoothed periodogram to detect any active signals in the sub-band.

Figure 3. Periodogram estimate of the sub-band spectrum for a 40 MHz-wide sub-band centered at 2.46 GHz.

Figure 4. Smoothed periodogram estimate of the sub-band spectrum, as given by (2), for a 40 MHz-wide sub-band centered at 2.46 GHz.

Figures 3 and 4 show actual real-time periodogram and smoothed PSD estimates for a system that uses 40 MHz-wide sub-bands. By thresholding the smoothed periodogram estimate (2), the WACR determines the locations and bandwidths of the active signals. This information is then utilized by the radio reconfiguration region (see Fig. 2) to determine the idle frequency bands within the just-sensed sub-band. These are next used to determine whether there is enough idle bandwidth to satisfy the user's desired minimum idle bandwidth requirement.
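As a concrete illustration of this spectral activity detection step, the sketch below computes the periodogram of (1), smooths it with a rectangular window as in (2), and flags bins that exceed a threshold placed a fixed margin above an estimated noise floor. The margin-above-noise-floor rule and all numerical values are assumptions made for illustration; the exact Neyman-Pearson threshold computed from the target false-alarm probability in our implementation is not reproduced here.

```python
# Minimal sketch of spectral activity detection: periodogram (1), rectangular
# frequency-domain smoothing (2), and thresholding. The noise-floor margin used
# below is an illustrative stand-in for the Neyman-Pearson threshold.
import numpy as np

def smoothed_periodogram(y: np.ndarray, L: int) -> np.ndarray:
    """Periodogram of y at the N FFT bins, eq. (1), smoothed per eq. (2) (L odd)."""
    assert L % 2 == 1 and L >= 3
    N = len(y)
    periodogram = np.abs(np.fft.fft(y)) ** 2 / N
    pad = L // 2
    # Wrap the spectrum so each bin is averaged with the L-1 bins around it.
    extended = np.concatenate([periodogram[-pad:], periodogram, periodogram[:pad]])
    return np.convolve(extended, np.ones(L) / L, mode="valid")

def detect_active_bins(y: np.ndarray, L: int = 9, margin_db: float = 8.0) -> np.ndarray:
    """Flag bins whose smoothed PSD exceeds the estimated noise floor by margin_db."""
    T = smoothed_periodogram(y, L)
    noise_floor = np.median(T)               # robust noise-floor estimate
    return T > noise_floor * 10 ** (margin_db / 10)

if __name__ == "__main__":
    fs, N = 40e6, 4096                       # 40 MHz-wide sub-band, N samples
    n = np.arange(N)
    rng = np.random.default_rng(0)
    noise = 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    y = np.exp(2j * np.pi * 5e6 / fs * n) + noise   # one active carrier plus noise
    print(f"{detect_active_bins(y).sum()} of {N} bins flagged as active")
```

The flagged bins are then grouped into contiguous active bands, whose locations and bandwidths feed the radio reconfiguration step described above.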
III. COGNITIVE ANTI-JAMMING COMMUNICATIONS

The proposed cognitive anti-jamming communications protocol avoids both deliberate and unintentional interference by learning when to switch its transmission to a new sub-band and when to continue to transmit in the current sub-band. This is called the sub-band selection problem [1]. In this paper, we develop a reinforcement learning based decision policy according to which a WACR selects the sub-band for sensing and transmission to meet a given user performance criterion. Specifically, our performance objective is anti-jamming and interference avoidance.

For effective sub-band selection, the WACR needs to be able to predict the sub-band that will most likely have the desired conditions to meet the performance objective set by the user [1]. This can effectively be achieved if we were to have a good predictive model for the state dynamics of the sub-bands. A commonly used, and reasonable, model is to assume that the state dynamics are Markov. A cognitive radio learns its environment by sensing one sub-band at a time. Hence, this is a decision-making problem in a partially observable environment, leading to a Partially Observable Markov Decision Process (POMDP). Although the POMDP model is elegant in its formulation, optimal policy computation for POMDPs can be computationally too demanding except in the case of small-size problems [1]. In this work, we get around this computational complexity issue by developing a low-complexity reinforcement learning technique to learn an optimal policy for sub-band selection for anti-jamming and interference avoidance.

The WACR will select a sub-band that has a portion of the sub-band idle for transmission and has not been interfered with, deliberately or unintentionally, for the longest amount of time. Note that the type of communications will determine the minimum contiguous length of idle bandwidth a sub-band must have for it to be a candidate for selection. Once the desired idle bandwidth condition is violated in the current sub-band due to an interferer or a jammer, the WACR will select another sub-band from among all available sub-bands.

Based on the assumed communications objective, in this work we have developed a novel, and simple, definition for the state of a sub-band. In particular, each sub-band can be in one of two possible states: either it contains a contiguous idle bandwidth of the required length (state 1) or it does not (state 0). With this state definition, a WACR will have to select a new sub-band if and when the state of the current sub-band changes to state 0.
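This two-state definition reduces to a simple test on the output of the spectral activity detector: the sub-band is in state 1 exactly when its longest contiguous run of idle bins is at least as wide as the user's minimum bandwidth requirement. The sketch below, with a hypothetical helper name and an assumed bin resolution, illustrates the test.

```python
# Minimal sketch (hypothetical helper, not the prototype's code) of the two-state
# sub-band definition: state 1 if the sensed sub-band contains a contiguous idle
# segment at least min_idle_hz wide, otherwise state 0.
import numpy as np

def sub_band_state(active_bins: np.ndarray, bin_width_hz: float, min_idle_hz: float) -> int:
    """active_bins: boolean occupancy vector from the detector (True = occupied)."""
    longest_idle_run = current_run = 0
    for occupied in active_bins:
        current_run = 0 if occupied else current_run + 1
        longest_idle_run = max(longest_idle_run, current_run)
    return 1 if longest_idle_run * bin_width_hz >= min_idle_hz else 0

if __name__ == "__main__":
    # A 40 MHz sub-band resolved into 4096 bins; 30 MHz minimum requirement as in Section IV.
    bins = np.zeros(4096, dtype=bool)
    bins[100:400] = True                                 # one detected signal, roughly 3 MHz wide
    print(sub_band_state(bins, 40e6 / 4096, 30e6))       # prints 1: enough contiguous idle spectrum
```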

For efficient operation with effective anti-jamming, of course, the selected new sub-band must have low interference with high probability. When interference is due to a deliberate jammer, efficient selection can be achieved if the WACR can learn the jammer's pattern of behavior. Our proposal employs an autonomous learning algorithm to achieve this.

An approach to learning an effective sub-band decision policy, as mentioned earlier in this section, is to use a reinforcement learning technique such as Q-learning. Q-learning is utilized in this application due to its low computational complexity. Moreover, it does not require knowledge of the transition probabilities of the underlying Markov model. Essentially, Q-learning is a reinforcement learning technique in which, for each state and action pair, what is called a Q-value is computed. The Q-value is a quantification of the merit of taking a particular action when in a given state [4]. After each execution of an action, the WACR updates the Q-table based on a certain observed reward. In our approach, we use a reward function that depends on the amount of time it takes the jammer or interferer to interfere with the WACR's transmission once it has switched to a new sub-band. Let us denote the Q-value associated with selecting action a in state s by Q(s, a). After each execution of an action, the WACR updates the Q-table entries as below, where 0 < \alpha < 1 and 0 \leq \gamma < 1 denote the learning rate and the discount factor, respectively [1]:

Q(s[n-1], a_{n-1}) \leftarrow (1-\alpha)\, Q(s[n-1], a_{n-1}) + \alpha \left[ r_n(s[n-1], a_{n-1}) + \gamma \max_{a} Q(s[n], a) \right].    (3)

Our Q-learning based sub-band selection algorithm selects sub-bands for sensing and transmission based on the Q-table. However, it is well known in the RL literature that a certain amount of exploration of the state-action space is required for effective learning. Hence, the sub-band selection policy is defined as

a = \begin{cases} \arg\max_{a \in A} Q(s, a), & \text{with probability } 1-\epsilon \\ U(A), & \text{with probability } \epsilon \end{cases}    (4)

where U(A) denotes the uniform distribution over the action set A and \epsilon is the exploration rate (or the exploration probability). Note that an exploration rate of \epsilon implies that the learner randomly selects an action with probability \epsilon (it explores an action) and selects the best action, as implied by the learned Q-table, with probability 1-\epsilon (exploitation). The exploration rate needs to be carefully selected so as to strike an acceptable balance between exploration and exploitation [1]. A high exploration rate may help the WACR to quickly understand the environment, but it could reduce performance due to excessive exploring at the expense of exploiting what it has learned. In contrast, a low exploration rate could make the WACR take far more time to learn the environment and converge to the optimal solution, when that is indeed possible [1].
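The update (3) and the selection rule (4) amount to only a few lines of code. The following minimal sketch of the resulting Q-learning agent uses the 5-sub-band table size of the prototype in Section IV, while the learning rate, discount factor and exploration rate shown are placeholder values rather than the settings used in our experiments; in practice the reward r_n is the uninterrupted transmission time observed in the newly selected sub-band before the jammer or an interferer arrives.

```python
# Minimal sketch of the Q-learning sub-band selector defined by (3) and (4).
# The numerical parameter values are illustrative placeholders.
import numpy as np

class SubBandQLearner:
    def __init__(self, n_sub_bands: int = 5, alpha: float = 0.5,
                 gamma: float = 0.6, epsilon: float = 0.1, seed: int = 0):
        self.Q = np.zeros((n_sub_bands, n_sub_bands))   # rows: states, columns: actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = np.random.default_rng(seed)

    def select_sub_band(self, state: int) -> int:
        """Epsilon-greedy selection, eq. (4): explore with probability epsilon."""
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.Q.shape[1]))
        return int(np.argmax(self.Q[state]))

    def update(self, prev_state: int, action: int, reward: float, next_state: int) -> None:
        """Q-table update, eq. (3)."""
        target = reward + self.gamma * np.max(self.Q[next_state])
        self.Q[prev_state, action] = ((1 - self.alpha) * self.Q[prev_state, action]
                                      + self.alpha * target)
```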
IV. SIMULATION RESULTS

The hardware-in-the-loop setup is implemented in a LabVIEW program using an NI-USRP software-defined radio. The signal processing tasks of the cognitive engine are performed by the LabVIEW program running on a laptop in real time.

Figure 5. The setup of the hardware and a general top-layer overview of the hardware-in-the-loop setup.

Figure 5 shows the general hardware-in-the-loop simulation setup. The hardware portion collects real-time data and passes it to the cognitive engine for processing. In addition, it also transmits the radio's own signal as instructed by the cognitive engine. Our WACR prototype operates over a spectrum range of 200 MHz in real time and scans a 40 MHz-wide sub-band at a time. In this case, the Q-table is a 5x5 matrix: there are 5 states and 5 actions, where the rows are the states and the columns are the actions. Note that the action is the sub-band the radio selects for sensing during the next time instant in an attempt to escape the jammer.

To demonstrate our prototype's ability to learn a good sub-band selection policy, our field tests used a continuously sweeping signal acting as the jammer, which sweeps the 200 MHz-wide spectrum within a period of 35 seconds. We tested our learning algorithm in two spectrum ranges: the 2 GHz-2.2 GHz band, which usually contained unintentional outside interferers in addition to our sweeping jammer signal, and the 3 GHz-3.2 GHz band, which was mostly free of additional unintentional interferers. The jammer sweeps these frequency bands from the lower to the higher frequencies. Hence, in the absence of any other interference, the optimal sub-band selection policy to avoid the jammer is intuitive: the WACR should cyclically shift to the sub-band adjacent to the current sub-band on the lower-frequency side. For example, if the WACR is currently sensing sub-band 5, it should choose sub-band 4 in order to avoid the jammer for the longest amount of time possible. Table I shows this intuitive pattern of the optimal sub-band selection policy that the WACR needs to learn in order to effectively avoid the sweeping jammer (under the assumption that there are no interferers other than the sweeping jammer). The toy simulation sketched below illustrates how a Q-learner converges to this same cyclic pattern.
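The following self-contained toy simulation, with assumed parameters rather than the field-test hardware, applies (3) and (4) to an idealized version of this sweeping-jammer scenario: the reward is the number of jammer dwell periods the radio transmits uninterrupted after each switch. The greedy policy read off the learned Q-table reproduces the cyclic hop-to-the-adjacent-lower-sub-band pattern of Table I.

```python
# Toy sweeping-jammer simulation (illustrative parameters, not the field test):
# 5 sub-bands, a jammer sweeping them cyclically from low to high, and a reward
# equal to the dwell periods of uninterrupted transmission after each switch.
import numpy as np

N_SUB_BANDS = 5
ALPHA, GAMMA, EPSILON = 0.5, 0.6, 0.1
rng = np.random.default_rng(1)
Q = np.zeros((N_SUB_BANDS, N_SUB_BANDS))

state = 0                                        # sub-band the jammer has just reached
for _ in range(5000):                            # decision epochs, each ending when re-jammed
    if rng.random() < EPSILON:                   # epsilon-greedy selection, eq. (4)
        action = int(rng.integers(N_SUB_BANDS))
    else:
        action = int(np.argmax(Q[state]))
    reward = (action - state) % N_SUB_BANDS      # dwell periods before the sweep arrives
    next_state = action                          # the radio is jammed again in this sub-band
    Q[state, action] = ((1 - ALPHA) * Q[state, action]
                        + ALPHA * (reward + GAMMA * np.max(Q[next_state])))   # eq. (3)
    state = next_state

greedy = (np.argmax(Q, axis=1) + 1).tolist()
print("greedy choice per current sub-band:", greedy)
# Expected: [5, 1, 2, 3, 4], i.e., always hop to the adjacent lower sub-band
# (cyclically), matching the intuitive pattern of Table I.
```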

Results from our field tests show that our WACR can indeed learn the above optimal sub-band selection policy to avoid deliberate jamming. Tables II and III show the Q-tables learned by the WACR while operating in the 3 GHz-3.2 GHz band and the 2 GHz-2.2 GHz band, respectively. In these experiments, the user-defined minimum required bandwidth in a sub-band is 30 MHz. Note that the difference between Tables II and III is that in Table II the WACR operated in a frequency band that was free of unintentional interference, whereas in Table III the WACR operated in a band with unintentional interference.

Table I. Q-TABLE WITH THE OPTIMAL-POLICY ANTI-JAMMER AVOIDANCE PATTERN

State \ Action     1     2     3     4     5
      1            0     0     0     0     2
      2            2     0     0     0     0
      3            0     2     0     0     0
      4            0     0     2     0     0
      5            0     0     0     2     0

Table II. LEARNED Q-TABLE IN THE 3 GHZ TO 3.2 GHZ BAND

State \ Action       1        2        3        4        5
      1           0.0461   0.0956   0.907    0.4676   4.6945
      2           4.8770   0.0830   0.008    0.87     0.9495
      3           0.834    4.688    0.168    0.097    0.88
      4           0.37     0.7844   4.5411   0.0645   0.087
      5           0.048    0.7756   0.7705   4.550    0.0851

Table III. LEARNED Q-TABLE IN THE 2 GHZ TO 2.2 GHZ BAND

State \ Action       1        2        3        4        5
      1           0.0971   0.3677   0.4801   0.454    1.0584
      2           1.5785   0.964    0.1780   0.3003   0.6007
      3           0.4680   1.4561   0.0940   0.179    0.3079
      4           0.333    0.704    1.4148   0.1881   0.1898
      5           0.333    0.578    0.449    1.130    0.138

Clearly, these Q-tables show that our proposed reinforcement learning based sub-band selection algorithm can indeed learn the sweeping jammer's behavior and perform as an effective cognitive anti-jamming and interference avoidance protocol. The Q-tables in Tables II and III show that if the system were to exploit (choose the action resulting in the greatest reward), it would indeed choose the optimal sub-band that follows our intuition, as previously mentioned and as shown in Table I. Another observation from these results is that our proposed learning scheme is relatively robust against unintentional interference. For example, Table III shows that despite the presence of both unintentional interference and the deliberate jammer, the WACR is successful at learning a good action selection policy to avoid the jammer.

A possible future work is to use an expanded Q-table that can also include states in which unintentional interferers are present alongside the jammer. Implementing a game-theory based approach to defeating jammer interference can also be an additional future goal.

V. CONCLUSION

In this paper, we have presented an anti-jamming wideband autonomous cognitive radio and demonstrated that it is indeed capable of evading both deliberate jammers and unintentional interference. In addition, we have also demonstrated that reinforcement learning can be an effective approach for a WACR to learn the optimal communications mode to avoid a deliberate jammer. Results obtained from an HITL simulation showed that it was able to successfully infer the jamming pattern and learn the optimal sub-band selection policy for jammer avoidance.

ACKNOWLEDGMENT

This work was funded in part by the Air Force Research Laboratory (AFRL), Space Vehicles Directorate, under grant FA9453-15-1-0314 and in part by the National Aeronautics and Space Administration (NASA) under contract NNX15CC80P.

REFERENCES

[1] S. K. Jayaweera, Signal Processing for Cognitive Radios. Hoboken, New Jersey: Wiley, 2015.
[2] F. Slimeni, B. Scheers, Z. Chtourou, and V. L. Nir, "Jamming mitigation in cognitive radio networks using a modified Q-learning algorithm," in Military Communications and Information Systems (ICMCIS), 2015 International Conference on, May 2015, pp. 1-7.
[3] S. Singh and A. Trivedi, "Anti-jamming in cognitive radio networks using reinforcement learning algorithms," in Wireless and Optical Communications Networks (WOCN), 2012 Ninth International Conference on, Sep. 2012, pp. 1-5.
[4] M. Bkassiny, Y. Li, and S. K. Jayaweera, "A survey on machine-learning techniques in cognitive radios," IEEE Communications Surveys & Tutorials, vol. 15, no. 3, pp. 1136-1159, Third Quarter 2013.
[5] T. Chen, J. Liu, L. Xiao, and L. Huang, "Anti-jamming transmissions with learning in heterogeneous cognitive radio networks," in Wireless Communications and Networking Conference Workshops (WCNCW), 2015 IEEE, Mar. 2015, pp. 93-98.
[6] B. Wang, Y. Wu, K. J. R. Liu, and T. C. Clancy, "An anti-jamming stochastic game for cognitive radio networks," IEEE Journal on Selected Areas in Communications, vol. 29, no. 4, pp. 877-889, Apr. 2011.
[7] K. Dabcevic, A. Betancourt, L. Marcenaro, and C. S. Regazzoni, "A fictitious play-based game-theoretical approach to alleviating jamming attacks for cognitive radios," in Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, May 2014, pp. 8158-8162.
[8] S. R. Sabuj, M. Hamamura, and S. Kuwamura, "Detection of intelligent malicious users in cognitive radio networks by using friend or foe (FoF) detection technique," in Telecommunication Networks and Applications Conference (ITNAC), 2015 International, Nov. 2015, pp. 155-160.
[9] Y. Gwon, S. Dastangoo, C. Fossa, and H. T. Kung, "Competing mobile network game: Embracing anti-jamming and jamming strategies with reinforcement learning," in Communications and Network Security (CNS), 2013 IEEE Conference on, Oct. 2013, pp. 28-36.
[10] C. Chen, M. Song, C. Xin, and J. Backens, "A game-theoretical anti-jamming scheme for cognitive radio networks," IEEE Network, vol. 27, no. 3, pp. 22-27, May 2013.
[11] S. Haykin, "Cognitive radio: Brain-empowered wireless communications," IEEE Journal on Selected Areas in Communications, vol. 23, no. 2, pp. 201-220, Feb. 2005.