Multiagent Jamming-Resilient Control Channel Game for Cognitive Radio Ad Hoc Networks


Brandon F. Lo and Ian F. Akyildiz
Broadband Wireless Networking Laboratory, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 3332
Email: {brandon.lo,ian}@ece.gatech.edu

Abstract: Control channel jamming is a severe security problem in wireless networks because attackers can effectively launch denial of service attacks by jamming the control channels. Traditional approaches to combating this problem, such as channel hopping sequences, may not be secure against intelligent attackers because the reliability of control channels in cognitive radio ad hoc networks cannot be guaranteed. In this paper, we introduce a jamming-resilient control channel (JRCC) game to model the interactions among cognitive radio users and the attacker under the impact of primary user activity. We propose the JRCC algorithm, which enables user cooperation to facilitate control channel allocations and adapts to primary user activity with variable learning rates using the Win-or-Learn-Fast principle for jamming resilience in hostile environments. It is shown that the optimal strategies converge to a Nash equilibrium or the expected rewards of the strategies converge to those of a Nash equilibrium. The results also show that the JRCC algorithm effectively combats jamming under the impact of primary user activity and sensing errors. Moreover, the control channel allocation policy can be improved by enhancing transmission and sensing capabilities. The proposed algorithm is scalable and can be applied to multiple users.

I. INTRODUCTION

The common control channel (CCC) in cognitive radio (CR) networks [8] is the spectrum resource specifically allocated for control message exchange among CR users to facilitate network operations.
In CR ad hoc networks (CRAHNs) [], where no centralized control entity such as a base station (BS) exists, CR users cooperate with each other for all spectrum management functions such as cooperative spectrum sensing [2], and thus rely even more on the CCC for message exchange and normal operations. As a result, the reliability of CCC allocation is essential in CRAHNs. However, when a dedicated CCC allocated out of the licensed bands is not feasible, the CCC must be dynamically allocated in licensed bands. In this case, the in-band CCC will be interrupted by primary user (PU) activity and needs to be efficiently reallocated and recovered when the existing CCC is occupied by the PU [7].

Dynamic CCC allocations in licensed bands are further complicated by jamming attacks if security issues are considered. Jamming attacks are launched by malicious users to deliberately disrupt the communications of CR users, resulting in denial of service (DoS) in CR networks. Although jamming attacks can occur in any type of channel, data or control, it is reported in [5] that jamming the broadcast channel (BCCH) of the GSM system is several orders of magnitude more effective than targeting all channels. For this reason, any intelligent attacker may prefer control channel jamming to other jamming methods because of its effectiveness in causing DoS. Thus, as in any wireless network, control channel jamming is a severe security issue in CRAHNs.

Fig. 1. Jamming-resilient control channel game.

This work was supported by the U.S. National Science Foundation (NSF) under Grant No. ECCS-993.

The interactions between CR users and attackers are commonly modeled as a stochastic zero-sum game [6], [], [] since CR users and jammers generally have opposite goals. In these approaches, PU activities govern the states of the game and the state transitions, and sensing errors are generally ignored for simplicity.
In [6], the Nash equilibrium strategy is obtained for the one-stage game, while the optimal attacking strategy is obtained for the multi-stage case. The latter is achieved by fixing the CR user's strategy and converting the problem to the framework of the single-player partially observable Markov decision process (POMDP). [] shows that CR users can combat jamming by increasing the number of unoccupied channels that can be observed. However, this capability is limited by PU activity and channel availability. In [], minimax-Q learning is used by the CR user to find the optimal anti-jamming channel selection policy. Although the CR user's actions consist of separate selections of control and data channels, the attacker in this work, like the one in [6] and [], does not exclusively target control channels.

In this paper, we model the interactions among CR users and the attacker under the impact of PU activities as a stochastic general-sum game, called the jamming-resilient control channel (JRCC) game. Fig. 1 illustrates the JRCC game with the PUs, three CR users, and the attacker. The objective of the game is to find the optimal control channel allocation strategy for CR users to combat jamming attacks by using multiagent reinforcement learning (MARL). The optimal control channel allocation policy is obtained by enabling communications among CR users to facilitate CCC allocations and adaptation to PU activity to achieve the Nash equilibrium in the game. We demonstrate that the effectiveness of anti-jamming CCC allocations can be improved by the cooperation of CR users. By exploiting the advantages of Policy Hill-Climbing

(PHC) and the Win-or-Learn-Fast (WoLF) principle [4], our proposed JRCC algorithm effectively combats jamming under PU activity and sensing errors, and outperforms the original MARL algorithms. Our contributions can be summarized as follows:

- We model the interactions among CR users and the attacker under the impact of PU activities as a stochastic general-sum game, called the JRCC game, with the consideration of sensing errors and limited observations of other players' actions and payoffs.
- We analyze the gradient dynamics of the JRCC game by using the N-dimensional nonlinear dynamical system with the gradient ascent algorithm and show the convergence of the JRCC game.
- We propose the JRCC algorithm for CR users as the optimal control channel strategy that utilizes CR user cooperation with the hill-climbing algorithm in low PU activity and exhibits resilience to jamming using variable learning rates in high PU activity.

The remainder of this paper is organized as follows: Section II discusses the system model and assumptions. Section III describes the dynamics and the proposed algorithm of the JRCC game. Section IV evaluates the performance in various test scenarios, and Section V concludes the paper.

II. SYSTEM MODEL

The system model consists of a primary network model, a CRAHN model, and a jamming attack model, whose interactions are described by the JRCC game.

Primary Network Model: The primary network $\mathcal{P}$ consists of $N_p$ PUs who may be active or inactive on a set of $N_p$ licensed channels, $\mathcal{N}_p$, available for opportunistic access by CR users. Each licensed channel $i \in \mathcal{N}_p$ is occupied by one PU, $P_i$, whose activity follows the two-state birth-death process with birth rate $r_b$ and death rate $r_d$. The departures and the arrivals of a PU on channel $i$ follow a Poisson process with exponentially distributed inter-arrival time. Thus, each channel has two states, PU active (ON) and PU inactive (OFF), with transition probabilities $r_b$ (OFF to ON) and $r_d$ (ON to OFF). We also assume that PU transmission is time-slotted.
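The ON/OFF channel model above can be sketched as a per-slot two-state Markov chain. The following is a minimal illustration, not the authors' simulator: the function names, the numeric rates, and the choice to start every channel in the OFF state are assumptions made for the example. For such a chain the long-run ON probability is $r_b / (r_b + r_d)$, which the simulation should approach.

```python
import random


def stationary_on_probability(r_b: float, r_d: float) -> float:
    """Long-run probability that a channel is ON for the two-state chain."""
    return r_b / (r_b + r_d)


def step_channel(state: int, r_b: float, r_d: float, rng: random.Random) -> int:
    """Advance one channel by one slot (0 = OFF, 1 = ON)."""
    if state == 0:
        return 1 if rng.random() < r_b else 0  # OFF -> ON with probability r_b
    return 0 if rng.random() < r_d else 1      # ON -> OFF with probability r_d


def simulate_on_fraction(num_channels: int, r_b: float, r_d: float,
                         slots: int, seed: int = 0) -> float:
    """Empirical fraction of (channel, slot) pairs in which the PU is ON."""
    rng = random.Random(seed)
    states = [0] * num_channels  # assumption: all channels start OFF
    on_slots = 0
    for _ in range(slots):
        states = [step_channel(s, r_b, r_d, rng) for s in states]
        on_slots += sum(states)
    return on_slots / (num_channels * slots)


if __name__ == "__main__":
    # Illustrative rates only; they are not taken from the paper.
    print(stationary_on_probability(0.2, 0.6))
    print(simulate_on_fraction(6, 0.2, 0.6, 20000))
```

Running the script prints the analytical ON probability next to the simulated fraction, which gives a quick check that the chosen $r_b$ and $r_d$ produce the intended PU load.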
As a result, CR users need to periodically sense licensed channels according to the schedule of the primary network. Since the sensing operations of CR users are subject to errors, CR users need to satisfy the detection requirements in terms of the probability of false alarm $P_f$ and the probability of miss detection $P_m$ to keep the interference with PUs at a tolerable level. We also assume that the attacker needs to meet the detection requirements.

CR Ad Hoc Network Model: A group of $K$ CR users, $\mathcal{K}$, within the jamming region of the attacker opportunistically access $N_p$ licensed channels. Due to hardware limitations, CR users can only sense or transmit on $N_s \le N_p$ licensed channels each time. Depending on the sensing results and channel availability, CR user $k \in \mathcal{K}$ selects a subset of channels, $\mathcal{N}_k \subseteq \mathcal{N}_p$ and $N_k = |\mathcal{N}_k| \le N_s$, as control channels and transmits the same control messages on those selected channels. However, not all $N_k$ channels are valid CCCs. Due to sensing errors and jamming attacks, these selected control channels may not be valid allocations for successful control transmission. In addition, a CCC must be commonly available to all CR users in the region. Thus, valid CCC allocations exist only when the selected channels are unoccupied by a PU, jamming-free, and common to CR users such that CR users can successfully exchange control messages on these channels. That is, the number of valid CCCs is $U_c = |\mathcal{U}_c| = N_c - J_c - P_c$, where $N_c$, $J_c$, and $P_c$ are the numbers of selected CCCs, jammed CCCs, and interfering CCCs due to miss detection, respectively, and $N_c = |\mathcal{N}_k \cap \mathcal{N}_l|$, $\forall k, l \in \mathcal{K}$, $k \ne l$. We assume that all control messages are encrypted and cannot be decrypted by the eavesdropping attacker during the period of the game. After rendezvous on these CCCs, the CR user pair can use the in-band CCCs for transmitting data or negotiating an available channel for data transmission.

Jamming Attack Model: For jamming attacks, we assume that the attacker has hardware capability similar to that of CR users and can sense and jam up to $N_s \le N_p$ licensed channels each time.
According to the sensing results, the attacker selects $N_j$ channels to jam and transmits the interference signal on those selected channels. Due to sensing errors, the attacker may select PU-occupied channels to jam and cause interference with PUs. Since the objective of the attacker is to disrupt CR transmission, we assume that the attacker will make efforts to avoid interfering with PUs to save its energy and avoid being exposed to PUs, unless the interference is caused by the sensing hardware limitations. Thus, the attacker appears to PUs as a CR user. Moreover, we assume that the attacker does not behave like a PU by occupying the channels and forcing CR users to use other channels because this does not successfully jam control channels. We also assume that the attacker is unable to detect the control traffic and launch the jamming attack after the CCCs are established, since such attacks require knowledge about CR users and the in-band CCCs are also used for data transmission. For these reasons, we do not consider other types of security attacks such as PU emulation attacks and node capture attacks (Byzantine failures) in our model. Assume that the attacker selects a subset of channels, $\mathcal{N}_j \subseteq \mathcal{N}_p$ and $N_j = |\mathcal{N}_j| \le N_s$, for jamming. The number of valid jammed control channels is then $J_c = |\mathcal{J}_c| = N_j - U_j - P_j$, where $U_j$ and $P_j$ are the numbers of jammed non-CCC channels and PU-occupied channels caused by miss detection, respectively. For effective control channel jamming, $J_c = U_c$.

III. JAMMING-RESILIENT CONTROL CHANNEL GAME

In this section, we introduce the JRCC game that models the interactions among PUs, CR users, and the attacker. We analyze the game by using the gradient dynamics and then introduce the JRCC algorithm for finding the optimal control channel allocation strategy for CR users.

A. States, Actions, Transition Probabilities, and Rewards

In the JRCC game, the primary network $\mathcal{P}$ affects the states of the game with PU activity on a set of $N_p$ licensed channels. For a set of $N_p$ licensed channels, there are $2^{N_p}$ states in the game.
The state of the game at stage index $n$ is denoted by $s^n = \{s_1^n, \ldots, s_{N_p}^n\}$, where $s_i^n$ is the state of channel $i$ at stage index $n$. The state of channel $i$, $s_i^n$, is determined by PU $P_i$'s activity. That is, $s_i^n = 1$ if $P_i$ occupies channel $i$ at stage $n$, and $s_i^n = 0$ otherwise.
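As a small illustration of this state space, the sketch below enumerates the $2^{N_p}$ binary occupancy vectors and computes a joint state transition probability as a product of per-channel ON/OFF transitions. It assumes that channels evolve independently and share the same $r_b$ and $r_d$; the function names and numeric values are hypothetical and chosen only for the example.

```python
from itertools import product


def all_states(n_p: int):
    """All 2**n_p occupancy vectors (1 = PU occupies the channel)."""
    return list(product((0, 1), repeat=n_p))


def channel_transition(cur: int, nxt: int, r_b: float, r_d: float) -> float:
    """Per-channel transition probability of the ON/OFF chain."""
    if cur == 0:
        return r_b if nxt == 1 else 1.0 - r_b
    return r_d if nxt == 0 else 1.0 - r_d


def joint_transition(state, next_state, r_b: float, r_d: float) -> float:
    """Joint transition probability, assuming independent channels."""
    prob = 1.0
    for cur, nxt in zip(state, next_state):
        prob *= channel_transition(cur, nxt, r_b, r_d)
    return prob


if __name__ == "__main__":
    states = all_states(3)
    print(len(states))
    # Each row of the transition matrix should sum to one.
    print(sum(joint_transition(states[0], s, 0.2, 0.6) for s in states))
```

Enumerating states this way is only practical for a small number of channels, since the state count doubles with every additional channel.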

The sets of actions are denoted by $A_k$, $k = 0, \ldots, K$, for the attacker and $K$ CR users, respectively. The number of actions available to each player depends on the maximum number of channels that can be sensed. For sensing up to $N_s$ channels, the number of actions is $N_{A_k} = \sum_{i=1}^{N_s} \binom{N_p}{i}$. If PU activity and jamming are not considered and all actions are equally likely, the probability of selecting $m$ CCCs is given by

$$\Pr\{N_c = m\} = \frac{\sum_{i=m}^{N_s} \binom{N_p}{i} \binom{i}{m} \left[ \sum_{j=0}^{N_{lm}} \binom{N_p - i}{j} \right]^{K-1}}{\left[ \sum_{i=1}^{N_s} \binom{N_p}{i} \right]^{K}} \tag{1}$$

where $N_{lm} = \min(N_p - i, N_s - m)$ is the limitation on the other CR users' remaining channel selections. The denominator is the number of all joint action combinations among $K$ CR users. To find the probability of $m$ selected CCCs, each CR user needs to select at least $m$ channels. The first binomial coefficient $\binom{N_p}{i}$ in the numerator is the number of choices of one CR user selecting $i$ out of the total $N_p$ channels. The second binomial coefficient $\binom{i}{m}$ counts which $m$ out of the $i$ selected channels are common to all CR users. The bracket in the numerator is the number of the other CR users' choices of selecting non-CCC channels from the remaining $N_p - i$ channels not selected by the first CR user. For the attacker, the probability of selecting $m$ channels to jam is given by $\Pr\{N_j = m\} = \binom{N_p}{m} / \left[ \sum_{i=1}^{N_s} \binom{N_p}{i} \right]$. The probability of at least one successful CCC allocation is then

$$\Pr\{U_c > 0\} = \sum_{m=1}^{N_s} \Pr\{N_c = m\} \Pr\{J_c \le m - 1 \mid N_c = m\} \tag{2}$$

where

$$\Pr\{J_c \le m - 1 \mid N_c = m\} = \frac{\sum_{i=0}^{m-1} \binom{m}{i} \left[ \sum_{j=0}^{N_s - i} \binom{N_p - m}{j} \right]}{\sum_{i=1}^{N_s} \binom{N_p}{i}} \tag{3}$$

The numerator in (3) is the number of combinations of the attacker jamming $i$ out of $m$, up to $m - 1$, CCCs plus other $N_s - i$ non-CCC channels selected from the remaining $N_p - m$ channels. Since the state transitions are governed by PU activity and all channels are independent, the state transition probability is given by $\Pr\{S^{n+1} \mid S^n\} = \prod_{i=1}^{N_p} \Pr\{s_i^{n+1} = j \mid s_i^n = k\}$, $j, k \in \{0, 1\}$, where $\Pr\{s_i^{n+1} \mid s_i^n\}$ is the probability of the state transition from state $s_i^n$ to $s_i^{n+1}$ on channel $i$, depending on the PU ON/OFF status of the given state. CR users are rewarded for the selections of un-jammed and PU-free CCCs.
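The counting argument behind (1) can be checked numerically for two CR users. The sketch below is an illustration under stated assumptions rather than the authors' code: it reads (1) with the exponent $K - 1$ on the bracket and the inner sum starting at $j = 0$, enumerates every action as a subset of one to $N_s$ channels, and compares the closed form with a brute-force count of joint actions whose intersection has exactly $m$ channels. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def actions(n_p: int, n_s: int):
    """Every action: a non-empty subset of at most n_s out of n_p channels."""
    return [frozenset(c)
            for size in range(1, n_s + 1)
            for c in combinations(range(n_p), size)]


def pr_common_formula(m: int, n_p: int, n_s: int, k_users: int) -> Fraction:
    """Closed form for Pr{N_c = m} as read from (1) (assumed reconstruction)."""
    numerator = 0
    for i in range(m, n_s + 1):
        n_lm = min(n_p - i, n_s - m)
        others = sum(comb(n_p - i, j) for j in range(n_lm + 1))
        numerator += comb(n_p, i) * comb(i, m) * others ** (k_users - 1)
    denominator = sum(comb(n_p, i) for i in range(1, n_s + 1)) ** k_users
    return Fraction(numerator, denominator)


def pr_common_bruteforce_two_users(m: int, n_p: int, n_s: int) -> Fraction:
    """Exact Pr{N_c = m} for K = 2 by enumerating all joint actions."""
    acts = actions(n_p, n_s)
    hits = sum(1 for a in acts for b in acts if len(a & b) == m)
    return Fraction(hits, len(acts) ** 2)


if __name__ == "__main__":
    for m in (1, 2, 3):
        print(m, pr_common_formula(m, 6, 3, 2),
              pr_common_bruteforce_two_users(m, 6, 3))
```

The brute-force check is restricted to two users on purpose: with more users the enumeration grows quickly, and the sketch makes no claim about how closely the closed form tracks an exhaustive count in that case.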
Thus, CR user $k$'s immediate reward for stage $n$ is defined as:

$$r_k^n = \begin{cases} 1/(N_c - J_c - P_c) & \text{if } U_c = N_c - J_c - P_c \ge 1, \\ 0 & \text{if } U_c = 0 \text{ or } N_k = J_c. \end{cases} \tag{4}$$

The maximum CR user's reward is unity when the selected channels are all PU-free CCCs and only one of them is not jammed. That is, $N_c - P_c = N_k$ and $N_k > J_c$. The reward of the attacker is evaluated based on whether the CCCs of CR users are all jammed. As a result, the attacker $J$'s immediate reward for stage $n$ is

$$r_J^n = \begin{cases} 1/(1 + (N_j - J_c)) & \text{if } U_c = 0 \text{ and } N_j > 0, \\ 0 & \text{if } U_c > 0 \text{ or } N_j = 0. \end{cases} \tag{5}$$

Although CR users and the attacker generally have opposite goals, it can be seen from (4) and (5) that, unlike in a zero-sum game, the reward of the attacker is not the negative of that of CR users in the JRCC game.

B. Gradient Dynamics Analysis

In the JRCC game, the interactions among all players can be modeled as an N-dimensional non-linear dynamical system in which the dynamics of changes are the gradient of the joint strategy in $\mathbb{R}^N$. Similar to [4], [9], we examine the dynamics of the JRCC game using the gradient ascent and show that the players' strategies or expected payoffs will converge. We focus on the dynamics of an N-player JRCC game with $K$ CR users and one attacker ($N = K + 1$). We assume perfect sensing and full observations of PU states. In this game, player $k \in \{0, \ldots, K\}$ chooses action $a_{k,i} \in A_k$, $i = 1, \ldots, N_{A_k}$, indicating that player $k$ selects the $i$-th subset of PU-free channels for CCC allocation ($k > 0$) or jamming ($k = 0$). Let $x_k = \{x_{k,i} \in [0, 1] : \sum_{i=1}^{N_{A_k}} x_{k,i} = 1\}$ be player $k$'s action selection strategy. According to the strategy, the probability of choosing action $a_{k,i}$ is $x_{k,i}$. In each stage, player $k$ receives reward $r_{k,j}$ for the $j$-th joint action $(a_0, \ldots, a_K)_j$ selected by the joint strategy $(x_0, \ldots, x_K)$. Then the expected reward $R_k$ can be expressed as a function of the joint strategy $(x_0, \ldots, x_K)$ and the rewards $r_{k,j}$, $j = 1, \ldots, \prod_{k=0}^{K} N_{A_k}$.
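The reward definitions (4) and (5) can be made concrete with a short sketch that derives $N_c$, $J_c$, and $P_c$ from channel sets and then evaluates both rewards. This is an illustration under assumptions: the helper names are hypothetical, and a channel that is both PU-occupied and jammed is counted only in $P_c$ so that $U_c = N_c - J_c - P_c$ never double-counts.

```python
def ccc_counts(selections, jammed, pu_occupied):
    """Return (N_c, J_c, P_c) for the CR users' channel selections.

    selections: one set of channel ids per CR user.
    jammed: channels the attacker transmits on.
    pu_occupied: channels currently used by a PU.
    """
    common = set.intersection(*map(set, selections))   # candidate CCCs
    p_c = len(common & set(pu_occupied))                # interfering CCCs
    # Assumption: a PU-occupied channel is counted in P_c only.
    j_c = len((common - set(pu_occupied)) & set(jammed))
    return len(common), j_c, p_c


def cr_reward(n_c: int, j_c: int, p_c: int) -> float:
    """CR user's immediate reward, following (4)."""
    u_c = n_c - j_c - p_c
    return 1.0 / u_c if u_c >= 1 else 0.0


def attacker_reward(n_j: int, j_c: int, u_c: int) -> float:
    """Attacker's immediate reward, following (5)."""
    if u_c == 0 and n_j > 0:
        return 1.0 / (1 + (n_j - j_c))
    return 0.0


if __name__ == "__main__":
    n_c, j_c, p_c = ccc_counts([{1, 2, 3}, {2, 3, 4}], jammed={2, 5},
                               pu_occupied={3})
    u_c = n_c - j_c - p_c
    print(cr_reward(n_c, j_c, p_c), attacker_reward(2, j_c, u_c))
```

In the example call the CR users earn nothing while the attacker earns one half, so the two rewards do not sum to zero, which matches the general-sum character of the game noted above.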
Since the goal of each player is to find the optimal strategy to maximize its expected reward, the gradient ascent algorithm provides the mechanism for a player to achieve the optimal solution by iteratively adjusting its strategy with a sufficiently small step size. In the gradient ascent using variable learning rates [4], the changes in expected rewards can be expressed as iterative strategy update rules as follows:

$$x_k^{n+1} = x_k^n + \alpha^n \delta_k^n \frac{\partial R_k(x_0^n, \ldots, x_K^n)}{\partial x_k^n}, \quad k = 0, \ldots, K \tag{6}$$

where $\delta_k^n > 0$ are the learning rates and $\alpha^n \delta_k^n$ are the step sizes for updating strategy $x_k^n$ in stage $n$. $\partial R_k / \partial x_k^n$ represent the changes in player $k$'s expected reward in response to the changes in the strategy $x_k$ in the direction of the gradient. They are obtained by taking the partial derivatives of each player's expected reward with respect to its strategy. As a result, the dynamics of the strategy changes can be formulated as an N-dimensional constrained non-linear affine dynamical system with differential equations defined as

$$\dot{x} = \Delta (A x + b(x) + c) \tag{7}$$

subject to the unit-hypercube constraints:

$$x_k \in [0, 1]^{N_{A_k}}, \quad k = 0, \ldots, K \tag{8}$$

where $x = [x_0 \ldots x_K]^T$, $\delta = [\delta_0 \ldots \delta_K]^T$, $\Delta = \delta^T I_N$, $A_{N \times N}$ and $c_{N \times N_{A_k}}$ are matrices whose elements are functions of the rewards $r_{k,j}$, and $b(x)_{N \times N_{A_k}}$ contains higher-order products of $x_0, \ldots, x_K$. The constraints limit the strategies to the inside of the unit hypercube because the strategy N-tuple consists of probability distributions. The system can be linearized at a fixed point $x^*$ if it has a solution $x^*$ [3]. If we let $r = \|x - x^*\|_2$, $b(x)/r$ approaches 0 faster than $r$ as $r \to 0$. Combined with the change of variable $y = x - x^*$, we obtain the homogeneous linear system:

$$\dot{y} = J y \tag{9}$$

where $J = J_F(x_0^*, \ldots, x_K^*)$ and $J_F$ is the Jacobian matrix of $X(x) = Ax + b(x) + c$. The phase portraits of the non-linear

system and its linearized system are considered qualitatively equivalent in the neighborhood of $x^*$. Based on the analysis of gradient dynamics, we conclude with the following theorem.

Theorem 1 (Convergence Theorem of JRCC Game): For the N-player iterated general-sum JRCC game, if the players follow the gradient ascent algorithm with variable learning rates and a sufficiently small step size, the strategy N-tuple $(x_1, \ldots, x_N)$ will converge to a Nash equilibrium or the expected rewards of the players will converge to the expected rewards of a Nash equilibrium in the limit.

Proof: We examine the coefficient matrix $J$ of the linear dynamical system (9) with the constraints (8), and show that the strategy will either converge to the fixed points of the system inside the unit hypercube or the expected rewards of the strategy will converge to those of a Nash point on the boundary of the hypercube. Since the variable learning rates in $\Delta$ have no effect on the direction of the gradient, we focus on the eigenanalysis of $J$ in the following two cases.

1) $J$ is singular: In this case, the system is neutrally stable, the trajectories in the phase portrait exhibit periodic patterns, and the strategy N-tuple consists of periodic functions of time. Since this periodicity in the strategy can be predictable and is not desired by either CR users or the attacker in the JRCC game, CR users and the attacker will force the system to stay away from neutrally stable states in order to make their strategies unpredictable.

2) $J$ is nonsingular: In this case, $J$ is invertible and all the eigenvalues of $J$ have nonzero real part. The system has hyperbolic fixed points: the phase portraits of the nonlinear system and its linearization are qualitatively equivalent in the neighborhood of the fixed points. Let $n_u$ and $n_s$ be the numbers of eigenvalues with positive and negative real part, respectively. These eigenvalues are associated with the corresponding unstable eigenspaces $V^u \subseteq \mathbb{R}^{n_u}$ and stable eigenspaces $V^s \subseteq \mathbb{R}^{n_s}$ of $e^{Jt}$, respectively.
Trajectories in the phase portrait move away from the fixed point in $V^u$ and approach the fixed point in $V^s$ as $t$ increases. Since $n_u + n_s = N$, we have the following subcases: $n_u = 0, \ldots, N$. For $n_u = 0$, the fixed point is an attracting node and the strategy converges to this Nash point. For $n_u > 0$ and $n_u < N$, trajectories are saddle points pointing inwards with a focus in $V^s$ and outwards along $V^u$. For $n_u = N$, the fixed point is an N-dimensional star node pointing outwards. Due to the constraints (8), points on the trajectories away from the fixed point will initially reach a point on the boundary of the unit hypercube. Without loss of generality, we assume that the point is on one of the $n$-faces, $n \le N$. If the projection of the gradient is zero at that point, the trajectory will stay on the point. It is a Nash point of the game since no single user can improve its payoff by changing the strategy unilaterally. If the projected gradient is nonzero, the trajectory moves toward one of the $(n-1)$-faces of the hypercube in the direction depending on the sign of the projected gradient and reaches a point on the $(n-1)$-faces. The process will stop at any point where the projected gradient is zero or continue to move toward lower dimensional faces until the trajectory reaches one of the vertices of the hypercube ($n = 0$). Thus, $(x_1, \ldots, x_N)$ converges to a Nash equilibrium or its expected rewards converge to the expected rewards of a Nash point.

C. JRCC Algorithm

The gradient ascent algorithm in Section III-B requires the knowledge of rewards for all combinations of joint actions and the distributions of other players' actions available to each player. However, obtaining such knowledge in the JRCC game is infeasible. Due to the limitation of sensing capability, the actions of the players are only partially observable by other players. As a result, not all rewards can be obtained for all joint actions. More importantly, CR users and the attacker will not reveal their own action selection strategy.
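The boundary argument in the proof can be illustrated with a deliberately small case: two players with two actions each, a constant step size, and clipping to the unit interval as a stand-in for the hypercube constraints (8). This is a simplified sketch, not the N-player analysis above, and the payoff matrices are an illustrative prisoner's-dilemma-style example that is not taken from the paper.

```python
def clip01(value: float) -> float:
    """Project a probability back onto [0, 1]."""
    return min(1.0, max(0.0, value))


def gradient_ascent_2x2(row_payoff, col_payoff, x=0.5, y=0.5,
                        step=0.01, iters=1000):
    """Projected gradient ascent in a two-player, two-action game.

    x is the probability that the row player picks action 0 and y is the
    probability that the column player picks action 0. payoff[i][j] is the
    payoff when the row player picks i and the column player picks j.
    """
    for _ in range(iters):
        # Partial derivative of the row player's expected payoff w.r.t. x.
        grad_x = (y * (row_payoff[0][0] - row_payoff[1][0])
                  + (1 - y) * (row_payoff[0][1] - row_payoff[1][1]))
        # Partial derivative of the column player's expected payoff w.r.t. y.
        grad_y = (x * (col_payoff[0][0] - col_payoff[0][1])
                  + (1 - x) * (col_payoff[1][0] - col_payoff[1][1]))
        x, y = clip01(x + step * grad_x), clip01(y + step * grad_y)
    return x, y


if __name__ == "__main__":
    # Illustrative general-sum payoffs; action 1 dominates for both players.
    row = [[3, 0], [5, 1]]
    col = [[3, 5], [0, 1]]
    print(gradient_ascent_2x2(row, col))
```

In this example both gradients stay negative, so the trajectory reaches the vertex where both players choose action 1 and remains there, which is the pure Nash equilibrium of the illustrative game.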
For these reasons, we propose the JRCC algorithm, which is capable of selecting actions based on limited observations, updating the strategy in a way similar to gradient ascent, and obtaining the best response for each CR user individually. The JRCC algorithm enables cooperation between CR users with low control message overhead to facilitate CCC allocations, and adapts to PU and jamming activity by using variable learning rates based on the win-or-learn-fast (WoLF) principle [4] in extremely hostile environments. When PU activity is low, the JRCC algorithm behaves like a rational hill-climbing algorithm that converges to a greedy strategy to maximize the payoffs. The performance is further improved by the cooperation and the exchange of a few parameters between CR users on the established CCCs since their strategies for CCC selections become similar. When PU activity is high, the available CCCs under jamming attacks are very limited, which makes the cooperation less effective. In this case, the WoLF principle can adjust the learning rates such that the players learn slowly to delay the strategy change of the opponent ("winning") or learn fast when they are outperformed by the opponent ("losing").

The JRCC algorithm is listed in Algorithm 1. In each stage, each CR user selects an action that maps to a set of selected channels as CCCs for transmission, and obtains its own reward by observing the conditions of the selected CCCs (lines 3-5). For cooperation, each CR user broadcasts the control message with the parameters recorded in the previous stage, and updates its strategy with the parameters received from neighbors (lines 6-10). After the PU changes the state of the game, CR users observe the next state $s'$ by sensing the channels, and update their Q values for the current state $s$ and action $a$ (lines 11-12). By selecting the proper learning rate $\delta$ (lines 13-17), CR users update their own strategy (line 18). The value of $\delta$ is set to the maximum for the greedy strategy and to a variable value from the WoLF principle otherwise.
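The three learning steps just described (Q-value update, learning rate selection, and policy hill-climbing) can be sketched as small helper functions in the style of WoLF-PHC [4]. This is a minimal sketch under assumptions, not the JRCC implementation: the function names, the list-based policy representation, and the numeric rates in the usage lines are hypothetical, and the cooperation steps (broadcast and receive) are omitted.

```python
def q_update(q, s, a, reward, s_next, alpha, gamma):
    """Q-learning update for state s and action a (q maps state -> list)."""
    q[s][a] = (1.0 - alpha) * q[s][a] + alpha * (reward + gamma * max(q[s_next]))


def wolf_learning_rate(pi_s, avg_pi_s, q_s, visits, delta_w, delta_l):
    """Pick the slow rate when 'winning' and the fast rate when 'losing'.

    avg_pi_s is updated in place as a running average of the policy;
    visits is the number of times the state has been visited so far.
    """
    for a, p in enumerate(pi_s):
        avg_pi_s[a] += (p - avg_pi_s[a]) / visits
    current_value = sum(p * v for p, v in zip(pi_s, q_s))
    average_value = sum(p * v for p, v in zip(avg_pi_s, q_s))
    return delta_w if current_value > average_value else delta_l


def phc_strategy_update(pi_s, best_action, delta):
    """Shift probability mass toward best_action, keeping pi_s a distribution."""
    share = delta / (len(pi_s) - 1)
    gained = 0.0
    for a in range(len(pi_s)):
        if a != best_action:
            step = min(pi_s[a], share)  # never push a probability below zero
            pi_s[a] -= step
            gained += step
    pi_s[best_action] += gained


if __name__ == "__main__":
    policy = [0.25, 0.25, 0.25, 0.25]
    average = [0.25, 0.25, 0.25, 0.25]
    q_values = {0: [0.0, 0.0, 0.0, 0.0], 1: [0.0, 0.0, 0.0, 0.0]}
    q_update(q_values, 0, 2, reward=1.0, s_next=1, alpha=0.5, gamma=0.9)
    best = max(range(4), key=lambda a: q_values[0][a])
    rate = wolf_learning_rate(policy, average, q_values[0], visits=1,
                              delta_w=0.01, delta_l=0.04)
    phc_strategy_update(policy, best, rate)
    print(policy)
```

Capping each decrease at the action's current probability keeps the policy a valid distribution even when the learning rate is large, which is why the update takes the minimum before moving mass to the best action.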
The parameters $\tilde{s}$, $\tilde{a}$, and $\tilde{\delta}$ for the current greedy strategy are recorded for broadcast in the next stage (line 19). For PHC strategy updates, the probability of the best action is increased while the probabilities of other actions are evenly decreased (lines 22-30). For variable learning rates, the slow learning rate $\delta_w$ is selected for the winning case if the average Q value of the best action $a^*$ based on the current policy $\pi$ is larger than that based on the average policy $\bar{\pi}$, and the fast learning rate $\delta_l$ is selected otherwise (lines 31-39).

IV. PERFORMANCE EVALUATION

In this section, we evaluate the performance of the proposed algorithm in the JRCC game. We show that both increasing the transmission capability of CR users and enabling the cooperation between CR users can improve the performance

of combating the attacker. We also show that the JRCC algorithm effectively combats jamming under the impact of PU activities and sensing errors. In the test scenarios, JRCC is compared to the PHC and WoLF-PHC algorithms [4]. PHC is a greedy algorithm that improves the policy by selecting actions according to maximum Q values. WoLF-PHC is based on PHC with variable learning rates determined by the WoLF principle.

Algorithm 1: JRCC for CR User $k \in \mathcal{K}$
 1: Initialize: $\alpha, \gamma, \epsilon, \delta \in (0, 1]$, $Q(s, a) \leftarrow 0$, $\pi(s, a) \leftarrow 1/|A|$
 2: for each stage $n$ do
 3:   Select $a \in A$ in state $s$ per $\pi(s)$ w.p. $\epsilon$
 4:   Transmit on channels $\{\mathrm{Ch}_i\}$ mapped from action $a$
 5:   Observe $U_c$, $J_c$, $P_c$, $P_j$ and calculate reward $r$
 6:   if ($U_c > 0$ and $\tilde{a} \ne \emptyset$) then
 7:     BroadcastToNeighbors($\tilde{s}$, $\tilde{a}$, $\tilde{\delta}$)
 8:     ReceiveFromNeighbors($\tilde{s}$, $\tilde{a}_m$, $\tilde{\delta}_m$, $m \in \mathcal{K}$, $m \ne k$)
 9:     StrategyUpdate($\pi(\tilde{s}, a)$, $\tilde{a}_m$, $\tilde{\delta}_m$)
10:   end if
11:   Observe next state $s' \leftarrow$ SensingChannels($N_s$, )
12:   $Q(s, a) \leftarrow (1 - \alpha) Q(s, a) + \alpha [ r + \gamma \max_b Q(s', b) ]$
13:   if $r \ge r_{th}$ then
14:     $\delta = \delta_{max}$
15:   else
16:     $\delta$ = WoLF($C(s)$, $\pi(s, a)$, $\bar{\pi}(s, a)$, $Q(s, a)$)
17:   end if
18:   StrategyUpdate($\pi(s, a)$, $a^* = \arg\max_b Q(s, b)$, $\delta$)
19:   if ($U_c > 0$) then $\tilde{s} \leftarrow s$, $\tilde{a} \leftarrow a^*$, $\tilde{\delta} \leftarrow \delta$ end if
20:   UpdateParameters($\alpha$, $\gamma$, $\delta$), $s \leftarrow s'$
21: end for
22: procedure StrategyUpdate($\pi(s, a)$, $a^*$, $\delta$)
23:   $\delta_{sa} = \min\left( \pi(s, a), \delta / (|A| - 1) \right)$
24:   if $a \ne a^*$ then
25:     $\Delta_{sa} = -\delta_{sa}$
26:   else
27:     $\Delta_{sa} = \sum_{a' \ne a^*} \delta_{sa'}$
28:   end if
29:   $\pi(s, a) \leftarrow \pi(s, a) + \Delta_{sa}$
30: end procedure
31: procedure WoLF($C(s)$, $\pi(s, a)$, $\bar{\pi}(s, a)$, $Q(s, a)$)
32:   $C(s) \leftarrow C(s) + 1$
33:   $\bar{\pi}(s, a) \leftarrow \bar{\pi}(s, a) + \frac{1}{C(s)} \left( \pi(s, a) - \bar{\pi}(s, a) \right)$, $\forall a \in A$
34:   if $\sum_{a'} \pi(s, a') Q(s, a') > \sum_{a'} \bar{\pi}(s, a') Q(s, a')$ then
35:     $\delta = \delta_w$
36:   else
37:     $\delta = \delta_l$
38:   end if
39: end procedure

In the simulation environment, we set N = 3, N_p = 6, and N_s = 3. For reinforcement learning parameters, we set α^n = /(+n/5), δ_w^n = /(+n/) where n is the step/stage index, δ_l = 4δ_w, γ =, ϵ =, δ_max =, and r_th = unless otherwise specified.

A. Convergence of JRCC Game

Fig. 2 plots the expected rewards of CR users and the attacker for exemplary runs when PUs are not present. The group on the top is the CR users' rewards while the bottom group is the attacker's.
The figure clearly shows the convergence of the JRCC game for CR users and the attacker. In this case, the convergence is faster than in the runs with state changes. However, the expected rewards from the runs with state changes exhibit similar convergence behavior. This shows that the strategy of the players converges to a Nash equilibrium or the rewards converge to the reward of a Nash point.

Fig. 2. Convergence of the JRCC algorithm in JRCC game (expected rewards versus stages).

Fig. 3. Expected rewards versus transmission capability N_s.

B. Transmission Capability

Owing to power constraints or hardware limitations, CR users and the attacker are limited to transmitting on a maximum number of channels N_s ≤ N_p simultaneously. To save energy, CR users may select a smaller number of channels N_k ≤ N_s as control channels, at the higher risk of having all of them jammed by the attacker. Similarly, the attacker may select N_j ≤ N_s channels for jamming with a potential loss of jamming performance. Hence, transmission capability affects the performance of the players' CCC allocation or jamming strategies. For fairness, we assume that CR users and the attacker have the same transmission capability. Fig. 3 shows the expected payoffs of the PHC, WoLF-PHC, and JRCC algorithms for different values of N_s given no PU activity and N_p = 6. As N_s increases, the expected payoffs of JRCC CR users increase monotonically. The performance gain of JRCC over PHC is mainly obtained from the cooperation of CR users. The attacker's payoffs drop as N_s increases from 1 to 3 and slightly increase as N_s increases to 5. Note that the increases in CR users' payoffs are monotonically decreasing as N_s varies from 1 to 5. This is because the attacker's transmission capability is also increased. This shows that transmitting on all channels is not necessarily the best strategy for CR users if the attacker has the same capability.

C. Impact of the PU Activity

PU activity is one of the major factors affecting JRCC performance since the available channels for CCC allocations may be significantly reduced. Fig. 4(a) shows the expected payoffs of JRCC, PHC, and WoLF-PHC versus the probability of an active PU, P_on, in each channel. We assume that P_on is the same for all PUs and that both CR users and the attacker perform perfect sensing with no sensing errors. Since there is a PU in each channel, the case of P_on = 0.5 is approximately equivalent to the case that half of the channels are occupied by PUs on average. Hence, the expected payoffs of CR users are reduced considerably while the payoffs of the attacker increase to the maximum from no PU activity to P_on = . For high PU activity P_on > where CCCs are less available for jamming, the payoffs of the attacker also decrease and approach zero as PUs become mostly active on all channels. PU activity has the greatest impact on PHC and the least on WoLF-PHC in terms of the rate at which the expected payoffs decrease. The proposed JRCC maintains the highest payoffs under low to medium PU activity due to CR user cooperation. For medium to high PU activity, where CCCs are less available for cooperation, JRCC adopts the variable learning rates to combat jamming as WoLF-PHC does. Hence, the performance of JRCC is comparable to that of WoLF-PHC in high PU activity cases. This scenario shows that JRCC adapts to PU activity by combining CR user cooperation and variable learning rates to maximize the payoffs for jamming-resilient CCC allocations.

Fig. 4. Expected payoffs vs. PU activity (P_on) with (a) perfect sensing, (b) false alarm P_f = , and (c) miss detection P_m = .

D. Effects of Sensing Errors

In addition to PU activity, sensing errors such as false alarms and miss detections can have a major impact on JRCC performance. In the false alarm cases, CR users are mistakenly forced to allocate CCCs in a smaller subset of available channels. This increases the probability of two CR users selecting mutually exclusive subsets of channels as CCCs. Hence, the effect of false alarms on CCC allocations can be significant even if only one CR user experiences the false alarm. Moreover, CR users may observe different states due to false alarms, making the cooperation less effective.
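To make the false alarm argument concrete, the following minimal Python sketch computes how often an idle channel is lost as a common choice when any one user raises a false alarm. It assumes, beyond what the paper states, that the users sense independently and share the same per-channel false alarm probability; the function names and the example value p_f = 0.1 are hypothetical and not taken from the simulations.

```python
def p_channel_lost(p_f, users=2):
    """Probability that at least one of `users` raises a false alarm on an idle channel."""
    return 1.0 - (1.0 - p_f) ** users


def expected_common_idle(n_idle, p_f, users=2):
    """Expected number of idle channels that every user correctly senses as available."""
    return n_idle * (1.0 - p_f) ** users


# Illustrative values only: two users, six idle channels, p_f = 0.1.
lost = p_channel_lost(0.1)             # roughly 0.19
common = expected_common_idle(6, 0.1)  # roughly 4.86
```

Under these assumptions a single user's false alarm is enough to remove a channel from the commonly available set, which is why the loss probability grows with the number of users.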
As a result, false alarms, on top of existing PU activity, further reduce channel availability for CCC allocations. Fig. 4(b) shows the expected rewards versus PU activity with P_f = for CR users and the attacker. As expected, CR users are greatly affected by false alarms. The cooperative gain in JRCC is also reduced compared to the perfect sensing scenario. JRCC still performs the best in low to medium PU activity cases and approaches WoLF-PHC when PU activity is high. Similarly, the attacker's performance is affected by false alarms, with maximum payoffs in medium PU activity. Unlike false alarms, the effect of miss detection on CCCs requires both CR users to fail to detect the presence of the PU. Hence, the probability of both CR users having a miss detection is much smaller and the impact on CR users is less noticeable. Fig. 4(c) shows the expected rewards versus PU activity with P_m = . Compared to Figs. 4(a) and 4(b), the performance of CR users and the attacker is only slightly affected.

V. CONCLUSIONS

In this paper, we tackle the control channel jamming problem in CRAHNs by modeling the interactions among CR users and the attacker under the impact of PU activities as a stochastic general-sum game called the JRCC game. We analyze the gradient ascent dynamics of the game and show its convergence. We also propose the JRCC algorithm for an optimal CCC allocation strategy by enabling CR user cooperation and adapting to PU activity with variable learning rates. The results demonstrate that the JRCC algorithm effectively combats jamming under the impact of primary user activity and sensing errors. The CCC allocation policy can be improved by enhancing transmission and sensing capabilities. The proposed algorithm is scalable and can be applied to multiple CR users.

REFERENCES

[1] I. F. Akyildiz, W.-Y. Lee, and K. R. Chowdhury, "CRAHNs: Cognitive radio ad hoc networks," Ad Hoc Networks, vol. 7, no. 5, pp. 810–836, 2009.
[2] I. F. Akyildiz, B. F. Lo, and R. Balakrishnan, "Cooperative spectrum sensing in cognitive radio networks: A survey," Physical Communication, vol. 4, no. 1, pp. 40–62, Mar. 2011.
[3] D. K. Arrowsmith and C. M. Place, Dynamical Systems. London, UK: Chapman & Hall, 1992.
[4] M. Bowling and M. Veloso, "Multiagent learning using a variable learning rate," Artificial Intelligence, vol. 136, pp. 215–250, 2002.
[5] A. Chan, X. Liu, G. Noubir, and B. Thapa, "Broadcast control channel jamming: Resilience and identification of traitors," in Proc. IEEE ISIT, Jun. 2007, pp. 2496–2500.
[6] H. Li and Z. Han, "Dogfight in spectrum: Jamming and anti-jamming in multichannel cognitive radio systems," in Proc. IEEE GLOBECOM, Dec. 2009, pp. 1–6.
[7] B. F. Lo, I. F. Akyildiz, and A. M. Al-Dhelaan, "Efficient recovery control channel design in cognitive radio ad hoc networks," IEEE Trans. Vehicular Technology, vol. 59, no. 9, pp. 4513–4526, Nov. 2010.
[8] B. F. Lo, "A survey on common control channel design for cognitive radio networks," Physical Communication, vol. 4, no. 1, pp. 26–39, Mar. 2011.
[9] S. Singh, M. Kearns, and Y. Mansour, "Nash convergence of gradient dynamics in general-sum games," in Proc. 16th Conf. Uncertainty in Artificial Intelligence, 2000, pp. 541–548.
[10] B. Wang, Y. Wu, K. Liu, and T. Clancy, "An anti-jamming stochastic game for cognitive radio networks," IEEE Journal on Selected Areas in Communications, vol. 29, no. 4, pp. 877–889, Apr. 2011.
[11] Q. Zhu, H. Li, Z. Han, and T. Başar, "A stochastic game model for jamming in multi-channel cognitive radio systems," in Proc. IEEE ICC, May 2010, pp. 1–6.