Multiagent Reinforcement Learning Dynamic Spectrum Access in Cognitive Radios

Sensors & Transducers, 2014, by IFSA Publishing, S. L. http://www.sensorsportal.com

1 Wu Chun, 2 Yin Mingyong, 2 Ma Shaoliang, 1 Jiang Hong
1 School of National Defense Technology, Southwest University of Science and Technology, Mianyang 621000, Sichuan, China
2 Institute of Computer Application, China Academy of Engineering Physics, Mianyang 621900, Sichuan, China
Tel.: 86-86089890, fax: 86-86089890
E-mail: soldier_wu@163.com

Received: 28 November 2013 / Accepted: 28 January 2014 / Published: 28 February 2014

Abstract: A multiuser independent Q-learning method that needs no information interaction is proposed for multiuser dynamic spectrum access in cognitive radios. The method adopts a self-learning paradigm in which each CR user performs reinforcement learning only by observing its individual performance reward, without spending communication resources on information interaction with others. The reward is defined to represent both channel quality and channel conflict status. A learning strategy of sufficient exploration, preference for good channels, and punishment for channel conflict is designed to implement multiuser dynamic spectrum access. For the scenario of two users and two channels, a fast learning algorithm is proposed and its convergence to the maximal total reward is proved. The simulation results show that, with the proposed method, the CR system converges to a Nash equilibrium with large probability and achieves a high total reward. Copyright 2014 IFSA Publishing, S. L.

Keywords: Cognitive radios, Multiagent reinforcement learning, Q-learning, Dynamic spectrum access

1. Introduction

With information innovation driving the current world economy and social development, wireless communication technology has developed rapidly. Cognitive radio (CR) [1] has become a hot research topic in wireless communication owing to its advantages of dynamic spectrum access and intelligent adaptation to the environment. High intelligence is one of the key characteristics of CR, and learning is its main expression. Both on-line and off-line learning methods are applied in CR [2]. In on-line learning, the agent interacts with the environment, gets a feedback reward, and learns from its own experience. Reinforcement learning is the representative on-line learning method.

Centralized solutions are commonly used in traditional wireless communication when on-line learning is applied to resource allocation. J. Nie presents a dynamic spectrum allocation method with centralized Q-learning in mobile communication systems [3]. S. Xergias uses the centralized E-FRTS (enhanced frame registry tree scheduler) to schedule and allocate multimedia traffic in IEEE 802.16 mesh networks [4]. CR users are autonomous and varied, and cognitive radio networks are potentially heterogeneous.

Decentralized learning is therefore more suitable for cognitive radio networks (CRN) than centralized learning. To maximize the global reward of all users, multiple CR users ordinarily negotiate with each other, which requires information exchange. J. E. Suris applies a cooperative game model to distributed spectrum sharing and proposes a distributed algorithm that achieves a near-optimal allocation; in that algorithm the users exchange information about actions and rewards with each other [5]. P. Zhou applies Bush-Mosteller reinforcement learning to the power control problem in CR [6]. Although the CR users there do not exchange strategy and reward information with each other, they still get the total jamming intensity (global reward) from the primary user.

Information exchange between CR users (or with the primary user) occupies a certain amount of communication resources. Moreover, too frequent interactions may overload the communication. An alternative that overcomes this shortcoming is self-learning (independent learning), in which the user learns and acts based only on its own reward, without exchanging information with others. Research on self-learning in CR is currently scarce in the literature. This paper models the competition of multiple users for multiple channels as a repeated game and proposes a multiagent reinforcement learning method, the multiuser independent Q-learning method, with which CR users perform self-learning to maximize the global reward. The simulations validate the effectiveness of the proposed method.

2. Stochastic Spectrum Access Model

The problem studied is that M CR users (SUs, secondary users) access N channels not occupied by PUs (primary users); only $M \le N$ is considered in this paper. Each CR user repeatedly chooses a channel independently according to its own strategy and aims at achieving the maximal total reward of all users. The user does not exchange information with others during the whole learning and channel selection process.

The repeated game [7] is used to model the process of multiple CR users competing for channels. In each stage game, the M users choose their respective channels (among all N channels), and the reward of each user is determined by the combined strategies of all users. This stage game is represented by a matrix game [8] $(M, A^{(1)}, \ldots, A^{(M)}, r^{(1)}, \ldots, r^{(M)})$, where M is the number of users in the game; $A^{(m)}$ is the action set of user m and includes N actions, i.e., choosing channel n ($n = 1, 2, \ldots, N$); $A = A^{(1)} \times \cdots \times A^{(M)}$ is the combined action space; and $r^{(m)}$ is the reward function of user m. When user m chooses channel n and this conflicts with another user's choice, the reward $r^{(m)}(n)$ equals zero, i.e., $r^{(m)}(n) = 0$. With no channel conflict, the reward is determined by the channel gain, $r^{(m)}(n) = b^{(m)}(n)$. The reward matrices of a game in which two users compete for two channels are shown in formulas (1) and (2):

$$R^{(1)} = \begin{pmatrix} 0 & b^{(1)}(1) \\ b^{(1)}(2) & 0 \end{pmatrix} \quad (1)$$

$$R^{(2)} = \begin{pmatrix} 0 & b^{(2)}(2) \\ b^{(2)}(1) & 0 \end{pmatrix} \quad (2)$$

The item at row i, column j of matrix $R^{(m)}$ denotes the reward of user m when user 1 chooses channel i and user 2 chooses channel j.

3. Multiuser Independent Q-learning

The goal of the repeated game for accessing channels is to maximize the reward of the stage game, $g_t = \sum_{m=1}^{M} r_t^{(m)}$, after executing the stage game many times. Multiagent reinforcement learning (MARL) [9] is an effective method for resolving the game. In most of the current literature on multiuser games, the user needs to observe the rewards and strategies of other users. In cognitive radio networks, such frequent exchange of rewards and strategies between CR users occupies a large quantity of communication resources. This paper proposes a multiuser independent Q-learning (MIQ) method with which the CR users require no information exchange with each other. The MIQ algorithm in the repeated game model has two goals: convergence to a Nash equilibrium, and a total reward of all users that reaches the maximal value or a value close to it.

Definition 1. The strategy set $(\pi_m^*)_{m=1}^{M}$ is a Nash equilibrium if for each user $m = 1, 2, \ldots, M$,

$$r^{(m)}(\pi_m^*, \pi_{-m}^*) \ge r^{(m)}(\pi_m, \pi_{-m}^*), \quad \forall \pi_m, \quad (3)$$

where $\pi_m$ is the strategy of user m, $\pi_{-m}^*$ is the combined strategies of the other users except user m, and $r^{(m)}$ is the reward of user m.

Among all combined strategies in which every user chooses a different channel, there must exist one whose reward is better than or equal to that of the others. That combined strategy is a Nash equilibrium point. A Nash equilibrium point therefore exists in the game above, but it may not be unique. For instance, all orthogonal channel allocation strategies are Nash equilibrium points when $M = N$.

The basic Q-learning method learns the optimal strategies in an unknown environment by using learned knowledge and exploring new strategies with a certain probability; it also applies when rewards and actions are nondeterministic.
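The statement that orthogonal allocations are Nash equilibrium points when $M = N$ can be checked by brute force for a small game. The following Python sketch enumerates every pure strategy profile of a two-user, two-channel stage game and tests condition (3) against unilateral channel switches. The gain values and the helper names `rewards` and `is_nash` are illustrative assumptions, not part of the paper, and channels are indexed from 0.

```python
from itertools import product

def rewards(profile, gains):
    """Stage-game rewards: gain on the chosen channel, or 0 on conflict."""
    return [
        gains[m][ch] if profile.count(ch) == 1 else 0.0
        for m, ch in enumerate(profile)
    ]

def is_nash(profile, gains):
    """True if no user can raise its own reward by switching channel alone."""
    n_channels = len(gains[0])
    current = rewards(profile, gains)
    for m in range(len(profile)):
        for ch in range(n_channels):
            deviation = list(profile)
            deviation[m] = ch
            if rewards(tuple(deviation), gains)[m] > current[m]:
                return False
    return True

# Hypothetical gains b[m][n] for 2 users and 2 channels (illustrative values).
gains = [[0.9, 0.6], [0.8, 0.7]]
equilibria = [p for p in product(range(2), repeat=2) if is_nash(p, gains)]
print(equilibria)  # [(0, 1), (1, 0)]
```

Both orthogonal allocations pass the check, while the two conflicting profiles fail it because either user gains by moving to the free channel.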

For nondeterministic rewards and actions, the updating formula of the Q-value function is [10]:

$$Q(s, a) \leftarrow (1 - \alpha) Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') \right) \quad (4)$$

Some improvements on basic Q-learning are required for the problem at hand. Each user learns independently during the game, and its reward is affected by the other users. The reward is therefore uncertain, so updating the Q-value slowly is reasonable. In addition, because the state of the users does not change during the repeated game, the new Q-value contains no contribution from delayed reward. With these two improvements, together with a proper exploration policy and control of the learning rate, a multiuser independent Q-learning method is proposed.

The key to achieving the joint optimal solution through independent actions and learning is to design suitable autonomous learning and action policies. Two principles are applied in the independent actions of each user: 1) the user prefers the channel with high gain; 2) the user avoids channel conflict with others. Nevertheless, the two principles may conflict occasionally or frequently. The proposed MIQ algorithm iterates action trials under the two principles and obtains the final channel allocation. The concrete implementation of the MIQ algorithm is as follows.

Step 1: Q-value table initialization. The Q-value table of user m is initialized as

$$Q_0^{(m)}(n) = \frac{1}{N} \sum_{i=1}^{N} b^{(m)}(i), \quad n = 1, 2, \ldots, N \quad (5)$$

The initial Q-value of user m is the average of the rewards on the different channels. After initialization, every item in the Q-value table has the same average value, so the user chooses any channel with equal probability in its first action. A deeper reason for initializing with the average is to make the subsequent Q-value updating principle convenient to realize: a big reward makes the Q-value increase slowly and a small reward makes the Q-value decrease slowly.

Step 2: Independent Q-learning process. The Q-values are updated iteratively until the specified number of games is reached.

a) Compute the probability of choosing each channel from the Q-value table and execute the action according to the probabilities in formula (6):

$$P^{(m)}(n) = \frac{\left(Q^{(m)}(n)\right)^q}{\sum_{i=1}^{N} \left(Q^{(m)}(i)\right)^q}, \quad n = 1, 2, \ldots, N \quad (6)$$

A larger Q-value indicates a better channel and results in a higher probability of choosing that channel. In formula (6), q is the probability controlling factor. With a larger q the action selection inclines to use learned knowledge, and with a smaller q it inclines to explore all possible choices. Because there is no information interaction between users, adequate exploration is significant and necessary in the initial stage of learning. Generally, a small q is set at the very beginning, and q increases gradually and slowly over the repeated learning process to improve convergence.

b) After the actions, each user observes only its own reward:

$$r_t^{(m)} = \begin{cases} b^{(m)}(n), & \text{user } m \text{ no conflict} \\ 0, & \text{user } m \text{ conflict} \end{cases} \quad (7)$$

On the one hand, the definition of the reward represents the quality of the channel, i.e., the reward is the channel gain when there is no conflict. On the other hand, the reward reflects the conflict status of the channels. When the action of user m conflicts with any other, the reward is zero, which decreases the corresponding Q-value. This can be regarded as a punishment mechanism for channel conflict.

c) Update the Q-value table:

$$Q_{t+1}^{(m)}(n) = (1 - \alpha_t) Q_t^{(m)}(n) + \alpha_t r_t^{(m)} \quad (8)$$

where $\alpha_t$ is the updating rate of the Q-value, which depends on the number of times user m has chosen channel n during the t repeated games and on the controlling factor of the Q-value updating rate. The updating rate decreases gradually during the learning process, which contributes to the convergence of learning. When the user chooses a channel with high gain (higher than the average gain) without conflict, the Q-value increases, and a higher gain leads to a larger increase. Meanwhile, if the user's selected channel conflicts with others, the update in formula (8) with a current reward of zero decreases the Q-value, and the decrease is much larger than the increase obtained without conflict. Because of this rather heavy punishment for channel conflict, the user searches for a high-gain channel on the basis of no conflict.
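Steps 1 and 2 of the MIQ algorithm can be put together in a short Python sketch. It follows formulas (5) to (8) as described above, but several details are assumptions made only for illustration: the updating rate is taken as `1 / (1 + lam * count)`, the factor q grows linearly by `q_step` per game, and the function name `miq` with its parameters and the gain values are hypothetical.

```python
import random

def miq(gains, plays=10000, q0=0.5, q_step=0.001, lam=1.0, seed=0):
    """Independent Q-learning sketch for channel access.

    gains[m][n] is the gain b of user m on channel n. q_step, lam, and the
    update rate 1 / (1 + lam * count) are assumed forms, not from the paper.
    """
    rng = random.Random(seed)
    users, channels = len(gains), len(gains[0])
    # Step 1: every Q-value starts at the user's average gain, formula (5).
    q_table = [[sum(g) / channels] * channels for g in gains]
    counts = [[0] * channels for _ in gains]
    q = q0
    for _ in range(plays):
        # Step 2(a): choose a channel with probability proportional to Q**q.
        choice = [
            rng.choices(range(channels), weights=[v ** q for v in q_table[m]])[0]
            for m in range(users)
        ]
        for m, ch in enumerate(choice):
            # Step 2(b): own reward only; zero on conflict, formula (7).
            reward = gains[m][ch] if choice.count(ch) == 1 else 0.0
            # Step 2(c): update with a slowly decaying rate, formula (8).
            counts[m][ch] += 1
            alpha = 1.0 / (1.0 + lam * counts[m][ch])
            q_table[m][ch] = (1 - alpha) * q_table[m][ch] + alpha * reward
        q += q_step  # lean more on learned knowledge as learning proceeds
    # Final allocation: each user takes its highest-Q channel.
    return [max(range(channels), key=row.__getitem__) for row in q_table]

# Illustrative gains for 3 users and 3 channels, uniform on [0.5, 1).
gain_rng = random.Random(1)
example_gains = [[0.5 + 0.5 * gain_rng.random() for _ in range(3)] for _ in range(3)]
print(miq(example_gains))
```

No user reads another user's choice or reward directly; the only coupling is the zero reward that a conflict produces, which is what makes the learning independent.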
Studies of self-learning (independent learning), in which the agent learns and acts only by observing its own reward, are very few in the MARL field. It is especially difficult to prove the convergence of self-learning when the users do not exchange information. Bowling proposed the multiagent learning method WoLF-PHC (win or learn fast policy hill-climbing) [11].

In WoLF-PHC, each user learns using a variable learning rate, and the method achieves the maximal rewards of all users [11]. But no proof of the algorithm's convergence is provided, and only an experiment with 2 users and 2 actions is done to evaluate the convergence property. This paper proposes a fast learning algorithm for the scenario of 2 users and 2 channels and proves its convergence. The concrete implementation of the fast learning algorithm is as follows.

Step 1: Phase of learning channel reward. Each user explores channels randomly and thus gets the reward value (channel gain) without channel conflict.

Step 2: Phase of fast greedy channel choosing. Each user chooses the channel of highest gain. The channel allocation and learning process are finished if no channel conflict occurs. When channel conflict occurs, Step 3 is executed.

Step 3: Phase of Q-learning. The learning process repeats for a specified number of times.

(a) Initialize the Q-value table:

$$Q^{(1)}(1) = Q^{(1)}(2) = Q^{(2)}(1) = Q^{(2)}(2) = 0.5 \quad (9)$$

(b) Choose an action based on the probability policy and observe the reward. The user chooses a channel according to the probabilities in formula (6), observes the conflict status, and calculates the reward. When the user chooses the higher-gain channel of the two candidates, the reward $r_t^{(m)}$ is given by formula (10), and when the user chooses the lower-gain channel, it is given by formula (11). Each formula assigns one value when user m has no conflict and another when user m is in conflict; the values are built from $b^{(m)}(1)$, $b^{(m)}(2)$, and a parameter L, together with a constant set to an appropriate value.

(c) Update the Q-value table:

$$Q_{t+1}^{(m)}(n) = Q_t^{(m)}(n) + r_t^{(m)}(n) \quad (12)$$

Theorem 1. The proposed fast learning algorithm converges to the optimal solution.

Proof: Suppose $b^{(1)}(1) > b^{(1)}(2)$, $b^{(2)}(1) < b^{(2)}(2)$, or $b^{(1)}(1) < b^{(1)}(2)$, $b^{(2)}(1) > b^{(2)}(2)$. It is clear that the algorithm converges to the optimal solution in the phase of fast greedy channel choosing.

Suppose instead $b^{(1)}(1) > b^{(1)}(2)$, $b^{(2)}(1) > b^{(2)}(2)$. Assign the parameter L a sufficiently large value so that the reward parameters of both users in formulas (10) and (11) are small enough, and set the remaining constant to an appropriate value. In the first period of learning, a user performs 4n channel choices, and with those parameters extremely small the Q-value table is updated as in formula (13). In one of the two possible orderings of the users' parameters, the Q-values become ordered so that, during the subsequent learning periods, some Q-values increase continually while others decrease continually. After a while the learning process ends, and the final solution is that each user chooses the channel with the bigger Q-value, i.e., user 1 chooses channel 1 and user 2 chooses channel 2. This solution is optimal owing to $b^{(1)}(1) + b^{(2)}(2) > b^{(1)}(2) + b^{(2)}(1)$. In the other ordering, the algorithm converges too.

For $b^{(1)}(1) < b^{(1)}(2)$, $b^{(2)}(1) < b^{(2)}(2)$, the convergence can be proved in the same way.

4. Simulation and Results

The MIQ algorithm is simulated and evaluated in three main aspects: the probability of convergence to a Nash equilibrium, the probability of convergence to the optimal solution, and the normalization performance of the proposed algorithm. In the scene of M CR users selecting among N channels, the reward of each user m choosing each channel n is initialized to a uniformly distributed random number between 0.5 and 1:

$$b^{(m)}(n) = 0.5 + 0.5 \cdot \mathrm{rand}() \quad (14)$$

Then each user carries out autonomous learning with the MIQ algorithm independently. In the policy updating procedure of Step 2(a), the probability controlling factor q is adjusted dynamically: q equals 0.5 when selecting a channel for the first time and increases very gradually until the specified number of learning times is reached. In Step 2(c), the controlling factor of the Q-value updating rate is set to a fixed value. The number of repeated games is 10000, and the simulation is executed 100 times with different random rewards to obtain the average performance of the proposed algorithm.

Fig. 1 and Fig. 2 show the performance of the MIQ algorithm when the number of users M equals the number of channels N. The strategies of the users converge to a Nash equilibrium with a probability of 100 % or nearly 100 % (as shown in Fig. 1). When $M = N = 2$, the total reward of all users converges to the maximal value with a probability of 98 %. This probability drops as the number of users and channels increases.
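The normalization performance used in this section is the ratio of an achieved total reward to the maximal total reward, and the comparison baseline is a random orthogonal allocation. A minimal Python sketch of both quantities follows, with gains drawn as in formula (14). Finding the maximum by searching all conflict-free allocations is an assumption that is practical only for small N; the paper does not say how the maximum was computed, and the function names are hypothetical.

```python
import random
from itertools import permutations

def total_reward(allocation, gains):
    """Sum of gains for an allocation; conflicting users earn nothing."""
    return sum(
        gains[m][ch] for m, ch in enumerate(allocation)
        if allocation.count(ch) == 1
    )

def max_total_reward(gains):
    """Brute-force optimum over all conflict-free allocations (small N only)."""
    users, channels = len(gains), len(gains[0])
    return max(
        total_reward(p, gains) for p in permutations(range(channels), users)
    )

def normalized_performance(allocation, gains):
    """Ratio of the achieved total reward to the maximal total reward."""
    return total_reward(allocation, gains) / max_total_reward(gains)

rng = random.Random(0)
# Gains drawn as in formula (14): uniform on [0.5, 1).
gains = [[0.5 + 0.5 * rng.random() for _ in range(4)] for _ in range(4)]
random_orthogonal = rng.sample(range(4), 4)  # one distinct channel per user
print(normalized_performance(random_orthogonal, gains))
```

Because every gain is positive and $M \le N$, a conflicting allocation can always be improved by moving one of the conflicting users to a free channel, so restricting the search to conflict-free allocations does not miss the maximum.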

The probability of convergence to the maximal reward falls to nearly 70 % when $M = N = 8$. On the surface, Fig. 1 suggests that the MIQ algorithm performs very well with very few users but is not a good method with many more users. A deeper evaluation of the average performance of the algorithm is given in Fig. 2. When $M = N = 2$, the normalization performance (the ratio of the current total reward to the maximal total reward) is 0.9999, and when $M = N = 3$ it is 0.9995. As the number of users and channels increases, the normalization performance drops very slowly. When $M = N = 8$, it remains very high, at 0.9978. The normalization performance stays high while the probability of convergence to the maximal reward drops markedly because the total reward under the resulting channel allocation is very close to the maximal reward. Therefore, even when the MIQ algorithm cannot achieve the absolute optimal performance, it achieves performance very close to it.

Fig. 1. The probabilities of convergence to Nash Equilibrium and the maximal reward ($M = N$).

Fig. 2. The normalization performance of MIQ algorithm and random orthogonal allocation method ($M = N$).

Fig. 3 shows the 100 samples of normalization performance obtained in simulations ($M = N = 8$). In the 100 simulations, the MIQ algorithm converges to the maximal reward 69 times; the normalization performance in the remaining 31 simulations is mostly greater than 0.98, and even the worst is greater than 0.97. When the number of users equals the number of channels, the MIQ algorithm reaches a conflict-free orthogonal channel allocation, and its normalization performance is approximately 5 % higher than that of the random orthogonal channel allocation method.

Fig. 3. The normalization performance distribution of 100 simulation samples for MIQ algorithm ($M = N = 8$).

Fig. 4 and Fig. 5 show the performance of the MIQ algorithm when the number of users M is less than or equal to the number of channels N. Fig. 4 shows the probabilities of convergence to a Nash equilibrium and to the maximal total reward for different numbers of channels ($M = 2, 3$).

Fig. 4. The probabilities of convergence to Nash Equilibrium and the maximal reward ($M \le N$).

As the number of channels increases, not only does the probability of convergence to the maximal total reward drop markedly, but the probability of convergence to a Nash equilibrium drops similarly, unlike the case shown in Fig. 1. Channel conflicts during MIQ learning decrease the Q-values, so the final channel allocation avoids channel conflict effectively.

The strategy set that realizes a conflict-free allocation of channels is exactly a Nash equilibrium when $M = N$, so the probability of convergence to a Nash equilibrium is quite high (as shown in Fig. 1). When $M < N$, a conflict-free allocation of channels does not necessarily satisfy the Nash equilibrium condition, which is why the probability of convergence to a Nash equilibrium drops in Fig. 4. Fig. 5 shows that the proposed MIQ algorithm also performs well when $M \le N$.

Fig. 5. The normalization performance of MIQ algorithm and random orthogonal allocation method ($M \le N$).

6. Conclusions

Independent learning without information exchange between nodes is an alternative on-line learning method for resource allocation in cognitive radio networks. This paper models multiuser dynamic spectrum access as a repeated game and proposes a multiagent reinforcement learning method, the multiuser independent Q-learning method, with which CR users coordinate in choosing the highest-gain channels and avoiding conflict with each other. Moreover, a fast learning algorithm for the case of 2 users and 2 channels is presented and proved to converge to a Nash equilibrium. The simulations show that, with the proposed MIQ algorithm, user actions converge to a Nash equilibrium with high probability and the achieved total reward is close to the maximal reward.

Acknowledgements

Project supported by the National Natural Science Foundation of China (Grant No. 61379005) and the National Basic Research Program of China (Grant No. 2009CB320403).

References

[1] J. Mitola, G. Q. Maguire Jr., Cognitive radio: making software radios more personal, IEEE Personal Communications, Vol. 6, Issue 4, 1999, pp. 13-18.
[2] C. Wu, Y. Li, K. Yi, Research on GA-LSSVM offline learning in cognitive radios, Journal of Beijing University of Posts and Telecommunications, Vol. 35, Issue 2, 2012, pp. 90-93.
[3] J. Nie, S. Haykin, A Q-learning-based dynamic channel assignment technique for mobile communication systems, IEEE Transactions on Vehicular Technology, Vol. 48, Issue 5, 1999, pp. 1676-1687.
[4] S. Xergias, N. Passas, A. K. Salkintzis, Centralized resource allocation for multimedia traffic in IEEE 802.16 mesh networks, Proceedings of the IEEE, Vol. 96, Issue 1, 2008, pp. 54-63.
[5] J. E. Suris, L. A. Dasilva, H. Zhu, et al., Cooperative game theory for distributed spectrum sharing, in Proceedings of the IEEE International Conference on Communications, Glasgow, Scotland, 2007, pp. 5282-5287.
[6] P. Zhou, Y. Chang, J. A. Copeland, Reinforcement learning for repeated power control game in cognitive radio networks, IEEE Journal on Selected Areas in Communications, Vol. 30, Issue 1, 2012, pp. 54-69.
[7] V. D. Schaar M., F. Fu, Spectrum access games and strategic learning in cognitive radio networks for delay-critical applications, Proceedings of the IEEE, Vol. 97, Issue 4, 2009, pp. 720-739.
[8] C. Yang, J. Li, Mixed-strategy based discrete power control approach for cognitive radios: A matrix game-theoretic framework, in Proceedings of the 2nd International Conference on Future Computer and Communication, Wuhan, China, 2010, pp. 3806-3810.
[9] L. Busoniu, R. Babuska, B. D. Schutter, A comprehensive survey of multiagent reinforcement learning, IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, Vol. 38, Issue 2, 2008, pp. 156-172.
[10] Tom M. Mitchell, Machine Learning, McGraw-Hill College, 2005.
[11] B. Michael, V. Manuela, Multiagent learning using a variable learning rate, Artificial Intelligence, Vol. 136, Issue 2, 2002, pp. 215-250.

2014 Copyright, International Frequency Sensor Association (IFSA) Publishing, S. L. All rights reserved. (http://www.sensorsportal.com)