Improving Reinforcement Learning Algorithms for Dynamic Spectrum Allocation in Cognitive Sensor Networks
IEEE Wireless Communications and Networking Conference (WCNC)
Leonardo Faganello, Rafael Kunst, Cristiano Both, Lisandro Granville, Juergen Rochol
Outline
- Introduction
- Motivation
- Background on Cognitive Radio
  - System Model
  - Spectrum Decision Algorithms
- Improving Reinforcement Learning Algorithms in the DSA Context
  - Q-Learning+, Q-Noise, and Q-Noise+
- Performance Evaluation
  - Evaluation Methodology
  - Results
- Conclusion
Introduction: Cognitive Radio Networks
Cognitive Radio Networks have been proposed to deal with the scarcity of frequency spectrum.
Possible applications include spectrum sharing among wireless sensors:
- IEEE 802.15.4 industrial networks
- Health-care sensors
Motivation: Dynamic Spectrum Allocation Algorithms
The state of the art for spectrum allocation mainly considers:
- Probabilistic resource allocation algorithms
- Genetic algorithms
These algorithms share two frequent limitations:
- The model of user behavior is not properly defined
- Channel conditions are not considered in the allocation algorithms
Background on Cognitive Radio
System Model: Industrial Scenario
Spectrum sharing between sensors and wireless-enabled devices
Background on Cognitive Radio
Spectrum Decision Algorithm: Q-Learning
- Reinforcement learning algorithm based on rewards
- The reward for each channel is based on successful transmissions
- Each channel has an associated Q-Value
- Rewards obtained with Q-Learning increase or decrease the channel's Q-Value
Q-Learning: Worked Example
Q-Value table, one entry per channel:
Channel | Q-Value
1       | 0.54
2       | 0.68
...     | ...
N-1     | 0.32
N       | 0.14
1. The algorithm selects channel 2, the channel with the highest Q-Value, for the next epoch.
2. At the end of the epoch, a reward is computed from the successful transmissions:
   r_t = N_D / t_D = 0.667
3. The new Q-Value blends the old Q-Value and the reward (here with α = 0.3):
   Q_{t+1}(a_t) = (1 - α) Q_t + α r_t
   Q_{t+1}(a_t) = (1 - 0.3) · 0.68 + 0.3 · 0.667 = 0.6761
4. The Q-Value of channel 2 is updated from 0.68 to 0.6761.
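The worked example above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation; the channel table, α = 0.3, and the reward 0.667 follow the slide's example.

```python
def q_update(q_value, reward, alpha=0.3):
    """Classic Q-Learning update: blend the old Q-Value with the new reward."""
    return (1 - alpha) * q_value + alpha * reward

# Q-Value table: one entry per channel (values from the slide's example).
q_table = {1: 0.54, 2: 0.68, 3: 0.32, 4: 0.14}

# Select the channel with the highest Q-Value for the next epoch.
channel = max(q_table, key=q_table.get)

# After the epoch, the reward is the ratio of successful transmissions.
q_table[channel] = q_update(q_table[channel], reward=0.667)

print(channel, round(q_table[channel], 4))  # channel 2 updated to 0.6761
```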
Improving Reinforcement Learning Algorithms in the DSA Context
- Q-Learning+: accurate historic behavior of the channel (parameters: Lookback and Historic Weight)
- Q-Noise: considers the channel's transmission quality (parameter: Noise Weight)
- Q-Noise+: accurate historic behavior of the channel while also considering the channel's transmission quality (parameters: Lookback, Historic Weight, and Noise Weight)
Q-Learning+ update rule: the single old Q-Value is replaced by the last Lookback rewards, each weighted by its historic weight w:
Q_{t+1}(a_t) = (1 - α) Σ_{i=1}^{Lookback} w_{t-i} r_{t-i}(a_t) + α r_t(a_t)
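The Q-Learning+ update can be sketched as follows (an assumed implementation, not the authors' code). The historic weights [0.7, 0.2, 0.1] and lookback of 3 are the paper's defaults; the reward history values are made up for illustration.

```python
def q_learning_plus(history, current_reward, weights, alpha=0.6):
    """Q-Learning+ update: a weighted sum of past rewards replaces the old Q-Value.

    history: past rewards r_{t-1}, r_{t-2}, ... for this channel, newest first.
    weights: historic weights w_{t-1}, w_{t-2}, ... (e.g. [0.7, 0.2, 0.1]).
    """
    weighted_history = sum(w * r for w, r in zip(weights, history))
    return (1 - alpha) * weighted_history + alpha * current_reward

# Example with the paper's defaults: lookback = 3, weights [0.7, 0.2, 0.1].
q = q_learning_plus(history=[0.8, 0.5, 0.9], current_reward=0.667,
                    weights=[0.7, 0.2, 0.1])
print(round(q, 4))  # 0.7002
```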
Q-Noise update rule: the classic Q-Learning update plus a quality term that weights the SINR-based factor η by the SINR weight S_w:
Q_{t+1}(a_t) = (1 - α) Q_t + α r_t(a_t) + (S_w · η)
Mapping from the measured SINR to the quality factor η used by Q-Noise:
SINR < 15 dB          → η = 0.00
15 dB ≤ SINR < 17 dB  → η = 0.25
17 dB ≤ SINR < 20 dB  → η = 0.50
20 dB ≤ SINR < 25 dB  → η = 0.75
SINR ≥ 25 dB          → η = 1.00
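The Q-Noise update and the SINR-to-η mapping above can be sketched like this (an assumed implementation; the defaults α = 0.6 and S_w = 0.7 come from the paper's parameter table, the example SINR is made up).

```python
def sinr_to_eta(sinr_db):
    """Map a measured SINR (dB) to the quality factor eta per the table above."""
    if sinr_db < 15:
        return 0.00
    if sinr_db < 17:
        return 0.25
    if sinr_db < 20:
        return 0.50
    if sinr_db < 25:
        return 0.75
    return 1.00

def q_noise(q_value, reward, sinr_db, alpha=0.6, s_w=0.7):
    """Q-Noise: classic Q-Learning update plus an SINR-based quality term."""
    return (1 - alpha) * q_value + alpha * reward + s_w * sinr_to_eta(sinr_db)

# Example: channel with Q-Value 0.68, reward 0.667, measured SINR of 18 dB.
q = q_noise(q_value=0.68, reward=0.667, sinr_db=18.0)
print(round(q, 4))  # 1.0222
```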
Q-Noise+ update rule: combines the weighted reward history of Q-Learning+ with the SINR term of Q-Noise:
Q_{t+1}(a_t) = (1 - α) Σ_{i=1}^{Lookback} w_{t-i} r_{t-i}(a_t) + α r_t(a_t) + (S_w · η)
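Combining the two previous ideas, Q-Noise+ can be sketched as below (again an assumed implementation; η is assumed to be precomputed from the SINR table, and the reward history values are illustrative).

```python
def q_noise_plus(history, current_reward, eta, weights=(0.7, 0.2, 0.1),
                 alpha=0.6, s_w=0.7):
    """Q-Noise+: Q-Learning+'s weighted reward history plus Q-Noise's SINR term."""
    weighted_history = sum(w * r for w, r in zip(weights, history))
    return (1 - alpha) * weighted_history + alpha * current_reward + s_w * eta

# Same example rewards as before, with an SINR in the 17-20 dB band (eta = 0.5).
q = q_noise_plus(history=[0.8, 0.5, 0.9], current_reward=0.667, eta=0.5)
print(round(q, 4))  # 1.0502
```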
Performance Evaluation: Evaluation Methodology
Parameter                                            | Default value
Learning rate (α)                                    | 0.6
Number of channels                                   | 5
Transmission attempts                                | 100
Exploration coefficient (ε)                          | 0.25
Epoch duration                                       | 5
Threshold between Q-Values for spectrum hand-off (β) | 0.1
Lookback                                             | 3
Historic weight                                      | [0.7 0.2 0.1]
SINR weight                                          | 0.7
Confidence interval                                  | 95%
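For reference, the default parameters above can be collected into a configuration dictionary (a sketch; the key names are mine, the values come from the table).

```python
# Default evaluation parameters from the table above (key names are assumptions).
DEFAULTS = {
    "learning_rate_alpha": 0.6,
    "num_channels": 5,
    "transmission_attempts": 100,
    "exploration_epsilon": 0.25,
    "epoch_duration": 5,
    "handoff_threshold_beta": 0.1,
    "lookback": 3,
    "historic_weight": [0.7, 0.2, 0.1],
    "sinr_weight": 0.7,
    "confidence_interval": 0.95,
}

# Sanity check: the historic weights cover the lookback window and sum to 1.
assert len(DEFAULTS["historic_weight"]) == DEFAULTS["lookback"]
assert abs(sum(DEFAULTS["historic_weight"]) - 1.0) < 1e-9
```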
Results
(Performance-evaluation figures; the plots are not recoverable from this text extraction.)
Conclusions
- Q-Learning+ and Q-Noise+ improve on the results obtained by the classic reinforcement learning algorithm.
- The gain is largest for a small number of transmissions; for a large number of transmissions the difference decreases, but the proposed algorithms always remain better.
- Q-Noise and Q-Noise+ are able to transmit with higher quality (better SINR).
Thank you! Questions?
Leonardo Roveda Faganello
www.inf.ufrgs.br/~lrfaganello
lrfaganello@inf.ufrgs.br
Cristiano Both
www.inf.ufrgs.br/~cbboth
cbboth@inf.ufrgs.br
IEEE WCNC - Wireless Communications and Networking Conference
Shanghai, China, 7-10 April 2013