Load Balancing Models based on Reinforcement Learning for Self-Optimized Macro-Femto LTE-Advanced Heterogeneous Network

Sameh Musleh, Mahamod Ismail and Rosdiadee Nordin
Department of Electrical, Electronics and Systems Engineering, Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia.
sameh.musleh@gmail.com

Abstract — A heterogeneous Long Term Evolution-Advanced (LTE-A) network (HetNet) utilizes small cells to enhance its capacity and coverage. The intensive deployment of small cells such as pico- and femto-cells to complement macro-cells has resulted in an unbalanced distribution of traffic load among cells. Machine learning techniques are employed in cooperation with Self-Organizing Network (SON) features to achieve load balancing between highly loaded Macro cells and underlay small cells such as Femto cells. In this paper, two algorithms are proposed to balance the traffic load between Macro and Femto cells: Load Balancing based on Reinforcement Learning of end-user SINR (LBRL-SINR) and Load Balancing based on Reinforcement Learning of Macro-cell throughput (LBRL-T). Both proposed algorithms utilize the Reinforcement Learning (RL) technique to control the reference signal power of each Femto cell that underlays a highly loaded Macro cell. At the same time, each algorithm monitors any degradation in the performance metrics of both the Macro cell and its neighbor Femto cells and reacts to troubleshoot the degradation in real time. The simulation results show that both proposed algorithms are able to off-load end-users from a highly loaded Macro cell and redistribute the traffic load fairly with its neighbor Femto cells. As a result, both the call drop rate and the call block rate of the highly loaded Macro cell are decreased.

Index Terms — Load Balancing; LTE-A HetNet; Small Cells; Reinforcement Learning.

I. INTRODUCTION

One of the 3GPP technologies that meets the high demand for new services is the LTE-A HetNet. It integrates various network structures and various cell types for the purpose of offering new data and voice services, improved latencies and higher throughput for end-users. The main nodes of HetNets include High Power Nodes (HPNs), such as Macro eNodeBs, and Low Power Nodes (LPNs), such as Pico and Femto cells. LPNs are defined in 3GPP as small cells. They have become important elements of the LTE-A HetNet, and they contribute to improving the performance of the whole network in terms of increasing both link and system capacity, as well as extending the network coverage in both outdoor and indoor deployments [1]. The deployment of open-access Femto cells reduces the chance of Macro cells becoming overloaded or congested with a high number of end-users. Moreover, the cost of deploying Macro sites to solve the problems of network capacity and coverage is reduced.

A Femto cell is a low power node, so many processes, including the installation and troubleshooting of Femto cells, need to be automated, because the end-user is not expected to have enough technical knowledge to install or troubleshoot Femto cells. As a result, the Self-Organizing Network (SON) for LTE-A is a new technology that consists of new concepts and functionalities to automate the operation of LTE-A HetNets towards better performance and higher quality of service [1]. Specifically, the operations of self-tuning and self-optimization are defined in SON-enabled LTE-A networks [2]. SON is a recent development, and it is part of the 3GPP standard for LTE-A [3].
Recently, diverse challenges related to SON-enabled HetNets have been widely researched in various international research projects, including 3GPP projects [4],[5]. Various efforts have been made to develop advanced Radio Resource Management (RRM) algorithms that decrease the effect of interference in dense LTE-A HetNets [6]. Traffic load balancing is one of the most demanding topics for both the automation and self-optimization processes in the context of LTE-A networks [7]. The high traffic volumes, as well as the unbalanced traffic volumes generated by end-users, are the motivation for researching load balancing techniques. Traffic load balancing targets a balance between LTE-A radio resources and end-user traffic. The process of load balancing affects the Grade of Service (GoS), which is specifically related to call maintainability. Parameters such as radiation pattern power [8], handover power margins [9] and reference signal power are optimized to cope with end-user traffic. There have been a few studies in the field of load balancing for Macro and small cells in HetNets [10, 11]. Unbalanced traffic is a prominent issue that should be investigated in depth for indoor and outdoor HetNet deployment scenarios.

Reinforcement Learning (RL) is a technique that is specifically used for interactive learning [12]. It is based on the Q-Learning (QL) technique, which does not need a system defined by a formula or transfer function. As a result, it becomes an attractive technique for optimizing the operation of the LTE-A radio access network in real time [13-16]. In this paper, two load balancing techniques are proposed to overcome the high traffic-load problem of Macro cells in the LTE-A HetNet. Both of the proposed techniques, named LBRL-SINR and LBRL-T, mainly employ the Q-learning method to process the degraded performance metrics of Macro cells and to deliver higher link quality for end-users.

II. RELATED WORK

Most research related to traffic load balancing in LTE and LTE-A is based on making adjustments to the handover or cell selection process in order to manage the traffic distribution between neighbor cells [17]. The approaches in this field can be classified into handover-based control and coverage control of a given cell. In the case of handover-based control, the UEs are steered into specific cells by adjusting the handover offsets of each cell. In the coverage control approach, an eNodeB either extends its coverage to reach more UEs or reduces its coverage in case of overloading, so that more UEs hand over to its neighbor eNodeBs.

The author in [18] explained a method for monitoring the usage of Resource Blocks (RBs) in an eNodeB. Whenever the RB utilization ratio crosses a specific limit, it triggers a high-load status which initiates optimization of the eNodeB's reference-signal power. This reduces the high load at the eNodeB and enables neighbor cells to collaborate in the offloading process. The author in [19] presented a technique to optimize Jain's Fairness Index. The proposed technique reallocates UEs towards underlay small cells, namely Pico, Relay and Femto cells. Both the Pico and Femto cells use a wire-based backhaul to connect to the closest eNodeB, whereas Relay nodes use a completely wireless connection to their neighbor eNodeBs. In [20], the author proposed an algorithm that monitors the eNodeB load based on the handover process and the capacity of neighbor eNodeBs. The algorithm triggers an offloading process whenever neighbor eNodeBs are found to have adequate capacity. The technique achieved noticeable performance improvements, especially in UE throughput and BLER. In [21], the author proposed an algorithm to fairly distribute the eNodeBs' load by reducing the handover overhead that is necessary for initiating any handover process. The algorithm is designed by solving a multi-objective optimization problem with two conflicting targets controlled by the optimizer: signaling overhead and traffic load. A higher weight is given by the optimizer to the desired target.

III. FORMULATION OF REINFORCEMENT LEARNING TECHNIQUE

An LTE-A HetNet is designed as a Multi-Agent Reinforcement Learning system, in which each Femto cell is defined as an agent [12]. Reinforcement learning deals with the problem of finding a strategy for an autonomous agent that perceives and acts in its environment, so that it selects optimal actions to reach its objective. For every action that the agent takes in its environment, a trainer sets a reward or penalty to trigger the agent to decide about a new state. The states are defined in this paper as a range of possible reference signal power values, and an action is the selection of an optimal reference signal power value. The agent learns from the delayed reward in order to select actions that result in the highest possible value of cumulative reward. A Q-learning algorithm is able to achieve the most effective Q-value based on delayed rewards, regardless of the agent's awareness of the impact of its actions on the system where the actions are applied. Reinforcement learning techniques are associated with dynamic programming techniques, which are used to solve optimization problems. The agents collaborate during the learning process to converge to an optimal policy faster.
Meanwhile, each agent puts the learned policy into action separately during this stage, which increases the capability of the designed self-optimization algorithm to run in a distributed manner. The nature of the LTE-A HetNet changes rapidly due to the dynamic change in parameters and values related to the mobility of User Equipment (UEs), multipath fading, changing traffic distributions, etc.

Each agent learns through the well-known Markov Decision Process (MDP), in which the agent is aware of a set S of discrete states, and there is a set A of actions for the agent to implement. At every time interval of the optimization epoch, the agent acquires the current state s_t before it selects a current action a_t and executes it. The agent receives a reward r(s_t, a_t) and the environment moves to the next state s_{t+1} = δ(s_t, a_t). Both δ and r are the main functions of the environment, and the agent might be unaware of them. In an MDP, both functions δ(s_t, a_t) and r(s_t, a_t) depend only on the current state and action, rather than on previous states or actions. The agent learns a policy π to decide about the next action depending on the currently acquired state s_t, that is, π(s_t) = a_t. A precise way to specify which policy π the agent should learn is the policy that results in the greatest cumulative reward for the agent. In order to make this requirement specific and more accurate, we define the cumulative value V^π(s_t) that results from an arbitrary policy π starting from an arbitrary first state s_t as follows:

V^π(s_t) = r_t + γ r_{t+1} + γ^2 r_{t+2} + γ^3 r_{t+3} + … = Σ_{k=0}^{∞} γ^k r_{t+k}   (1)

where the sequence of reward values r_{t+k} is produced by starting from state s_t and iteratively applying the policy π to choose actions as mentioned above (i.e., a_t = π(s_t), a_{t+1} = π(s_{t+1}), etc.). Each Femto cell is defined as an agent, whereby it interacts in real time with the environment and selects an action in response to the changing system states. The agent depends on the current Q-values to obtain the highest possible reward; meanwhile, it has to identify the actions that produce the highest reward in the long term. Here 0 ≤ γ < 1 is a constant that expresses the relative value of future reward compared to current reward. Specifically, a future reward which is yet to be received is discounted by γ^k. If γ has the value of 0, only the instant reward is considered; when γ is close to 1, priority is given to future rewards over the instant reward. The discounted cumulative reward V^π(s_t) is obtained by following the policy π from the first state s_t. Logically, later rewards should be discounted relative to immediate rewards because, in general, the agent prefers to acquire the reward in the fewest possible time steps.
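To make Equation 1 concrete, the short Python sketch below (not part of the original paper) evaluates the discounted cumulative reward for a finite reward sequence; the reward values and γ are arbitrary placeholders.

```python
def discounted_return(rewards, gamma=0.9):
    """Discounted cumulative reward of Equation 1, V = sum_k gamma^k * r_{t+k},
    truncated to a finite reward sequence."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# Example: 1.0 + 0.9*0.5 + 0.81*0.25 = 1.6525
print(discounted_return([1.0, 0.5, 0.25], gamma=0.9))
```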

We require that each Femto cell learns a policy π that produces the maximum value of V^π(s) over all states s; this policy will be referred to as the optimal policy, denoted π*:

π* = argmax_π V^π(s), for all s   (2)

V^{π*}(s) is defined as the highest discounted cumulative reward that the agent can gain starting from the initial state s; in other words, it is the discounted cumulative reward achieved by executing the optimal policy starting from state s. It is a challenge for the agent to achieve the optimal policy π* because the available training data does not offer training examples of the form (s, a). However, the learner is informed about one thing, namely the sequence of instant rewards r(s_k, a_k) for k = 0, 1, 2, … This data facilitates learning a numerical evaluation function defined over states and actions, and then obtaining the optimal policy in terms of this evaluation function. One choice of evaluation function is V^π(s). The proposed LBRL algorithms in this paper should give preference to state s1 over state s2 whenever V^π(s1) is higher than V^π(s2), since the cumulative future reward is higher starting from s1. The algorithm policy makes a selection from the state space, not from the action space. However, in some cases V^π(s) can be used to select from the action space as well. The optimal action to select in state s is the action a that produces the highest instant reward r(s, a) added to the value V^{π*} of the next state, discounted by γ, as shown in Equation 3:

π*(s) = argmax_a [r(s, a) + γ V^{π*}(δ(s, a))]   (3)

Recall that δ(s, a) identifies the state reached by applying action a in state s. Further, an agent is defined in this paper as a Femto cell that underlays a Macro cell. The agent that runs the LBRL algorithms can adopt an optimal policy by learning V^{π*}(s), provided that the agent is equipped with complete knowledge of the instant reward function r and the state transition function δ. Once the agent has gained knowledge of the functions r and δ, which the environment employs to react to its actions, the optimal action a for any state s can be determined. Even though learning V^{π*}(s) is an efficient way to obtain the optimal policy, it can be used only when the agent has complete knowledge of δ and r. This requires the capability to predict the instant result, in terms of both the instant reward and the next state, for each state-action pair. In practice, the agent will not be able to predict the exact result of applying an arbitrary action to an arbitrary state. Whenever δ or r is undefined, the process of learning V^{π*}(s) is useless for choosing the optimal policy, and the agent cannot evaluate Equation 3 in this case. Therefore, another evaluation function should be used by the agent in this framework. The evaluation function Q(s, a) is defined as shown in Equation 4, so that its value is the highest discounted cumulative reward that can be gained by starting from state s and executing action a first:

Q(s, a) = r(s, a) + γ V^{π*}(δ(s, a))   (4)

Note that Q(s, a) is exactly the quantity that is maximized in Equation 3 to choose the optimal action a in state s. Therefore, we can rewrite Equation 3 in terms of Q(s, a) as

π*(s) = argmax_a Q(s, a)   (5)

which indicates that learning the Q-function instead of V^{π*}(s) makes the agent able to choose an optimal action even though the functions r and δ are unknown to the agent. Learning the Q-function amounts to learning the optimal policy. The main issue is finding a trustworthy method to estimate Q-values from the instant reward values r. Such a method can be achieved by iterative approximation.
This conclusion follows from the very close relationship between V^{π*} and Q shown in Equations 6 and 7:

V^{π*}(s) = max_{a'} Q(s, a')   (6)

which allows Equation 4 to be rewritten as:

Q(s, a) = r(s, a) + γ max_{a'} Q(δ(s, a), a')   (7)

This recursive equation provides the foundation for an algorithm that iteratively approximates Q. A Q-learning algorithm learns by repeatedly decreasing the difference between the Q-values of succeeding states. It is able to solve optimization problems for systems that cannot be defined by a closed-form expression, and it depends on the Temporal Difference (TD) method during the learning process. To estimate the Q-value in Equation 7, the agent has the target of choosing the action that produces the highest value of long-term reward r. In Sections V and VI of this paper, two formulas are proposed to calculate the reward r, one for each of the proposed algorithms. The proposed LBRL algorithms are specified by, firstly, controlling the transmitted power of the Reference Signal (RS) at each Femto cell and, secondly, applying Reinforcement Learning (RL), as one of the machine learning techniques, to convert each Femto cell into a smart node that is able to take a decision and auto-tune itself to an optimal state.
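As an illustration of how Equation 7 is typically realised, the following minimal Python sketch applies a temporal-difference Q-learning update over a discrete set of candidate RS power levels. The power grid, the learning rate α, the discount γ and the ε-greedy exploration rule are assumptions added for the example; they are not parameters specified by the paper.

```python
import random
from collections import defaultdict

POWER_LEVELS = list(range(10, 23))      # candidate Femto RS power levels in dBm (assumed grid)
ALPHA, GAMMA, EPSILON = 0.5, 0.8, 0.1   # learning rate, discount factor and exploration rate (assumed)

Q = defaultdict(float)                  # Q-table keyed by (state, action), initialised to zero

def choose_action(state):
    """Epsilon-greedy selection of the next RS power level (the action)."""
    if random.random() < EPSILON:
        return random.choice(POWER_LEVELS)
    return max(POWER_LEVELS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """Temporal-difference form of Equation 7:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(Q[(next_state, a)] for a in POWER_LEVELS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# One illustrative step: from state 19 dBm, try 21 dBm and receive a reward of 0.6.
q_update(state=19, action=21, reward=0.6, next_state=21)
print(choose_action(21), Q[(19, 21)])
```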

IV. MACRO-FEMTO SELF-ORGANIZING NETWORK MODEL

The Self-Organizing Network (SON) features are considered a powerful development of the 4th generation (4G) of mobile networks and pertain to the next stage of development, which includes 4G and beyond-4G networks [3]. SON features are used when there is rapidly changing traffic, a highly fluctuating RF channel, or a need to automate the operator policies that are specifically related to the mobile radio access network. Its main features fall into four categories: self-optimization, self-configuration, self-diagnosis and self-healing [18]. SON functions have been identified and used by multiple mobile service operators, as they lead to simplified operations and increased profitability.

Our proposed algorithms utilize SON functions, which include self-diagnosis, self-healing, and self-optimization of Macro and Femto cells in the LTE-A HetNet. In order to achieve a fair distribution of end-users between a highly loaded Macro cell and its neighbor Femto cells, both proposed algorithms are mainly based on the self-optimization concept for the SON-enabled LTE-A HetNet, employing Reinforcement Learning (RL) and Q-learning techniques to offload end-users from the Macro cell onto its neighbor Femto cells.

A set of three performance metrics of the highly loaded Macro cell forms the main input of each proposed algorithm. The three metrics are call block rate (B), call drop rate (D), and average SINR, which are the specific inputs of the LBRL-SINR algorithm, whereas B, D, and cell throughput (T) are the specific inputs of the LBRL-T algorithm. The SON module at each Femto cell is triggered only when a Macro eNodeB declares a high-load state or activates an overload indicator (OI); the Macro cell then triggers the LBRL algorithm to be executed at its neighbor Femto cells, as shown in Figure 1. The signaling between each Femto and Macro cell is carried over the X2 or S1 interface. Each Femto cell will independently increase its reference signal (RS) power to enlarge its coverage region. As a result, the traffic in hot areas is redirected to lightly loaded areas under Femto cells, and thus load balancing is achieved.

Figure 1: Macro-Femto SON model. (When the Macro cell OI status is 0 (normal load), both Macro and Femto cells operate normally; when the OI status is 1 (high load), the Macro cell triggers its neighbor underlay Femto cells over the X2 or S1 interface to run an offloading algorithm, LBRL-SINR or LBRL-T.)

The proposed SON architecture is distributed rather than centralized. In other words, the LBRL algorithms do not need to connect to a database to exchange the performance metric data while the algorithm is running on a live network. The normal signaling over the X2 or S1 interface is sufficient for each Femto cell to acquire the required performance metrics from its neighbor Macro cell.
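The trigger logic of Figure 1 can be summarised in a few lines of code. The sketch below is illustrative only; the class and function names are invented for the example, and in the real network the trigger would be carried as X2/S1 signaling rather than a local function call.

```python
class FemtoCell:
    """Placeholder agent: in the proposed model each Femto cell runs the LBRL algorithm locally."""

    def __init__(self, cell_id):
        self.cell_id = cell_id

    def run_offloading(self, algorithm):
        print(f"Femto {self.cell_id}: running {algorithm} for one optimization epoch")


def macro_overload_step(oi_active, neighbor_femtos, algorithm="LBRL-SINR"):
    """Figure 1 flow: OI status 0 (normal load) -> normal operation;
    OI status 1 (high load) -> the Macro cell triggers its neighbor underlay
    Femto cells to run the selected offloading algorithm."""
    if not oi_active:
        return "normal operation for both Macro and Femto cells"
    for femto in neighbor_femtos:
        femto.run_offloading(algorithm)
    return f"offloading triggered at {len(neighbor_femtos)} Femto cells"


# Example: six neighbor Femto cells under a highly loaded Macro cell.
print(macro_overload_step(True, [FemtoCell(i) for i in range(1, 7)]))
```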
V. LOAD BALANCING BASED ON REINFORCEMENT LEARNING OF END-USER SINR (LBRL-SINR)

It is normal for the CQI of each User Equipment (UE) to decrease on the Macro cell side, which implies that the Signal-to-Interference-plus-Noise Ratio (SINR) of the PDSCH channel is not sufficient. As a result, the cell throughput of the Macro cell decreases. When the LBRL-SINR algorithm is triggered at each underlay Femto cell, it reacts by adjusting the reference signal power, either adding or removing power, to adjust the coverage region size of each Femto cell. The algorithm decides on a suitable power level at each Femto cell, which in turn balances the traffic load among the Macro cell and its surrounding Femto cells. The LBRL-SINR algorithm utilizes the Q-learning technique to learn the optimal policy (Q-value) that determines the best power level for each Femto cell, based mainly on the degraded performance metrics of the overlay Macro cell.

The state (s), action (a) and reward (r) are the integral parts that need to be defined at each Femto cell, i.e. Femto cell-i, as shown in Figure 2. The state is defined as the Reference Signal (RS) power of Femto cell-i at time t. The action of Femto cell-i is the selection of the optimal reference signal power level from a range of pre-defined power levels for Femto cell-i at time t.

Figure 2: The main modules and execution sequence of the LBRL-SINR algorithm. (The LBRL-SINR algorithm is triggered at Femto cell-i; three performance metrics, namely the average SINR, the Call Drop Rate (D) and the Call Block Rate (B), are acquired from the overloaded Macro cell and exchanged with the neighbor Femto cell-i; the reward r_{t+1}^{f(i)} is calculated at Femto cell-i; the Q-table is updated after estimating Q(s, a); and the action is applied at Femto cell-i to select the RS power state s that maximizes the received reward.)

As soon as the selected action a_t is applied, the reward r_t^f at Femto cell-i is estimated as proposed in Equation 8. The value of r_t^f is an indicator of the current performance of both the Macro cell and its neighbor Femto cell-i. The overlay Macro cell and Femto cell-i collaborate in each optimization cycle and exchange the load information and performance metrics through the X2 interface, or the S1 interface as an alternative. The three performance metrics used to calculate the reward at Femto cell-i are: the average SINR of all end-users at both the Macro cell and Femto cell-i at time t (SINR_m and SINR_f), the Call Drop Rate at the Macro cell and Femto cell-i at time t (D_m + D_f), and the Call Block Rate at the Macro cell and Femto cell-i at time t (B_m + B_f). The proposed reward function is defined as follows:

r_t^f = (w1 (SINR_m + SINR_f) + w2 (D_m + D_f) + w3 (B_m + B_f)) · (1/c)   (8)

where w1, w2 and w3 are the weights. SINR_m is the average of SINR_{m,k} over all end-users at time t, and SINR_{m,k} is defined as the SINR of UE (k) at Macro cell (m) as given in Equation 9. The constant c keeps the reward r_t^f between 0 and 1.

SINR_{m,k} (dB) = P_m + G_m − PL_{m,k} − (I_{m,k} + n²)   (9)

where:
P_m = downlink transmitted power from Macro cell (m) to end-user (k)
G_m = downlink antenna gain of Macro cell (m)
PL_{m,k} = path loss between Macro cell (m) and end-user (k)
I_{m,k} = received downlink interference at end-user (k) connected to Macro cell (m)
n² = thermal noise power
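A minimal sketch of the LBRL-SINR reward computation follows, assuming illustrative weight values and dB quantities (the paper does not publish w1, w2, w3 or c): Equation 9 gives the per-UE SINR in dB, and Equation 8 combines the averaged metrics of the Macro cell and Femto cell-i into the reward r_t^f.

```python
def sinr_db(p_tx_dbm, antenna_gain_db, path_loss_db, interference_plus_noise_dbm):
    """Equation 9 in dB form: SINR_{m,k} = P_m + G_m - PL_{m,k} - (I_{m,k} + n^2)."""
    return p_tx_dbm + antenna_gain_db - path_loss_db - interference_plus_noise_dbm


def lbrl_sinr_reward(sinr_m, sinr_f, d_m, d_f, b_m, b_f, w=(0.6, 0.2, 0.2), c=100.0):
    """Equation 8: r_t^f = (w1*(SINR_m + SINR_f) + w2*(D_m + D_f) + w3*(B_m + B_f)) / c.
    The weights w1..w3 and the scaling constant c are placeholders; the paper only
    states that c keeps the reward between 0 and 1."""
    w1, w2, w3 = w
    return (w1 * (sinr_m + sinr_f) + w2 * (d_m + d_f) + w3 * (b_m + b_f)) / c


# Illustrative values only (the dB and rate figures are not taken from the paper).
sinr = sinr_db(p_tx_dbm=46, antenna_gain_db=15, path_loss_db=130, interference_plus_noise_dbm=-95)
print(sinr, lbrl_sinr_reward(12.0, 18.0, 0.15, 0.05, 0.25, 0.08))
```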

The downlink inter-cell interference model is simulated for the LTE-A downlink. LTE-A employs the Orthogonal Frequency Division Multiple Access (OFDMA) technique for its physical layer, which contributes to achieving higher spectral efficiency for LTE-A in comparison with previous generations of mobile technologies. The smallest unit of bandwidth assigned to each end-user is the Physical Resource Block (PRB), and each PRB serves a single end-user at a time. Hence, the risk of intra-cell interference is mitigated by this PRB assignment scheme. The higher the value of the reward r_t^f, the wider the coverage of Femto cell-i becomes. As a result, the optimized reference signal power level will force more end-users to camp on the Femto cell instead of camping on the overlay Macro cell.

VI. LOAD BALANCING BASED ON REINFORCEMENT LEARNING OF MACRO CELL THROUGHPUT (LBRL-T)

This algorithm mainly considers the cell throughput (T) for all UEs, instead of the average SINR used by LBRL-SINR, to dynamically control the RS power at each Femto cell. It is assumed that the reference signal power of the Macro cell remains the same and is not subject to change by the algorithm. This is to ensure full network coverage and to minimize the chance of creating coverage holes: at some instant, a Macro cell and its neighbor Femto cell might otherwise reduce their coverage at the same time, which would create a coverage hole. In this algorithm, the reward is estimated based on the cell throughput (T) of the Macro cell. The T value is one of the main components that constructs the reward function r_t^f, as shown in Equation 10. The state and action of Femto cell-i are modeled in the same way as in LBRL-SINR in Section V, while the process of estimating the reward differs from the LBRL-SINR algorithm. Three performance metrics are required in order to estimate r_t^f in LBRL-T, and they are acquired from the Macro cell and its neighbor Femto cell-i simultaneously. The first metric is the average cell throughput at time t (T_m + T_f), the second is the Call Drop Rate at time t (D_m + D_f) and the third is the Call Block Rate at time t (B_m + B_f). These metrics construct the reward function, which is defined as follows:

r_t^f = (w1 (T_m + T_f) + w2 (D_m + D_f) + w3 (B_m + B_f)) · (1/c)   (10)

The LBRL-T algorithm keeps monitoring the cell throughput (T) so that it does not degrade at any time instance after the new action a_t is applied. The immediate response of the algorithm after an action a_t is to estimate the new reward value r_{t+1}^f. The higher r_{t+1}^f is, the higher the RS power value assigned to Femto cell-i, which increases the chance of Femto cell-i off-loading more end-users from its neighbor Macro cell. As a result, improved performance is achieved by decreasing the chance for a Macro cell with a high number of end-users to suffer high rates of dropped or blocked calls (D or B). However, if the increment in the reference signal power at Femto cell-i was unnecessary or led to unstable performance, in terms of causing a higher Call Drop Rate (D) or a higher Call Block Rate (B) on the Macro cell side, the algorithm detects the degraded B or D and estimates a new reward value r_{t+1}^f in the next optimization epoch which is lower than the previous reward r_t^f. As a result, an optimized action a_t is applied to reduce the RS power to a lower level.
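For comparison with the Equation 8 sketch above, a matching sketch of the LBRL-T reward of Equation 10 (again with placeholder weights and scaling constant) differs only in the throughput term:

```python
def lbrl_t_reward(t_m, t_f, d_m, d_f, b_m, b_f, w=(0.6, 0.2, 0.2), c=100.0):
    """Equation 10: same structure as Equation 8, with the average cell throughput
    (T_m + T_f) replacing the average SINR term. Weights and c are placeholders."""
    w1, w2, w3 = w
    return (w1 * (t_m + t_f) + w2 * (d_m + d_f) + w3 * (b_m + b_f)) / c
```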
VII. SIMULATION ENVIRONMENT

An LTE-A Heterogeneous Network (HetNet) consists of two types of cells: Macro cells and underlying Femto cells. In 3GPP [22], a dense LTE-A HetNet is defined as a heterogeneous network that consists of 4 to 10 underlay small cells defined as neighbors of their overlay Macro cell. Our simulation scenarios are conducted on a system-level simulation comprising 7 Macro cells and 42 underlay Femto cells, as shown in Figure 3. Six Femto cells are distributed randomly within the coverage area of their neighbor Macro cell, and each Femto cell is defined as a neighbor of its nearest overlay Macro cell. The underlay Femto cells are able to communicate with the Macro cell through the X2 or S1 interface to exchange performance metrics and load information.

The system topology shown in Figure 3 consists of 7 Macro cells. The center Macro cell is simulated with a high traffic load that originates from a maximum of 100 end-users. The remaining 6 Macro cells are simulated with a normal traffic load that originates from a maximum of 20 end-users each. The system bandwidth varies according to the cell type. Each Macro cell has a total bandwidth of 100 MHz, which is the total available bandwidth from deploying 5 Component Carriers (CCs), each providing a channel bandwidth of 20 MHz. Each Femto cell provides a channel bandwidth of 10 MHz. The traffic load of the center Macro cell in the 3 simulation scenarios is simulated to utilize 70% to 99% of the Macro cell bandwidth. Meanwhile, the normal traffic load is simulated to utilize a maximum of 25% of the available bandwidth at each of the 6 surrounding Macro cells.

Figure 3: System topology of the dense LTE-A HetNet
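The simulation setup described above can be summarised in a small configuration structure. The sketch below simply restates the values given in this section; the field names themselves are illustrative.

```python
# Values restated from the text above; the dictionary layout is only an illustration.
SIMULATION_CONFIG = {
    "macro_cells": 7,                         # one highly loaded centre cell + 6 neighbours
    "femto_cells_total": 42,                  # 6 underlay Femto cells per Macro cell
    "macro_bandwidth_mhz": 100,               # 5 component carriers x 20 MHz each
    "femto_bandwidth_mhz": 10,
    "centre_macro_max_users": 100,
    "neighbour_macro_max_users": 20,
    "centre_macro_load_range": (0.70, 0.99),  # fraction of the Macro bandwidth utilised
    "neighbour_macro_max_load": 0.25,
}
```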

Three simulation scenarios have been executed: fixed reference signal power allocation, dynamic reference signal power allocation by the LBRL-SINR algorithm, and dynamic reference signal power allocation by the LBRL-T algorithm. In each of the three scenarios, each UE admits to either the Macro cell or its neighbor Femto cell depending on which cell has the higher reference signal power value, as shown in Figure 4. If the cell Overload Indicator (OI) is not active, the cell is still able to provide RBs to any new end-user that requests a connection or call; otherwise, the call/connection request from the end-user is blocked. A dropped call is recorded if the received signal power of an end-user that has an established connection with either a Macro or Femto cell falls below a pre-determined threshold value of -110 dBm.

Figure 4: Basic procedures for estimating Call Block Rate (B) and Call Drop Rate (D). (UE(k) requests a service; the Macro or Femto cell with the maximum RSRP is selected as the serving cell; the call is blocked if the serving-cell OI is active while UE(k) is in IDLE mode, dropped if the serving-cell RSRP falls below the threshold RSRP while UE(k) is in CONNECTED mode, and otherwise RBs are allocated to UE(k).)
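The admission and accounting rules of Figure 4 reduce to a simple decision function. The sketch below is an illustration under the stated assumptions (function and argument names are not from the paper) of how a blocked or dropped call would be recorded.

```python
def serve_request(macro_rsrp_dbm, femto_rsrp_dbm, serving_oi_active,
                  ue_connected, threshold_dbm=-110.0):
    """Sketch of the Figure 4 procedure: the UE camps on the cell with the higher
    RSRP; a new request is blocked when the serving cell's Overload Indicator (OI)
    is active, and an ongoing call is dropped when the serving-cell RSRP falls
    below the -110 dBm threshold."""
    serving_rsrp = max(macro_rsrp_dbm, femto_rsrp_dbm)
    if ue_connected and serving_rsrp < threshold_dbm:
        return "dropped call"
    if not ue_connected and serving_oi_active:
        return "blocked call"
    return "RBs allocated to UE"


print(serve_request(-95.0, -102.0, serving_oi_active=False, ue_connected=False))  # served by the Macro cell
print(serve_request(-118.0, -115.0, serving_oi_active=False, ue_connected=True))  # dropped call
```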
VIII. RESULTS AND DISCUSSION

To assess the performance of the proposed algorithms, the same performance metrics used at the input stage to estimate the reward values were used again at the output stage. Both the Call Drop Rate (D) and the Call Block Rate (B) have been estimated for each simulation scenario and are represented graphically in Figures 5 and 6. In the first simulation scenario, a fixed RS power level of 19 dBm was set for each Femto cell. This scenario led to degraded performance at the Macro cell and generated a considerable percentage of dropped calls (D) and blocked calls (B). The y-axis in the two figures represents the percentage of B and D, respectively. In particular, B is the metric that was most affected by the congestion situation.

In Figure 5, a lower Call Block Rate (B) is shown for both algorithms in comparison with the fixed RS power assignment scheme, which indicates that the available bandwidth is managed fairly among the Macro cell and its neighbor Femto cells. As a result, the chance for the Macro cell to recover from congestion becomes higher by utilizing the LBRL algorithms, and both LBRL-SINR and LBRL-T showed a reduced rate of blocked calls over the normal scheme of fixed RS power assignment.

Figure 5: The output Call Block Rate (B) for the highly loaded Macro cell (B versus load percentage from 0.7 to 1.0 for LBRL-SINR, LBRL-T and fixed RS power)

In Figure 6, the improved performance of the Macro cell is shown through the reduced rate of dropped calls (D). In other words, the low Call Drop Rate (D) is an indicator of a higher percentage of successful handovers (HO) among cells. When the LBRL-SINR algorithm is triggered at an underlay Femto cell, it showed the lowest Call Drop Rate (D) as well as the lowest Call Block Rate (B) in comparison with both the reference case and the LBRL-T algorithm. This confirms that acquiring the average SINR of end-users, instead of the average cell throughput (T), contributes to more accurate decisions by the QL optimizer when selecting the best RS power level at each Femto cell. More accurate reward values (r_t^f) were fed to the QL optimizer when LBRL-SINR was triggered. As a result, the LBRL-T algorithm showed sub-optimal performance in comparison with LBRL-SINR, as shown in Figures 5 and 6.

Figure 6: The output Call Drop Rate (D) for the highly loaded Macro cell (D versus load percentage from 0.7 to 1.0 for LBRL-SINR, LBRL-T and fixed RS power)

In the second and third simulation scenarios, both LBRL-SINR and LBRL-T evolved to new reference signal power values that fluctuated in the range of 19 ± 3 dBm at each underlay Femto cell. In Figure 7, a comparison is shown for the average reference signal power of the 6 Femto cells that underlay Macro cell 1 (the central Macro cell), where the LBRL algorithms were triggered and executed during one optimization cycle for each simulation scenario. At each Femto cell, the minimum RS power level was set to 10 dBm; neither LBRL-SINR nor LBRL-T will go below this threshold value.

Further, a maximum value of 22 dBm was set for the RS power at each Femto cell. As shown in Figure 7, in order to achieve the prospective load balancing among the Macro cell and its neighbor Femto cells, the LBRL-SINR algorithm applied an increment of 1 to 3 dBm of RS power at Femto cells 1, 2 and 4. In the third simulation scenario, LBRL-T applied the same increment of 1 to 3 dBm at Femto cells 1, 5 and 6. The increment in reference signal power means that Femto cells 1, 2, 4, 5 and 6 extend their coverage, so more end-users are able to camp on those 5 Femto cells instead of camping on their overlay Macro cell. However, if degraded performance is detected by the algorithm, either on the Macro cell side or on the side of its neighbor Femto cells, the algorithm reacts and decreases the Femto cell RS power. A decrement of 1 to 3 dBm was applied by LBRL-SINR at Femto cells 3, 5 and 6, and the same decrement was applied at Femto cells 2, 3 and 4 by the LBRL-T algorithm, as shown in Figure 7.

As mentioned in the previous sections of this paper, there are four performance metrics whose degradation the algorithm can detect for a highly loaded Macro cell: high Call Drop Rate (D), high Call Block Rate (B), low cell throughput (T) and low average SINR. The degradation of any of these metrics affects the reward values, as stated previously in Equations 8 and 10. As a result, the algorithm will reduce the RS power level at the Femto cell where the reward is estimated, in order to keep optimal values of B, D and SINR if the LBRL-SINR algorithm is triggered, or B, D and T if the LBRL-T algorithm is triggered. The LBRL-T algorithm is recommended where the mobile operator observes throughput-related issues, such as low end-user throughput or low cell throughput, since LBRL-T makes the decision to offload a cell based on the cell throughput, as shown previously in Equation 10. On the other hand, LBRL-SINR, by utilizing the end-user SINR as part of its reward formula (Equation 8), is more suitable for areas where there is a clear indication of high-interference spots.

Figure 7: RS power allocation for the 6 Femto cells that underlay the Macro cell with high load (RS power level in dBm per Femto cell index for LBRL-SINR, LBRL-T and the fixed reference power level)

The complexity and computational cost of LBRL-SINR and LBRL-T are negligible, since the proposed algorithms take a few minutes to compute an output with all the needed calculations during each optimization epoch. In addition, the memory requirement is limited: the needed size of the look-up table is small, as it contains a set of 4 performance metrics (B, D, SINR and T) to be exchanged between a Macro cell and its neighbor Femto cell once an LBRL algorithm is triggered to run.

IX. CONCLUSION

This paper proposed two algorithms that optimize the degraded performance of LTE-A Macro cells under high traffic load. The proposed algorithms utilize Reinforcement Learning (RL) techniques to auto-tune the reference signal power of Femto cells, which results in offloading end-users from a congested overlay Macro cell. Both the LBRL-SINR and LBRL-T algorithms optimize the RS power level of Femto cells in real time during every optimization epoch of an on-air Macro cell. As a result, the distribution of traffic load among Macro and Femto cells is improved, and lower rates of dropped calls and blocked calls are achieved for the highly loaded Macro cell.

REFERENCES
[1] T. Nakamura, S. Nagata, A. Benjebbour, Y. Kishiyama, T. Hai, S. Xiaodong, et al., "Trends in small cell enhancements in LTE advanced," IEEE Communications Magazine, vol. 51, pp. 98-105, 2013.
[2] M. Peng, D. Liang, Y. Wei, J. Li, and H. H. Chen, "Self-configuration and self-optimization in LTE-advanced heterogeneous networks," IEEE Communications Magazine, vol. 51, pp. 36-45, 2013.
[3] L. Jorguseski, A. Pais, F. Gunnarsson, A. Centonza, and C. Willcock, "Self-organizing networks in 3GPP: standardization and future trends," IEEE Communications Magazine, vol. 52, pp. 28-34, 2014.
[4] W. Wang, J. Zhang, and Q. Zhang, "Cooperative cell outage detection in Self-Organizing femtocell networks," in INFOCOM, 2013 Proceedings IEEE, 2013, pp. 782-790.
[5] A. Aguilar-Garcia, S. Fortes, M. Molina-García, J. Calle-Sánchez, J. I. Alonso, A. Garrido, et al., "Location-aware self-organizing methods in femtocell networks," Computer Networks, vol. 93, Part 1, pp. 125-140, 2015.
[6] M. Behjati and J. Cosmas, "Self-organizing network interference coordination for future LTE-advanced networks," in 2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), 2013, pp. 1-5.
[7] S. Jia, W. Li, X. Zhang, Y. Liu, and X. Gu, "Advanced Load Balancing Based on Network Flow Approach in LTE-A Heterogeneous Network," International Journal of Antennas and Propagation, vol. 2014, p. 10, 2014.
[8] Y. Khan, B. Sayrac, and E. Moulines, "Centralized self-optimization in LTE-A using Active Antenna Systems," in Wireless Days (WD), 2013 IFIP, 2013, pp. 1-3.
[9] Z. Altman, S. Sallem, R. Nasri, B. Sayrac, and M. Clerc, "Particle swarm optimization for Mobility Load Balancing SON in LTE networks," in Wireless Communications and Networking Conference Workshops (WCNCW), 2014 IEEE, 2014, pp. 172-177.
[10] A. L. Yusof, M. A. Zainali, M. T. M. Nasir, and N. Ya'acob, "Handover adaptation for load balancing scheme in femtocell Long Term Evolution (LTE) network," in Control and System Graduate Research Colloquium (ICSGRC), 2014 IEEE 5th, 2014, pp. 242-246.
[11] K. Lee, S. Kim, S. Lee, and J. Ma, "Load balancing with transmission power control in femtocell networks," in Advanced Communication Technology (ICACT), 2011 13th International Conference on, 2011, pp. 519-522.
[12] L. Buşoniu, R. Babuška, and B. De Schutter, "A Comprehensive Survey of Multiagent Reinforcement Learning," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 38, pp. 156-172, 2008.
[13] E. Bikov and D. Botvich, "Multi-agent Learning for Resource Allocation in Dense Heterogeneous 5G Network," in 2015 International Conference on Engineering and Telecommunication (EnT), 2015, pp. 1-6.
[14] I. S. Comşa, M. Aydin, S. Zhang, P. Kuonen, and J. F. Wagen, "Reinforcement learning based radio resource scheduling in LTE-advanced," in Automation and Computing (ICAC), 2011 17th International Conference on, 2011, pp. 219-224.

[15] J. Moysen and L. Giupponi, "A Reinforcement Learning Based Solution for Self-Healing in LTE Networks," in 2014 IEEE 80th Vehicular Technology Conference (VTC2014-Fall), 2014, pp. 1-6.
[16] O. Iacoboaiea, B. Sayrac, S. B. Jemaa, and P. Bianchi, "SON Coordination for parameter conflict resolution: A reinforcement learning framework," in Wireless Communications and Networking Conference Workshops (WCNCW), 2014 IEEE, 2014, pp. 196-201.
[17] A. Giovanidis, L. Qi, and S. Stańczak, "A distributed interference-aware load balancing algorithm for LTE multi-cell networks," in 2012 International ITG Workshop on Smart Antennas (WSA), 2012, pp. 28-35.
[18] H. Zhang, X.-S. Qiu, L.-M. Meng, and X.-D. Zhang, "Achieving distributed load balancing in self-organizing LTE radio access network with autonomic network management," in 2010 IEEE Globecom Workshops, 2010, pp. 454-459.
[19] K. M. Ronoh, "Load Balancing in Heterogeneous LTE-A Networks," Linköping University, 2012.
[20] A. Lobinger, S. Stefanski, T. Jansen, and I. Balan, "Load Balancing in Downlink LTE Self-Optimizing Networks," in 2010 IEEE 71st Vehicular Technology Conference, 2010, pp. 1-5.
[21] Z. Li, H. Wang, Z. Pan, N. Liu, and X. You, "Joint optimization on load balancing and network load in 3GPP LTE multi-cell networks," in 2011 International Conference on Wireless Communications and Signal Processing (WCSP), 2011, pp. 1-5.
[22] 3GPP, "Small cell enhancements for E-UTRA and E-UTRAN — Physical layer aspects (Release 12)," 3GPP TR 36.872, v12.1.0, Dec. 2013.