Joint Beamforming and Power Optimization with Iterative User Clustering for MISO-NOMA Systems

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 0.09/ACCESS.07.70008, IEEE Access

Zhengxuan Liu, Lei Lei, Ningbo Zhang, Guixia Kang, and Symeon Chatzinotas

Abstract: In this paper, we minimize the transmit power of multiple-input single-output non-orthogonal multiple access (MISO-NOMA) systems. In our analysis, a large number of users are partitioned into multiple small user clusters (pairs) with uniform power allocation across the clusters, and each cluster is associated with a beamforming vector. The optimization problem involves the beamforming vectors, the power allocation, and the user clustering. Because solving the whole problem has high computational complexity, we decompose it into two parts and design a joint algorithm that optimizes them iteratively. First, given a user partition, we formulate the beamforming and power allocation problem under a set of practical constraints. This problem is nonconvex. To tackle it, we reformulate, transform, and approximate it as a quadratically constrained optimization problem, and develop a joint beamforming and power allocation algorithm based on semidefinite relaxation to solve it. Second, to avoid the high complexity of obtaining the optimal clusters, we propose a low-complexity algorithm that efficiently identifies a set of promising clusters, which form a candidate user partition. Based on these two algorithms, we design an algorithmic framework that performs them iteratively to improve performance. By design, the produced user partition can be improved in later iterations to further reduce power consumption.
Numerical results demonstrate that the proposed solution, which iteratively updates the user clustering and jointly optimizes beamforming and power allocation, outperforms previous schemes.

Index Terms: Non-orthogonal multiple access, beamforming, semidefinite programming, user clustering.

Z. Liu, N. Zhang and G. Kang are with the Key Laboratory of Universal Wireless Communications, Beijing University of Posts and Telecommunications, China (e-mails: {liuzhengxuan, nbzhang, gxkang}@bupt.edu.cn). L. Lei and S. Chatzinotas are with the Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, Luxembourg ({lei.lei, symeon.chatzinotas}@uni.lu).

I. INTRODUCTION

Non-orthogonal multiple access (NOMA) is considered a promising technique for 5G systems owing to its enhanced performance compared with orthogonal multiple access (OMA) []-[4]. In OMA systems, each user exclusively accesses radio resources (time, frequency, or spreading code). In NOMA systems, multiple users can be scheduled on the same resource in the power domain. The interference among co-channel users can be partially canceled by successive interference cancellation (SIC) at the receivers. It has been shown that NOMA can outperform OMA not only in terms of the sum rate but also in terms of each user's individual rate [4]. To further improve system capacity, beamforming has been applied to NOMA in multiple-input multiple-output (MIMO) systems [5]-[5]. Random beamforming was studied in MIMO-NOMA systems, and a weighted proportional fair-based power allocation was adopted in [5]. Because of its randomness, interference cannot be canceled effectively, which may limit the performance improvement. In [6], the authors enhanced the capacity of multiple-input single-output NOMA (MISO-NOMA) systems by performing user clustering, power allocation, and zero-forcing beamforming (ZFBF) as a series of separate optimizations.
The ZFBF vector of each cluster is obtained from the channel of the strongest user in that cluster. This solution selected two users with high correlation and a large difference in channel gains. Similarly, the authors of [7] proposed a method in which the ZFBF vectors can be obtained from any user's channel vector in each cluster, after which a user matching algorithm chooses another user for clustering. Different from [5]-[7], in [8] the beamforming vector of each cluster was calculated by an optimization-based beamforming algorithm built on the majorization-minimization method, and fractional transmit power control (FTPC), a suboptimal power allocation method, was used to allocate power to all users. The authors of [9] investigated beamforming design and power allocation for multiuser MIMO-NOMA downlink systems in which the number of users exceeds the number of transmit antennas. The beamforming vector of each cluster was obtained by a new ZFBF technique that considers the equivalent channel gains of all users in the cluster. The optimal power allocation proposed in [3] was used for intra-cluster power allocation. Moreover, they proposed a user clustering algorithm based on the channel gain correlations and differences among users to maximize network throughput. In [0], the authors investigated a model of channel uncertainties for MISO-NOMA downlink systems. A robust beamforming design and power allocation are obtained by decoupling the formulated nonconvex optimization problem into four optimization problems, which are then solved by an alternating optimization algorithm. In [], a general MIMO framework for NOMA downlink and uplink transmission based on signal alignment was proposed to enhance the performance gains of NOMA, and the impact of fixed power allocation and cognitive-radio-inspired power allocation on the performance of MIMO-NOMA was also studied. Note that in [5]-[], each beam serves one cluster and all users in a cluster are scheduled in a NOMA manner.
In contrast, the works []-[5] consider the case where each beam serves only one user. In [], an iterative algorithm based on the concave-convex procedure was proposed to obtain the beamforming vectors of all users for maximizing the sum rate. The broadcast messages are the superposition of all users' signals. This scheme may result in higher computational complexity and in SIC error propagation if there are too many users in the system. In [3] and [4], the beamforming vector of each user is obtained by combining conventional ZFBF with their proposed beamforming algorithm for each cluster. The authors of [5] first employed the ZF method to avoid mutual interference and information leakage among clusters, and then proposed an alternating optimization method and a constrained concave-convex procedure to obtain a secure beamforming design and power allocation. The schemes proposed in [3]-[5] require that the number of transmit antennas be no less than the number of users. However, the complexity, size, and price of the radio front end scale with the number of antennas [6]. In general, the number of transmit antennas is limited, while there may be many users in the system, in which case the number of transmit antennas may be far smaller than the number of users. The methods proposed in [3]-[5] are then not feasible. In addition, the works [5]-[0] focused on improving spectral efficiency. The transmit power minimization problem with SINR constraints was considered in [3]-[4]. In [7], a power minimization problem for multi-carrier NOMA subject to individual user data requirements was investigated. In multi-antenna NOMA systems, power allocation, beamforming, and user clustering are key factors for system performance in terms of power consumption. In previous studies, e.g., [5]-[9],[], user clustering, power allocation, and beamforming were typically considered separately. We observe that if joint optimization is considered, system performance can be further improved from two perspectives. One is joint beamforming and power allocation.

2169-3536 (c) 2016 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
As the expressions of the signal-to-interference-plus-noise ratio (SINR) show, the power allocation is affected by the beamforming design, and vice versa. Hence, the mutual influence between beamforming and power allocation should be considered. The other perspective is that power and beamforming optimization largely depends on the user clustering decision, i.e., which users are grouped into a cluster. Improper user clusters may result in either high power consumption or SIC failures. If power and beamforming are optimized on the basis of proper user clusters, the former can efficiently suppress intra-cluster interference, while the latter can suppress inter-cluster interference. The power-saving performance therefore benefits from joint optimization. In this paper, based on the aforementioned considerations, we take into account all three key factors, i.e., beamforming design, power allocation, and user clustering, in our optimization procedure. Because jointly optimizing the three factors to the global optimum has high computational complexity, the proposed algorithmic solution is simplified into two components, i.e., an algorithm for user clustering and an algorithm for joint beamforming and power allocation. The two components are performed jointly and iteratively in an algorithmic framework. That is, once a user partition is produced, the joint beamforming and power optimization is performed. By design, the user partition can be improved in later iterations, each followed by joint beamforming and power optimization. Compared with previous works, e.g., [5]-[9], we provide more opportunities to search for user clusters based on different criteria, instead of producing only one user partition based on a single criterion, e.g., greedy selection of the users with the largest difference in channel gains. Thus, in this work, the overall power performance benefits both from the diverse selection in user clustering and from joint beamforming and power optimization.
Specifically, in the proposed algorithmic framework, we design a suboptimal, low-complexity algorithm to generate user clusters. To solve the joint beamforming and power allocation problem for each given user partition, we first formulate the problem and show that it is nonconvex. We then derive an equivalent reformulation and approximately convert it into a convex problem. A joint power allocation and beamforming design algorithm is proposed based on the semidefinite relaxation (SDR) technique. Numerical results demonstrate that the proposed joint optimization and iterative algorithm reduces power consumption compared with previous schemes.

The rest of this paper is organized as follows. Section II outlines the system model. The problem formulation, its transformation, and the proposed joint beamforming and power allocation algorithm are presented in Section III. Section IV presents the proposed iterative user clustering and joint beamforming and power allocation algorithm. Numerical results are provided in Section V. Finally, Section VI concludes the paper.

Notation: Boldface uppercase and boldface lowercase letters denote matrices and column vectors, respectively. The symbols $\mathbb{C}^n$ and $\mathbb{R}^n_+$ denote the $n$-dimensional complex and nonnegative real spaces, respectively. $\mathcal{CN}(a, b)$ represents the distribution of a circularly-symmetric complex Gaussian random variable with mean $a$ and covariance $b$. The superscript $(\cdot)^H$ denotes the Hermitian transpose operator. $\mathrm{tr}(\cdot)$, $\mathrm{rank}(\cdot)$, $|\cdot|$, and $\|\cdot\|$ denote the trace, the rank, the absolute value, and the Euclidean norm operators, respectively. $\mathcal{O}(\cdot)$ is reserved for complexity estimates. $\mathbf{X} \succeq \mathbf{0}$ denotes that $\mathbf{X}$ is a Hermitian positive-semidefinite matrix. $\mathbf{A}(m, n)$ represents the $(m, n)$-th element of matrix $\mathbf{A}$, and $\mathbf{A}(m, :)$ and $\mathbf{A}(:, n)$ denote the $m$-th row and the $n$-th column of $\mathbf{A}$, respectively. Finally, $\mathcal{X}\backslash x$ denotes the set $\mathcal{X}$ with the element $x$ excluded.

II. SYSTEM MODEL

Consider a MISO-NOMA downlink system in which a base station (BS) with $M$ antennas serves $K$ single-antenna users. Each beam serves a cluster consisting of two or more users. To reduce system complexity, we follow the same setting as previous works [], [], that is, two users per cluster. According to the users' channel strengths, measured by the Euclidean norm of their channel vectors, the two users are called the near user and the far user, corresponding to the stronger-channel user and the weaker-channel user, respectively. Note that this definition differs from [] and [3], in which the user close to the BS is called the near user and the other user is called the far user. Let $L$ denote the number of clusters, and let the cluster set be $\mathcal{L} = \{1, 2, \ldots, L\}$. The BS sends the following $L \times 1$ information-bearing vector

$$\mathbf{x} = \begin{bmatrix} \sqrt{1-\lambda_1}\, x_{1,n} + \sqrt{\lambda_1}\, x_{1,f} \\ \vdots \\ \sqrt{1-\lambda_L}\, x_{L,n} + \sqrt{\lambda_L}\, x_{L,f} \end{bmatrix}, \tag{1}$$

in which $x_{l,n}$ and $x_{l,f}$, $l \in \mathcal{L}$, denote the signals of the near user and the far user in the $l$-th cluster, with zero mean and unit variance, and $1-\lambda_l$ and $\lambda_l$ denote the power allocation coefficients of the near user and the far user of the $l$-th cluster, respectively. The received signal of the $k$-th user of the $l$-th cluster can be represented as

$$y_{l,k} = \mathbf{h}_{l,k}^H \mathbf{W}\mathbf{x} + z_{l,k}, \quad l \in \mathcal{L},\ k \in \{n, f\}, \tag{2}$$

in which $\mathbf{h}_{l,k} \in \mathbb{C}^{M}$ denotes the channel vector of user $k$ in the $l$-th cluster, $\mathbf{W} = [\mathbf{w}_1, \ldots, \mathbf{w}_L]$, and $\mathbf{w}_l \in \mathbb{C}^M$ denotes the beamforming vector of the $l$-th cluster. Let $\mathbf{w}_l = \sqrt{p_l}\,\bar{\mathbf{w}}_l$, in which $p_l = \|\mathbf{w}_l\|^2$ is the transmit power of the $l$-th cluster and $\bar{\mathbf{w}}_l$ is the normalized beamforming vector of that cluster, i.e., $\|\bar{\mathbf{w}}_l\| = 1$. We assume that $z_{l,k} \sim \mathcal{CN}(0, \sigma^2)$, $l \in \mathcal{L}$, $k \in \{n, f\}$, and that channel state information is available at the users and the BS. According to the system model, all users have been partitioned into different clusters. At the far user of the $l$-th cluster, the received signal can be represented by

$$y_{l,f} = \mathbf{h}_{l,f}^H \mathbf{w}_l \sqrt{1-\lambda_l}\, x_{l,n} + \mathbf{h}_{l,f}^H \mathbf{w}_l \sqrt{\lambda_l}\, x_{l,f} + \mathbf{h}_{l,f}^H \sum_{j \in \mathcal{L}\backslash l} \mathbf{w}_j\Big(\sqrt{1-\lambda_j}\, x_{j,n} + \sqrt{\lambda_j}\, x_{j,f}\Big) + z_{l,f}. \tag{3}$$

In (3), the first term is the intra-cluster interference caused by the near user in the $l$-th cluster, and the third term is the inter-cluster interference coming from the other beams. Similarly, the received signal of the near user of the $l$-th cluster is represented by

$$y_{l,n} = \mathbf{h}_{l,n}^H \mathbf{w}_l \sqrt{1-\lambda_l}\, x_{l,n} + \mathbf{h}_{l,n}^H \mathbf{w}_l \sqrt{\lambda_l}\, x_{l,f} + \mathbf{h}_{l,n}^H \sum_{j \in \mathcal{L}\backslash l} \mathbf{w}_j\Big(\sqrt{1-\lambda_j}\, x_{j,n} + \sqrt{\lambda_j}\, x_{j,f}\Big) + z_{l,n}. \tag{4}$$

Assume the decoding order is $(f, n)$. At both the far-user and the near-user receivers, the near user's signal is treated as noise when decoding the far user's signal. Therefore, the SINRs of the far user's signal at the far-user and the near-user receivers in the $l$-th cluster are, respectively,

$$\mathrm{SINR}_{l,f} = \frac{\lambda_l p_l |\mathbf{h}_{l,f}^H \bar{\mathbf{w}}_l|^2}{(1-\lambda_l) p_l |\mathbf{h}_{l,f}^H \bar{\mathbf{w}}_l|^2 + \sum_{j \in \mathcal{L}\backslash l} p_j |\mathbf{h}_{l,f}^H \bar{\mathbf{w}}_j|^2 + \sigma^2} \tag{5}$$

and

$$\mathrm{SINR}^{f}_{l,n} = \frac{\lambda_l p_l |\mathbf{h}_{l,n}^H \bar{\mathbf{w}}_l|^2}{(1-\lambda_l) p_l |\mathbf{h}_{l,n}^H \bar{\mathbf{w}}_l|^2 + \sum_{j \in \mathcal{L}\backslash l} p_j |\mathbf{h}_{l,n}^H \bar{\mathbf{w}}_j|^2 + \sigma^2}. \tag{6}$$

In NOMA systems, the SINR of the far user should equal $\min\big(\mathrm{SINR}_{l,f}, \mathrm{SINR}^{f}_{l,n}\big)$, so that the far user's signal $x_{l,f}$ is decodable at both the far user's and the near user's receivers of the $l$-th cluster, and SIC can be carried out at the near user before decoding $x_{l,n}$. In this way, SIC cancels the far user's interference at the near user's receiver, and the near user can decode $x_{l,n}$ without interference from $x_{l,f}$. The SINR of the near user in the $l$-th cluster can then be written as

$$\mathrm{SINR}_{l,n} = \frac{(1-\lambda_l) p_l |\mathbf{h}_{l,n}^H \bar{\mathbf{w}}_l|^2}{\sum_{j \in \mathcal{L}\backslash l} p_j |\mathbf{h}_{l,n}^H \bar{\mathbf{w}}_j|^2 + \sigma^2}. \tag{7}$$

From (5), (6), and (7), one can see that the users' SINRs are mainly determined by two factors: the power allocation coefficients and the beamforming vectors. Different from [5]-[9], which optimize one factor in their formulated problems, we optimize both together. In this paper, we focus on minimizing the total transmit power subject to all users' quality of service (QoS) requirements and $\|\bar{\mathbf{w}}_l\| = 1$, $l \in \mathcal{L}$. As in [5]-[8], we assume that all clusters are allocated the same power, i.e., $p_l = p$, $l \in \mathcal{L}$.

III. JOINT BEAMFORMING DESIGN AND POWER ALLOCATION

Given a user partition, we formulate the beamforming and power optimization problem in (8). We collect all the beamforming vectors $\bar{\mathbf{w}}_1, \ldots, \bar{\mathbf{w}}_L$ and all the power allocation coefficients $\lambda_1, \ldots, \lambda_L$ into $\bar{\mathbf{w}}$ and $\boldsymbol{\lambda}$, respectively. The optimization variables are $\bar{\mathbf{w}}$, $\boldsymbol{\lambda}$, and $p$.

$$\begin{aligned}
\min_{\{\bar{\mathbf{w}}, \boldsymbol{\lambda}, p\}} \quad & p && \text{(8a)}\\
\text{s.t.} \quad & \mathrm{SINR}_{l,n} \ge \gamma_{l,n}, \ l \in \mathcal{L}, && \text{(8b)}\\
& \min\big(\mathrm{SINR}_{l,f}, \mathrm{SINR}^{f}_{l,n}\big) \ge \gamma_{l,f}, \ l \in \mathcal{L}, && \text{(8c)}\\
& \|\bar{\mathbf{w}}_l\| = 1, \ l \in \mathcal{L}. && \text{(8d)}
\end{aligned}$$

In constraints (8b) and (8c), $\gamma_{l,n}$ and $\gamma_{l,f}$ are the target SINR thresholds of the near user and the far user in the $l$-th cluster, respectively. Problem (8) can be equivalently formulated as follows:

$$\begin{aligned}
\min_{\{\bar{\mathbf{w}}, \boldsymbol{\lambda}, p\}} \quad & p && \text{(9a)}\\
\text{s.t.} \quad & \mathrm{SINR}_{l,n} \ge \gamma_{l,n}, \ l \in \mathcal{L}, && \text{(9b)}\\
& \mathrm{SINR}_{l,f} \ge \gamma_{l,f}, \ l \in \mathcal{L}, && \text{(9c)}\\
& \mathrm{SINR}^{f}_{l,n} \ge \gamma_{l,f}, \ l \in \mathcal{L}, && \text{(9d)}\\
& \|\bar{\mathbf{w}}_l\| = 1, \ l \in \mathcal{L}. && \text{(9e)}
\end{aligned}$$

It can be observed that formulation (9), like formulation (8), is nonconvex owing to the nonlinear and nonconvex constraints (9b)-(9d). This motivates us to pursue an approximate solution instead of the globally optimal one.
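The SINR expressions (5)-(7) and constraints (8b)-(8c) can be evaluated directly for a candidate design. The following Python sketch does this for given channels, unit-norm beamformers $\bar{\mathbf{w}}_l$, power fractions $\lambda_l$ (far user $\lambda_l$, near user $1-\lambda_l$), and per-cluster powers $p_l$ (pass equal entries to model $p_l = p$). It is illustrative only: the function names, the representation of vectors as Python lists of complex numbers, and 0-based cluster indices are assumptions of this sketch, not part of the paper.

```python
from typing import List, Sequence, Tuple

Vec = Sequence[complex]


def gain(h: Vec, w: Vec) -> float:
    """Return |h^H w|^2 for complex vectors stored as Python sequences."""
    inner = sum(hk.conjugate() * wk for hk, wk in zip(h, w))
    return abs(inner) ** 2


def cluster_sinrs(l: int, h_near: List[Vec], h_far: List[Vec],
                  w_bar: List[Vec], lam: List[float], p: List[float],
                  sigma2: float) -> Tuple[float, float, float]:
    """Return (SINR_{l,n}, SINR_{l,f}, SINR^f_{l,n}) as in (7), (5), (6).

    lam[l] is the far user's power fraction; the near user gets 1 - lam[l].
    """
    def inter_cluster(h: Vec) -> float:
        # Interference from every other beam j != l (the sum over L \ l).
        return sum(p[j] * gain(h, w_bar[j]) for j in range(len(w_bar)) if j != l)

    g_n, g_f = gain(h_near[l], w_bar[l]), gain(h_far[l], w_bar[l])
    i_n, i_f = inter_cluster(h_near[l]), inter_cluster(h_far[l])
    near_at_n = (1.0 - lam[l]) * p[l] * g_n  # near-user signal power at the near receiver
    near_at_f = (1.0 - lam[l]) * p[l] * g_f  # near-user signal leaking to the far receiver

    sinr_near = near_at_n / (i_n + sigma2)                              # (7), after SIC
    sinr_far_at_far = lam[l] * p[l] * g_f / (near_at_f + i_f + sigma2)  # (5)
    sinr_far_at_near = lam[l] * p[l] * g_n / (near_at_n + i_n + sigma2) # (6)
    return sinr_near, sinr_far_at_far, sinr_far_at_near


def meets_qos(h_near: List[Vec], h_far: List[Vec], w_bar: List[Vec],
              lam: List[float], p: List[float], sigma2: float,
              gamma_n: List[float], gamma_f: List[float]) -> bool:
    """Check (8b) and (8c) for every cluster."""
    for l in range(len(w_bar)):
        s_n, s_f, s_fn = cluster_sinrs(l, h_near, h_far, w_bar, lam, p, sigma2)
        if s_n < gamma_n[l] or min(s_f, s_fn) < gamma_f[l]:
            return False
    return True
```

For a single cluster with $\mathbf{h}_{1,n} = [1, 0]^T$, $\mathbf{h}_{1,f} = [0.5, 0]^T$, $\bar{\mathbf{w}}_1 = [1, 0]^T$, $\lambda_1 = 0.8$, $p = 1$, and $\sigma^2 = 0.1$, the sketch gives $\mathrm{SINR}_{1,n} = 2$, $\mathrm{SINR}_{1,f} \approx 1.33$, and $\mathrm{SINR}^f_{1,n} \approx 2.67$, so the far user's effective SINR is limited by its own receiver.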
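To address (9), we first reformulate it as an equivalent problem, since the reformulated problem can be converted into a convex problem by a Taylor series expansion and the SDR method. An iterative algorithm is then proposed in this section to obtain the power allocation and beamforming solutions. The nonlinear coupling of the optimization variables is the main difficulty in solving problem (9). To tackle it, we first define $\alpha_l = 1/(1-\lambda_l)$, $\beta_l = 1/\lambda_l$, and $r = 1/p$, where $\alpha_l$ and $\beta_l$ denote the $l$-th components of the vectors $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$, respectively. We then give the following proposition, which transforms the original problem into an equivalent one.

Proposition 1: The optimization problem in (9) can be equivalently expressed as (10).

$$\begin{aligned}
\max_{\{\bar{\mathbf{w}}, \boldsymbol{\alpha}, \boldsymbol{\beta}, \mathbf{u}, \mathbf{v}, \mathbf{t}, r\}} \quad & r && \text{(10a)}\\
\text{s.t.} \quad & |\mathbf{h}_{l,n}^H \bar{\mathbf{w}}_l|^2 \ge \gamma_{l,n} u_l \alpha_l, \ \ \sum_{j \in \mathcal{L}\backslash l} |\mathbf{h}_{l,n}^H \bar{\mathbf{w}}_j|^2 \le u_l - r\sigma^2, \ l \in \mathcal{L}, && \text{(10b)}\\
& |\mathbf{h}_{l,f}^H \bar{\mathbf{w}}_l|^2 \ge \frac{\gamma_{l,f}}{1+\gamma_{l,f}} v_l \beta_l, \ \ \sum_{j \in \mathcal{L}} |\mathbf{h}_{l,f}^H \bar{\mathbf{w}}_j|^2 \le v_l - r\sigma^2, \ l \in \mathcal{L}, && \text{(10c)}\\
& |\mathbf{h}_{l,n}^H \bar{\mathbf{w}}_l|^2 \ge \frac{\gamma_{l,f}}{1+\gamma_{l,f}} t_l \beta_l, \ \ \sum_{j \in \mathcal{L}} |\mathbf{h}_{l,n}^H \bar{\mathbf{w}}_j|^2 \le t_l - r\sigma^2, \ l \in \mathcal{L}, && \text{(10d)}\\
& \left\| \begin{bmatrix} 2 \\ \alpha_l - \beta_l \end{bmatrix} \right\| \le \alpha_l + \beta_l - 2, \ l \in \mathcal{L}, && \text{(10e)}\\
& \|\bar{\mathbf{w}}_l\| = 1, \ l \in \mathcal{L}, && \text{(10f)}
\end{aligned}$$

where $\bar{\mathbf{w}}_l \in \mathbb{C}^M$, $\mathbf{u}, \mathbf{v}, \mathbf{t}, \boldsymbol{\alpha}, \boldsymbol{\beta} \in \mathbb{R}^L_+$, and $r \in \mathbb{R}_+$.

Proof: Since $r = 1/p$ and $p$ is positive, minimizing $p$ is equivalent to maximizing $r$. Dividing the numerator and the denominator of $\mathrm{SINR}_{l,n}$ by $p$, (9b) is recast as

$$\frac{|\mathbf{h}_{l,n}^H \bar{\mathbf{w}}_l|^2}{\sum_{j \in \mathcal{L}\backslash l} |\mathbf{h}_{l,n}^H \bar{\mathbf{w}}_j|^2 + r\sigma^2} \ge \gamma_{l,n}\alpha_l. \tag{11}$$

Similarly, (9c) and (9d) are rewritten, respectively, as

$$\frac{|\mathbf{h}_{l,f}^H \bar{\mathbf{w}}_l|^2}{\sum_{j \in \mathcal{L}} |\mathbf{h}_{l,f}^H \bar{\mathbf{w}}_j|^2 + r\sigma^2} \ge \frac{\gamma_{l,f}}{1+\gamma_{l,f}}\beta_l \tag{12}$$

and

$$\frac{|\mathbf{h}_{l,n}^H \bar{\mathbf{w}}_l|^2}{\sum_{j \in \mathcal{L}} |\mathbf{h}_{l,n}^H \bar{\mathbf{w}}_j|^2 + r\sigma^2} \ge \frac{\gamma_{l,f}}{1+\gamma_{l,f}}\beta_l. \tag{13}$$

For instance, (12) follows from (9c) by cross-multiplying, adding $\gamma_{l,f}\lambda_l|\mathbf{h}_{l,f}^H\bar{\mathbf{w}}_l|^2$ to both sides, and using $(1-\lambda_l) + \lambda_l = 1$, so that the interference sum in the denominator now also includes $j = l$. To arrive at a tractable solution, we introduce additional slack variables $u_l$, $v_l$, and $t_l$, the $l$-th components of the vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{t}$, so that constraints (11), (12), and (13) are transformed into (10b), (10c), and (10d), respectively. Then, following a method similar to that of a theorem in [0], we show that the constraint $(1-\lambda_l) + \lambda_l = 1$, i.e., $1/\alpha_l + 1/\beta_l = 1$, can be equivalently replaced by (10e). Squaring both sides of (10e) shows that it implies $\alpha_l + \beta_l \le \alpha_l\beta_l$. Since the right-hand side of (10e) is nonnegative, $\alpha_l + \beta_l \ge 2$; hence, $\alpha_l\beta_l > 0$. Dividing $\alpha_l + \beta_l \le \alpha_l\beta_l$ by $\alpha_l\beta_l$, we obtain $1/\alpha_l + 1/\beta_l \le 1$. Let $\bar{\mathbf{w}}_l^\star, \alpha_l^\star, \beta_l^\star$, $l \in \mathcal{L}$, be the optimal solution of problem (10). If $1/\alpha_l^\star + 1/\beta_l^\star = 1$, then $\bar{\mathbf{w}}_l^\star$ and $\lambda_l^\star = 1/\beta_l^\star$, $l \in \mathcal{L}$, are the optimal solution of problem (9), since the same problem is solved with a change of variables. Otherwise, $1/\alpha_l^\star + 1/\beta_l^\star < 1$.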
Let $\tilde{\alpha}_l = \alpha_l^\star\big(1/\alpha_l^\star + 1/\beta_l^\star\big)$ and $\tilde{\beta}_l = \beta_l^\star\big(1/\alpha_l^\star + 1/\beta_l^\star\big)$, so that $1/\tilde{\alpha}_l + 1/\tilde{\beta}_l = 1$. Since $1/\alpha_l^\star + 1/\beta_l^\star < 1$, we have $\tilde{\alpha}_l < \alpha_l^\star$ and $\tilde{\beta}_l < \beta_l^\star$, so the SINR constraints (10b)-(10d) are not violated. Furthermore, the objective value is unchanged, since the objective depends only on $r$ (equivalently, on $p$). Thus, $\bar{\mathbf{w}}_l^\star$ and $\lambda_l^\star = 1/\tilde{\beta}_l$, $l \in \mathcal{L}$, obtained from the optimal solution of problem (10), are optimal for problem (9). Putting all the transformations above together, we obtain the equivalent problem (10).

The first set of constraints in (10b), (10c), and (10d) is nonconvex because of the bilinear terms on the right-hand side. To make the problem convex, we approximate them by a first-order Taylor series around $(u_l^c, \alpha_l^c)$ as used in [], since $u_l$ and $\alpha_l$ are all nonnegative, i.e.,

$$\begin{aligned}
u_l\alpha_l &= 0.25\big[(u_l + \alpha_l)^2 - (u_l - \alpha_l)^2\big]\\
&\le 0.25(u_l + \alpha_l)^2 - 0.25\big[(u_l^c - \alpha_l^c)^2 + 2(u_l^c - \alpha_l^c)(u_l - u_l^c - \alpha_l + \alpha_l^c)\big] \triangleq \varphi(u_l, \alpha_l),
\end{aligned} \tag{14}$$

in which the superscript $c$ denotes the $c$-th iteration of the iterative algorithm proposed below. Because $(u_l - \alpha_l)^2$ is convex, it lies above its linearization, so $\varphi(u_l, \alpha_l)$ is a convex upper bound on $u_l\alpha_l$ that is tight at $(u_l^c, \alpha_l^c)$. After this operation, the first set of constraints in (10b) becomes convex in the variables of interest. Similarly, the bilinear products on the right-hand side of the first set of constraints in (10c) and (10d) are approximated, respectively, as

$$\begin{aligned}
v_l\beta_l &= 0.25\big[(v_l + \beta_l)^2 - (v_l - \beta_l)^2\big]\\
&\le 0.25(v_l + \beta_l)^2 - 0.25\big[(v_l^c - \beta_l^c)^2 + 2(v_l^c - \beta_l^c)(v_l - v_l^c - \beta_l + \beta_l^c)\big] \triangleq \psi(v_l, \beta_l)
\end{aligned} \tag{15}$$

and

$$\begin{aligned}
t_l\beta_l &= 0.25\big[(t_l + \beta_l)^2 - (t_l - \beta_l)^2\big]\\
&\le 0.25(t_l + \beta_l)^2 - 0.25\big[(t_l^c - \beta_l^c)^2 + 2(t_l^c - \beta_l^c)(t_l - t_l^c - \beta_l + \beta_l^c)\big] \triangleq \phi(t_l, \beta_l).
\end{aligned} \tag{16}$$

After these transformations, (10) becomes a quadratically constrained optimization problem, which can be solved by the concave-convex procedure []. However, this method has the drawback that it needs a feasible point for initialization [8], which is difficult to obtain in general. SDR is a powerful, computationally efficient approximation technique for quadratically constrained optimization problems and is widely used in signal processing and communications [9]-[0]. Therefore, the SDR approach is used to solve (10). First, defining $\bar{\mathbf{W}}_l = \bar{\mathbf{w}}_l\bar{\mathbf{w}}_l^H$ and $\mathbf{H}_{l,k} = \mathbf{h}_{l,k}\mathbf{h}_{l,k}^H$, $k \in \{n, f\}$, and relaxing the rank constraints $\mathrm{rank}(\bar{\mathbf{W}}_l) = 1$, $l \in \mathcal{L}$, problem (10) is rewritten, after some basic operations, in the semidefinite programming (SDP) form

$$\begin{aligned}
\max_{\{\bar{\mathbf{W}}_l\}_{l=1}^{L}, \boldsymbol{\alpha}, \boldsymbol{\beta}, \mathbf{u}, \mathbf{v}, \mathbf{t}, r} \quad & r && \text{(17a)}\\
\text{s.t.} \quad & \mathrm{tr}(\mathbf{H}_{l,n}\bar{\mathbf{W}}_l) \ge \gamma_{l,n}\varphi(u_l, \alpha_l), \ \ \sum_{j \in \mathcal{L}\backslash l} \mathrm{tr}(\mathbf{H}_{l,n}\bar{\mathbf{W}}_j) \le u_l - r\sigma^2, \ l \in \mathcal{L}, && \text{(17b)}\\
& \mathrm{tr}(\mathbf{H}_{l,f}\bar{\mathbf{W}}_l) \ge \frac{\gamma_{l,f}}{1+\gamma_{l,f}}\psi(v_l, \beta_l), \ \ \sum_{j \in \mathcal{L}} \mathrm{tr}(\mathbf{H}_{l,f}\bar{\mathbf{W}}_j) \le v_l - r\sigma^2, \ l \in \mathcal{L}, && \text{(17c)}\\
& \mathrm{tr}(\mathbf{H}_{l,n}\bar{\mathbf{W}}_l) \ge \frac{\gamma_{l,f}}{1+\gamma_{l,f}}\phi(t_l, \beta_l), \ \ \sum_{j \in \mathcal{L}} \mathrm{tr}(\mathbf{H}_{l,n}\bar{\mathbf{W}}_j) \le t_l - r\sigma^2, \ l \in \mathcal{L}, && \text{(17d)}\\
& \mathrm{tr}(\bar{\mathbf{W}}_l) = 1, \ l \in \mathcal{L}, && \text{(17e)}\\
& \bar{\mathbf{W}}_l \succeq \mathbf{0}, \ l \in \mathcal{L}, && \text{(17f)}\\
& \text{(10e)}. && \text{(17g)}
\end{aligned}$$

Problem (17) is convex and can be solved effectively by a convex optimization solver such as SeDuMi [], which uses an interior-point algorithm to efficiently find an optimal solution. Based on the derivation and analysis above, the iterative optimization algorithm is summarized in Algorithm 1, referred to as the Joint Beamforming and Power Allocation Algorithm (JBPA).

Convergence Analysis: From Algorithm 1, it can be readily seen that the optimal solution obtained at the $c$-th iteration is also feasible for the problem at iteration $c+1$, owing to the approximations (14)-(16) []. This implies that Algorithm 1 returns a nondecreasing sequence of objective values, i.e., $r^{c+1} \ge r^c$. Moreover, according to the definition of the feasible set of $(\bar{\mathbf{w}}_l, l \in \mathcal{L}, \boldsymbol{\alpha}, \boldsymbol{\beta}, \mathbf{u}, \mathbf{v}, \mathbf{t}, r)$ in (10), these optimization variables lie in a convex and compact set. Hence, the algorithm converges to a finite value []. Following the proof of the theorem in [], one can prove that the proposed algorithm converges to a Karush-Kuhn-Tucker point of problem (9).

Because of the relaxation, the solutions of (17), denoted by $\bar{\mathbf{W}}_l^\star$, $l \in \mathcal{L}$, may not be rank one. This is because the (convex) feasible set of the relaxed problem, without the rank-one constraints, is a superset of the (nonconvex) feasible set of the problem that keeps them. In addition, the optimal value $r^\star$ of problem (17) upper-bounds $r$, i.e., $1/r^\star$ is merely a lower bound on the transmit power required by the rank-one transmit beamforming scheme [3].
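The two reformulation steps above can be sanity-checked numerically. The Python sketch below implements the surrogate of (14)-(16) in the form given there (the bilinear term $xy = 0.25[(x+y)^2-(x-y)^2]$ with the concave part linearized around $(x^c, y^c)$) and checks that it upper-bounds $xy$ and is tight at the expansion point, which is what keeps the solution of iteration $c$ feasible at iteration $c+1$. It also implements the second-order-cone test of (10e) and the rescaling step from the proof of Proposition 1. The function names are hypothetical and chosen for this sketch only.

```python
import math
from typing import Tuple


def bilinear_surrogate(x: float, y: float, xc: float, yc: float) -> float:
    """Convex upper bound on x*y in the form of (14)-(16), tight at (xc, yc).

    x*y = 0.25*((x + y)**2 - (x - y)**2); the concave term -(x - y)**2 is
    replaced by the negative of its first-order Taylor expansion around
    (xc, yc). The gap to x*y equals 0.25*((x - y) - (xc - yc))**2 >= 0.
    """
    dc = xc - yc
    linearized = dc ** 2 + 2.0 * dc * ((x - xc) - (y - yc))
    return 0.25 * (x + y) ** 2 - 0.25 * linearized


def soc_10e(alpha: float, beta: float, tol: float = 1e-12) -> bool:
    """Second-order-cone constraint (10e): ||(2, alpha - beta)|| <= alpha + beta - 2."""
    return math.hypot(2.0, alpha - beta) <= alpha + beta - 2.0 + tol


def rescale_split(alpha: float, beta: float) -> Tuple[float, float, float]:
    """Rescaling step from the proof of Proposition 1.

    Takes alpha = 1/(1 - lambda) and beta = 1/lambda candidates with
    1/alpha + 1/beta <= 1 and returns (alpha_t, beta_t, lam) such that
    1/alpha_t + 1/beta_t = 1 and lam = 1/beta_t is the far-user fraction.
    """
    s = 1.0 / alpha + 1.0 / beta
    if s > 1.0 + 1e-12:
        raise ValueError("(alpha, beta) violates 1/alpha + 1/beta <= 1")
    # Neither value grows (s <= 1), so the right-hand sides of (10b)-(10d) do not increase.
    alpha_t, beta_t = alpha * s, beta * s
    return alpha_t, beta_t, 1.0 / beta_t
```

For example, `rescale_split(3.0, 3.0)` returns values close to `(2.0, 2.0, 0.5)`, i.e., an equal power split $\lambda_l = 0.5$, and `bilinear_surrogate(2, 3, 2, 3)` reproduces the product $6$ at the expansion point.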
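If all $\bar{\mathbf{W}}_l^\star$, $l \in \mathcal{L}$, are rank-one matrices, then the principal component of each $\bar{\mathbf{W}}_l^\star$ is the optimal beamforming vector for the $l$-th cluster. Otherwise, we use the randomization technique applied in [3] and [4] to generate candidate solutions of the power allocation coefficients and beamforming vectors from $\bar{\mathbf{W}}_l^\star$, $l \in \mathcal{L}$, and choose the one that yields the minimum transmit power among all feasible candidates.

Algorithm 1 Joint Beamforming and Power Allocation Algorithm (JBPA)
1: Initialization: Set the iteration index $c = 0$, the maximum number of iterations $C_{\max}$, and the error tolerance $\varepsilon$, and generate initial points $(\mathbf{u}^0, \mathbf{v}^0, \mathbf{t}^0, \boldsymbol{\alpha}^0, \boldsymbol{\beta}^0, r^0)$.
2: repeat
3: $c = c + 1$
4: Solve (17) with $(\mathbf{u}^{c-1}, \mathbf{v}^{c-1}, \mathbf{t}^{c-1}, \boldsymbol{\alpha}^{c-1}, \boldsymbol{\beta}^{c-1}, r^{c-1})$ and obtain the solution $\{\bar{\mathbf{W}}_l^\star\}_{l=1}^{L}$ and $(\mathbf{u}^\star, \mathbf{v}^\star, \mathbf{t}^\star, \boldsymbol{\alpha}^\star, \boldsymbol{\beta}^\star, r^\star)$.
5: Update $\mathbf{u}^c = \mathbf{u}^\star$, $\mathbf{v}^c = \mathbf{v}^\star$, $\mathbf{t}^c = \mathbf{t}^\star$, $\boldsymbol{\alpha}^c = \boldsymbol{\alpha}^\star$, $\boldsymbol{\beta}^c = \boldsymbol{\beta}^\star$, and $r^c = r^\star$.
6: until $|r^c - r^{c-1}| \le \varepsilon$ or $c \ge C_{\max}$.

A. Randomization Algorithm

In this subsection, we develop a randomization algorithm that obtains an approximate solution to the original problem from the solution of its relaxed version when $\mathrm{rank}(\bar{\mathbf{W}}_l^\star) > 1$. The randomization algorithm is described as follows. Similar to [4], the eigendecomposition of each optimal matrix is first calculated as $\bar{\mathbf{W}}_l^\star = \mathbf{U}_l\boldsymbol{\Sigma}_l\mathbf{U}_l^H$, and the $i$-th candidate beamforming vector for the $l$-th cluster is generated as $\bar{\mathbf{w}}_l^i = \mathbf{U}_l\boldsymbol{\Sigma}_l^{1/2}\mathbf{v}_l^i$, where $\mathbf{U}_l$ and $\boldsymbol{\Sigma}_l$ denote the unitary matrix of eigenvectors and the diagonal matrix of eigenvalues of the $l$-th cluster's beamforming matrix, respectively, and the elements of $\mathbf{v}_l^i$ are independent random variables uniformly distributed on the unit circle in the complex plane. This ensures that

$$\|\bar{\mathbf{w}}_l^i\|^2 = (\mathbf{v}_l^i)^H\boldsymbol{\Sigma}_l^{1/2}\mathbf{U}_l^H\mathbf{U}_l\boldsymbol{\Sigma}_l^{1/2}\mathbf{v}_l^i = \mathrm{tr}\big(\boldsymbol{\Sigma}_l\mathbf{v}_l^i(\mathbf{v}_l^i)^H\big) = \mathrm{tr}(\boldsymbol{\Sigma}_l) = \mathrm{tr}(\bar{\mathbf{W}}_l^\star) = 1$$

for any realization of $\mathbf{v}_l^i$, since every diagonal element of $\mathbf{v}_l^i(\mathbf{v}_l^i)^H$ equals one. Let $a_{l,k}^{(j)} = |\mathbf{h}_{l,k}^H\bar{\mathbf{w}}_j^i|^2$, $j \in \mathcal{L}$, $k \in \{n, f\}$, denote the normalized signal power that receiver $k$ of the $l$-th cluster receives through the $j$-th candidate beam. Then the following problem arises when converting the candidate beamforming vectors into a candidate solution of problem (9):

$$\begin{aligned}
\max_{\{\boldsymbol{\lambda}, r^i\}} \quad & r^i && \text{(18a)}\\
\text{s.t.} \quad & \frac{(1-\lambda_l)\,a_{l,n}^{(l)}}{\sum_{j \in \mathcal{L}\backslash l} a_{l,n}^{(j)} + r^i\sigma^2} \ge \gamma_{l,n}, \ l \in \mathcal{L}, && \text{(18b)}\\
& \frac{\lambda_l\,a_{l,f}^{(l)}}{(1-\lambda_l)\,a_{l,f}^{(l)} + \sum_{j \in \mathcal{L}\backslash l} a_{l,f}^{(j)} + r^i\sigma^2} \ge \gamma_{l,f}, \ l \in \mathcal{L}, && \text{(18c)}\\
& \frac{\lambda_l\,a_{l,n}^{(l)}}{(1-\lambda_l)\,a_{l,n}^{(l)} + \sum_{j \in \mathcal{L}\backslash l} a_{l,n}^{(j)} + r^i\sigma^2} \ge \gamma_{l,f}, \ l \in \mathcal{L}. && \text{(18d)}
\end{aligned}$$

The randomization process is repeated to obtain new candidate solutions until the predetermined maximum number $I$ of randomizations is reached. Note that problem (18) is not always feasible, because the beamforming vectors are generated randomly. If a particular instance of problem (18) is infeasible, the corresponding set of candidate beamforming vectors is discarded; otherwise, the set of beamforming vectors, the power allocation factors $\boldsymbol{\lambda}$, and the objective value are recorded. Finally, the candidate solution with the maximum value in $\{r^1, \ldots, r^I\}$ is chosen. The randomization process above differs from [3] and [4], which have to multiply the candidates by corresponding scaling coefficients to satisfy all types of constraints. In our problem, all constraints can be satisfied by controlling the power allocation coefficients.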
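The randomization step and problem (18) can be sketched as follows. For a fixed $r^i$, each cluster's constraints only bound its own $\lambda_l$: (18b) from above and (18c)-(18d) from below, and all three bounds are affine in $r^i$. The LP can therefore be solved here in closed form: the largest feasible $r^i$ is the smallest value at which some cluster's lower and upper bounds meet. This closed-form route is an observation made for this sketch, not the paper's method (the paper solves (18) as a generic LP with an interior-point solver), and all function names are hypothetical. The sketch assumes $\sigma^2 > 0$ and positive SINR targets, represents vectors as Python lists of complex numbers, and takes the eigendecomposition of each $\bar{\mathbf{W}}_l^\star$ as given (a list of orthonormal eigenvector columns with their eigenvalues).

```python
import cmath
import math
import random
from typing import List, Optional, Sequence, Tuple

Vec = List[complex]


def gain(h: Sequence[complex], w: Sequence[complex]) -> float:
    """Return |h^H w|^2."""
    return abs(sum(hk.conjugate() * wk for hk, wk in zip(h, w))) ** 2


def random_candidate(eigvecs: List[Vec], eigvals: List[float],
                     rng: random.Random) -> Vec:
    """w = U Sigma^{1/2} v with v_k = exp(j*theta_k), theta_k ~ U[0, 2*pi).

    eigvecs are the orthonormal columns of U, so ||w||^2 = sum(eigvals).
    """
    w = [0j] * len(eigvecs[0])
    for u, ev in zip(eigvecs, eigvals):
        coeff = math.sqrt(max(ev, 0.0)) * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
        w = [wm + coeff * um for wm, um in zip(w, u)]
    return w


def solve_18(a_n: List[List[float]], a_f: List[List[float]],
             gamma_n: List[float], gamma_f: List[float],
             sigma2: float) -> Optional[Tuple[float, List[float]]]:
    """Closed-form solution of LP (18), with a_k[l][j] = |h_{l,k}^H w_j|^2.

    Returns (r, lam), or None when no r > 0 is feasible.
    """
    L = len(a_n)
    terms = []
    for l in range(L):
        if a_n[l][l] <= 0.0 or a_f[l][l] <= 0.0:
            return None
        i_n = sum(a_n[l][j] for j in range(L) if j != l)
        i_f = sum(a_f[l][j] for j in range(L) if j != l)
        terms.append((a_n[l][l], i_n, a_f[l][l], i_f))

    r = math.inf
    for l, (A_n, I_n, A_f, I_f) in enumerate(terms):
        # Upper bound (18b): lam <= 1 - g_n (I_n + r s2) / A_n.
        # Lower bounds (18c), (18d): lam >= g_f (A + I + r s2) / ((1 + g_f) A).
        for A, I in ((A_f, I_f), (A_n, I_n)):
            k = gamma_f[l] / ((1.0 + gamma_f[l]) * A)
            const = 1.0 - gamma_n[l] * I_n / A_n - k * (A + I)
            slope = sigma2 * (gamma_n[l] / A_n + k)
            r = min(r, const / slope)  # r at which the upper and lower bounds meet
    if not 0.0 < r < math.inf:
        return None
    lam = [max(gamma_f[l] * (A + I + r * sigma2) / ((1.0 + gamma_f[l]) * A)
               for A, I in ((A_f, I_f), (A_n, I_n)))
           for l, (A_n, I_n, A_f, I_f) in enumerate(terms)]
    return r, lam


def randomize(eig: List[Tuple[List[Vec], List[float]]], h_near: List[Vec],
              h_far: List[Vec], gamma_n: List[float], gamma_f: List[float],
              sigma2: float, num_draws: int, seed: int = 0):
    """Keep the best of num_draws candidates; returns (r, lam, beams) or None."""
    rng = random.Random(seed)
    L = len(eig)
    best = None
    for _ in range(num_draws):
        beams = [random_candidate(U, ev, rng) for U, ev in eig]
        a_n = [[gain(h_near[l], beams[j]) for j in range(L)] for l in range(L)]
        a_f = [[gain(h_far[l], beams[j]) for j in range(L)] for l in range(L)]
        sol = solve_18(a_n, a_f, gamma_n, gamma_f, sigma2)
        if sol is not None and (best is None or sol[0] > best[0]):
            best = (sol[0], sol[1], beams)
    return best
```

For a single cluster with a rank-one $\bar{\mathbf{W}}_1^\star = \mathrm{diag}(1, 0)$, $\mathbf{h}_{1,n} = [1, 0]^T$, $\mathbf{h}_{1,f} = [0.5, 0]^T$, $\gamma_{1,n} = \gamma_{1,f} = 1$, and $\sigma^2 = 0.1$, every draw gives $r^i = 5/3$ (i.e., $p = 0.6$) and $\lambda_1 = 5/6$, at which both (18b) and (18c) hold with equality. Any $\lambda_l$ between the lower and upper bounds is feasible at the returned $r^i$; the sketch simply takes the lower end.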

This artice has been accepted for pubication in a future issue of this journa, but has not been fuy edited. Content may change prior to fina pubication. Citation information: DOI 0.09/ACCESS.07.70008, IEEE Access 6 B. Compexity Anaysis In Agorithm, SDP is soved in each iteration. The tota number of iterations are fixed and ony variabes are updated in each run of the agorithm. Therefore, we focus on the compexity anaysis of soving SDP optimization probem 7). As aforementioned, the SDP probem is soved by convex sover which uses interior point methods to obtain optimum soution. Probem 7) has L matrix variabes of size M M and consists of a inear objective function, 6L + ) L inear constraints and L positive-semidefinite constraints. Hence, the worst-case compexity of soving the SDP probem 7) using interior point methods wi take LM ) O og /ξ) iterations, and each iteration invoves at most O L 3 M 6 + 6L + ) L M ) arithmetic operations [3], where the parameter ξ denotes the soution accuracy at the agorithm s termination. For the randomization technique, probem 8) is a inear program with L + nonnegative variabes and 3L inear inequaity constraints. Hence, it wi take O L + og /ξ) ) iterations to obtain a ξ-optima soution of probem 8), and ) each iteration requires at most O L + ) 3 + 3L L + ) arithmetic operations [3]. Hence, if rankw )>, the overa compexity C JBPA equas that of soving probem 7) once and probem 8) I times; ese, C JBPA equas the compexity of soving probem 7). For comparison, we aso consider other schemes, such as power aocation coefficient is fixed or decided by channe gains of users. For instance, the power aocation coefficient is assigned to a constant vaue or set by using FTPC method as used in [8]. Then, the beamforming vectors are obtained by utiizing the proposed agorithm without the optimization variabe of power aocation coefficients. The comparison resuts are given in Section V. IV. 
IV. ITERATIVE USER CLUSTERING AND JOINT BEAMFORMING AND POWER ALLOCATION

A. User Clustering Algorithm

The SINR of each user is affected by intra-cluster and inter-cluster interference. An efficient clustering algorithm can reduce this interference and thus improve system performance, as shown in [6]-[9], [6] and [7]. Hence, it is important to decide which users are grouped into which clusters. Since the user clustering schemes proposed in [6]-[9] and [6] cannot be directly applied to our problem, in this section we propose an improved user clustering algorithm (IUCA) to further reduce the total power consumption.

The optimal user clustering can be found by exhaustive search. However, its complexity is proportional to $L!$, which is unaffordable when the number of clusters is large. To this end, a suboptimal, low-complexity IUCA is proposed. In (5), (6) and (7), the numerators and denominators contain $|\mathbf{h}_{l,k}^H\mathbf{w}_l|^2$ and $|\mathbf{h}_{l,k}^H\mathbf{w}_j|^2$, $j \in \mathcal{L}\setminus\{l\}$, $k \in \{n, f\}$, respectively. Assume that the near users have already been allocated. Since the near user and the far user of a cluster are served by one beam, the channel gain correlation between the near user and the selected far user should be as large as possible. In this way, the SINR of the selected far user is improved, since $|\mathbf{h}_{l,f}^H\mathbf{w}_l|^2$ becomes larger and $|\mathbf{h}_{l,f}^H\mathbf{w}_j|^2$, $j \in \mathcal{L}\setminus\{l\}$, becomes smaller. Note that in this paper the channel gain correlation between two users refers to the correlation of their Rayleigh fading gains. Moreover, following [6] and [9], the maximum channel gain difference between the near user and the candidate far users is used as the criterion for choosing a far user for clustering.

Based on the above discussion, IUCA is performed in two steps. According to channel strength, we classify users into two sets: the near-user set and the far-user set. The first step obtains all the channel gain correlations and differences between near users and far users. The second step selects a far user from the far-user set for clustering after the near users have been assigned to different clusters. Given the channel gain correlation metric $\varsigma$, we obtain a far-user set $\mathcal{F}(m)$ in which each user's channel gain correlation with near user $m$ is not less than $\varsigma$. If the metric is too large and all the channel gain correlations are below it, $\mathcal{F}(m)$ is empty. To obtain a non-empty far-user set, the metric is lowered to $\varsigma - \theta$, where $\theta \in (0, 1)$ is a step size. If $\mathcal{F}(m)$ is still empty, the metric is lowered by a further step size, as long as it remains larger than zero, until a non-empty far-user set is obtained. Then, a far user is selected from $\mathcal{F}(m)$ by a predefined criterion, e.g., choosing the user with the maximum channel gain difference. After user $m$ and the selected far user are clustered, the channel gain correlations and differences associated with them are set to zero. Repeating the second step yields all clusters. Let $\mathcal{G}$ denote the output cluster set. The $m$-th element of $\mathcal{G}$ is $\mathcal{G}(m) = \{m, n\}$, which consists of user $m$ and user $n$. The details of the proposed IUCA are given in Algorithm 2.

Algorithm 2 Improved User Clustering Algorithm (IUCA)
1: Input: $M$, $L$, $\varsigma$, $\theta$, $d_1, d_2, \ldots, d_{2L}$, $\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_{2L}$ and $\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_{2L}$.
2: Output: $\mathcal{G}$.
3: Sort users in descending order of channel gain, i.e., $\|\mathbf{h}_1\| \ge \|\mathbf{h}_2\| \ge \cdots \ge \|\mathbf{h}_{2L}\|$. Define $\mathcal{A} = \{1, \ldots, L\}$ as the near-user index set and $\mathcal{B} = \{L+1, \ldots, 2L\}$ as the far-user index set. Let $C(m,n) = \frac{|\mathbf{g}_m^H\mathbf{g}_n|}{\|\mathbf{g}_m\|\,\|\mathbf{g}_n\|}$ and $D(m,n) = \|\mathbf{h}_m\| - \|\mathbf{h}_n\|$, $m \in \mathcal{A}$, $n \in \mathcal{B}$, denote the channel gain correlation and the channel gain difference between near user $m$ and far user $n$, respectively.
4: Step 1. Obtain all the channel gain correlations and differences between near users and far users.
5: for $m = 1 : L$ do
6:   for $n = L+1 : 2L$ do
7:     Calculate $C(m,n)$ and $D(m,n)$.
8:   end for
9: end for
10: Step 2. Select a far user for each near user for clustering.
11: for $m = 1 : L$ do
12:   if $\max\{C(m,:)\} \ge \varsigma$ then
13:     Obtain the candidate far-user set $\mathcal{F}(m)$ such that the channel gain correlations between user $m$ and each far user in $\mathcal{F}(m)$ are no less than $\varsigma$, i.e., $C(m,j) \ge \varsigma$, $\forall j \in \mathcal{F}(m)$.
14:   else if $\max\{C(m,:)\} \ge \varsigma - \theta$ then
15:     Obtain $\mathcal{F}(m)$ as in line 13 such that $C(m,j) \ge \varsigma - \theta$, $\forall j \in \mathcal{F}(m)$.
16:   else if $\max\{C(m,:)\} \ge \varsigma - 2\theta$ then
17:     Obtain $\mathcal{F}(m)$ as in line 13 such that $C(m,j) \ge \varsigma - 2\theta$, $\forall j \in \mathcal{F}(m)$.
18:   ...
19:   else if $\max\{C(m,:)\} > 0$ then
20:     Obtain $\mathcal{F}(m)$ as in line 13 such that $C(m,j) > 0$, $\forall j \in \mathcal{F}(m)$.
21:   end if
22:   Select user $q$ from $\mathcal{F}(m)$ by maximizing the channel gain difference between user $m$ and user $p \in \mathcal{F}(m)$, i.e., $q = \arg\max_{p \in \mathcal{F}(m)} D(m,p)$. Obtain $\mathcal{G}(m) = \{m, q\}$.
23:   Set $C(m,:) = 0$, $C(:,q) = 0$, $D(m,:) = 0$ and $D(:,q) = 0$.
24: end for

B. The Proposed Algorithmic Framework

Based on JBPA and IUCA, we next design an algorithmic framework that, at each iteration, updates the user clustering by IUCA and then the beamforming and power allocation by JBPA. When generating user clusters in line 22 of Algorithm 2, the far user is chosen by maximizing its channel gain difference with a near user. This process has been applied in [6] and [9]. Because they adopt this single criterion, those works generate clusters only once, i.e., only one user partition is considered. However, we observe that this solution may not be optimal and does not always lead to good performance.
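The two steps of Algorithm 2 can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than the authors' implementation. It assumes each user is described by a composite channel vector `h` (used for the gain ordering and for $D(m,n)$, taken here as a difference of Euclidean norms) and a small-scale fading vector `g` (used for the normalized correlation $C(m,n)$). The names `iuca`, `correlation` and the `rank` argument are our own: `rank=1` mirrors line 22 of Algorithm 2, while larger values pick the `rank`-th largest channel gain difference and fall back to the largest when $\mathcal{F}(m)$ has too few members.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[complex]


def norm(v: Vector) -> float:
    """Euclidean norm of a complex vector."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))


def correlation(g_m: Vector, g_n: Vector) -> float:
    """Normalized correlation |g_m^H g_n| / (||g_m|| ||g_n||)."""
    inner = sum(a.conjugate() * b for a, b in zip(g_m, g_n))
    return abs(inner) / (norm(g_m) * norm(g_n))


def iuca(h: Sequence[Vector], g: Sequence[Vector], varsigma: float,
         theta: float, rank: int = 1) -> List[Tuple[int, int]]:
    """Hypothetical IUCA sketch: returns (near user, far user) index pairs."""
    # Sort users by channel gain; the stronger half are the near users.
    order = sorted(range(len(h)), key=lambda u: norm(h[u]), reverse=True)
    half = len(h) // 2
    near, far = order[:half], order[half:]
    # Step 1: correlations C(m, n) and gain differences D(m, n).
    C = {(m, n): correlation(g[m], g[n]) for m in near for n in far}
    D = {(m, n): norm(h[m]) - norm(h[n]) for m in near for n in far}
    free = set(far)  # unclustered far users; removing q replaces zeroing C and D
    clusters = []
    # Step 2: one far user per near user, strongest near user first.
    for m in near:
        best = max(C[m, n] for n in free)
        threshold = varsigma
        while threshold > 0 and best < threshold:
            threshold -= theta  # relax the metric by one step size
        if threshold > 0:
            candidates = [n for n in free if C[m, n] >= threshold]
        else:
            # Fallback to all free users is our assumption for the all-zero case.
            candidates = [n for n in free if C[m, n] > 0] or list(free)
        ranked = sorted(candidates, key=lambda n: D[m, n], reverse=True)
        q = ranked[rank - 1] if rank <= len(ranked) else ranked[0]
        clusters.append((m, q))
        free.remove(q)
    return clusters


if __name__ == "__main__":
    # Four users whose channels all point in the same direction.
    h = [[4 + 0j, 0j], [3 + 0j, 0j], [2 + 0j, 0j], [1 + 0j, 0j]]
    print(iuca(h, h, 0.5, 0.1))          # [(0, 3), (1, 2)]
    print(iuca(h, h, 0.5, 0.1, rank=2))  # [(0, 2), (1, 3)]
```

In the usage lines all correlations equal one, so only the gain-difference criterion decides the pairing, which makes the effect of changing the criterion easy to see.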
To avoid relying on a single criterion, we provide diverse criteria for choosing a far user, which creates more opportunities to find user clusters that may yield larger power savings. Thus, in our design, after obtaining one user partition and performing JBPA, we re-implement IUCA in the next iteration but adopt a different criterion to generate clusters. The proposed algorithmic framework exploits the diversity in user selection and has the potential to further improve the power-saving performance. According to our observations, choosing the far user whose channel gain difference with the near user is the second largest may lead to less transmit power than choosing the one with the largest difference. Hence, we use a different criterion in each iteration of the framework. For example, assume there are $n$ candidate far users for clustering. In the first iteration, the criterion is the largest channel gain difference; in the second iteration, it can be the second largest channel gain difference, and so on. Because more cluster selections become possible, line 22 of Algorithm 2 varies with the criterion in each iteration. For a given user partition, Algorithm 1 is carried out to obtain the beamforming and power solution. After the last iteration, we choose the joint solution of clusters, beamforming and power with the minimum transmit power. The algorithmic framework is presented in Algorithm 3, in which the maximum iteration number $B_{\max}$ is set to $\max_{m \in \mathcal{A}} \{\mathrm{card}(\mathcal{F}(m))\}$, where $\mathrm{card}(\cdot)$ denotes the cardinality of a set.

Algorithm 3 Iterative User Clustering and Joint Beamforming and Power Optimization Algorithmic Framework
1: Initialization: Set iteration index $b = 0$.
2: repeat
3:   $b = b + 1$
4:   Step 1. Implement Algorithm 2 for user clustering, where line 22 of Algorithm 2 varies with the criterion as follows:
5:   if $b == 1$ then
6:     The criterion is the same as line 22 of Algorithm 2.
7:   else if $b == 2$ and $\mathrm{card}(\mathcal{F}(m)) \ge 2$ then
8:     The criterion is the second largest channel gain difference between the selected user $q$ and user $m$.
9:   else if $b == 3$ and $\mathrm{card}(\mathcal{F}(m)) \ge 3$ then
10:    The criterion is the third largest channel gain difference between the selected user $q$ and user $m$.
11:  ...
12:  else if $b == B_{\max}$ and $\mathrm{card}(\mathcal{F}(m)) \ge B_{\max}$ then
13:    The criterion is the $B_{\max}$-th largest channel gain difference between the selected user $q$ and user $m$.
14:  else
15:    The criterion is the same as line 22 of Algorithm 2.
16:  end if
17:  Step 2. Obtain a user partition, and then implement Algorithm 1 to attain the beamforming and power solution.
18: until $b \ge B_{\max}$
19: Choose the user clustering, beamforming and power solution corresponding to the minimum transmit power.

TABLE I: SIMULATION PARAMETERS
Parameter                  Value
Cell radius                00 m
Transmission bandwidth     4.3 MHz
Path loss                  COST-231-HATA
Shadowing                  Log-normal, 8 dB standard deviation
Fading                     Rayleigh flat fading [9]
Noise power                -173 dBm/Hz

V. NUMERICAL RESULTS

In this section, we evaluate the performance of the proposed solution of iterative user clustering and joint power and beamforming design for MISO-NOMA downlink systems. Table I summarizes the key simulation parameters. Unless otherwise stated, all users are dropped uniformly at random in the cell, as in [8]. In general, the SINR thresholds of the near user and the far user in all clusters are uniform if the same modulation scheme is adopted for both users. Hence, we set $\gamma = \gamma_{l,n} = \gamma_{l,f}$, $\forall l \in \mathcal{L}$, in the simulations.

Fig. 1 shows the impact of the number of transmit antennas on three schemes: the proposed JBPA and, for comparison, two existing schemes, namely fixed power allocation and FTPC. For the fixed power allocation scheme, the power allocation coefficient $\lambda_l$, $\forall l \in \mathcal{L}$, is set to 0.8, while for the FTPC scheme the power allocation coefficient is calculated according to the far user's channel gain in each cluster, as in [8]. The beamforming vectors of the fixed power allocation and FTPC schemes are obtained by the proposed JBPA, in which the power allocation coefficient is then no longer an optimization variable. The number of clusters is set to $L = 5$. In this simulation, we assume that the user distances are fixed, and the fast fading components of the channel vectors are averaged over 000 simulation runs with $I = 000$ randomizations. As shown in the figure, the JBPA scheme requires less transmit power than the other schemes for any number of antennas while satisfying all users' QoS requirements. The reason is that JBPA jointly optimizes power allocation and beamforming, both of which affect the users' SINRs, and thereby minimizes intra-cluster and inter-cluster interference. As expected, for all three schemes the consumed power decreases as the number of antennas increases, owing to spatial diversity gains. As the number of transmit antennas increases, the differences in total transmit power among the schemes shrink. This implies that beamforming plays an important role in power performance when there is a large number of transmit antennas.

Fig. 1: Power consumption with respect to the number of transmit antennas for three schemes, L = 5. [Plot: total transmit power (W) versus the number of transmit antennas M for fixed power allocation, FTPC and JBPA.]
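Returning to the algorithmic framework of Section IV-B, the outer loop of Algorithm 3 can be sketched as below. This is a hypothetical illustration: `cluster_fn(b)` stands in for IUCA run with the $b$-th criterion, and `jbpa_fn(partition)` stands in for Algorithm 1 returning the total transmit power of a partition. Both are placeholders supplied by the caller, and the toy partitions and power values in the usage lines are invented purely to exercise the loop; they are not results from the paper.

```python
import math
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Partition = List[Tuple[int, int]]


def b_max_from_candidates(candidate_sets: Dict[int, Iterable[int]]) -> int:
    """B_max = max over near users m of card(F(m))."""
    return max(len(set(F)) for F in candidate_sets.values())


def algorithmic_framework(cluster_fn: Callable[[int], Partition],
                          jbpa_fn: Callable[[Partition], float],
                          b_max: int) -> Tuple[Optional[Partition], float]:
    """Sketch of Algorithm 3: keep the partition with minimum transmit power."""
    best_partition: Optional[Partition] = None
    best_power = math.inf
    b = 0
    while True:                       # repeat ... until b >= B_max
        b += 1
        partition = cluster_fn(b)     # Step 1: IUCA with the b-th criterion
        power = jbpa_fn(partition)    # Step 2: beamforming and power for it
        if power < best_power:
            best_partition, best_power = partition, power
        if b >= b_max:
            break
    return best_partition, best_power


# Toy usage with invented numbers (not results from the paper).
toy_partitions = {1: [(0, 3), (1, 2)], 2: [(0, 2), (1, 3)]}
toy_power = {((0, 3), (1, 2)): 12.0, ((0, 2), (1, 3)): 9.5}
b_max = b_max_from_candidates({0: [2, 3], 1: [2]})
partition, power = algorithmic_framework(
    lambda b: toy_partitions[b], lambda p: toy_power[tuple(p)], b_max)
print(b_max, partition, power)  # 2 [(0, 2), (1, 3)] 9.5
```

Passing the two steps in as callables keeps the loop independent of how IUCA and JBPA are actually implemented, which matches the framework's role of simply orchestrating them and retaining the lowest-power solution.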
To further show the performance gain of JBPA, we also compare JBPA with the ZFBF scheme proposed in [6] in Fig. 2, in terms of the power consumption required to achieve different throughputs. The numbers of antennas and clusters are set to $M = 5$ and $L = 5$, respectively. We consider two channel gain correlation scenarios: either the channel gain correlation between the near user and the far user in each cluster is high (the channel gain correlation metric is no less than 0.95), or it is random. Note that exhaustive search is used in the ZFBF approach to find the pairs that achieve the maximum sum rate for a given total transmit power. On the one hand, the power allocation coefficient is set to $\lambda = 0.5$ for all clusters so that the rate of the near users is larger than that of the far users under ZFBF. On the other hand, it is set to $\lambda = 0.08$ so that the near user and the far user in a cluster achieve similar data rates. After finding the best pairs and fixing the power allocation coefficient for ZFBF, we obtain the SINRs of all users. For a fair comparison, these SINRs are imposed as the SINR requirements of the corresponding users in the JBPA scheme.

For the fixed power allocation scheme in Fig. 1, when $\lambda = 0.8$, more transmit power is allocated to the far users according to (5). Consequently, the SINRs of the far and near users become comparable, so that fixed power allocation and JBPA can be compared fairly. This value has been used in [5].

Fig. 2: Performance comparison of JBPA and ZFBF, L = 5, M = 5. (a) Total transmit power versus sum rate; (b) total transmit power versus sum rate of all near users. [Plots: total transmit power (dB) versus sum rate (bps/Hz) for JBPA and ZFBF with highly correlated (HighCorr) or random (Rand) channel gains, λ = 0.5 and λ = 0.08.]

Fig. 3: Convergence trajectory of Algorithm 1, M = 5. [Plot: objective of (7) versus iteration number for 2, 3, 4 and 5 clusters.]

In Fig. 2(a), when the channel gains within each cluster are highly correlated, the sum rate of JBPA exceeds that of ZFBF when the total transmit power is less than 3 dB and 6 dB for $\lambda = 0.5$ and $\lambda = 0.08$, respectively. Moreover, if the channel gain correlations are random, the sum rate of JBPA significantly exceeds that of ZFBF at any transmit power for both $\lambda = 0.5$ and $\lambda = 0.08$. In particular, the sum rate of JBPA is three times that of ZFBF at 6 dB for $\lambda = 0.08$. In Fig. 2(a), we also note that the sum rate of ZFBF is superior to that of JBPA if the total transmit power is larger than 3 dB and 6 dB for $\lambda = 0.5$ and $\lambda = 0.08$, respectively. The reason is that ZFBF completely cancels inter-cluster interference, so once the total transmit power exceeds a threshold, the sum rate of all near users under ZFBF exceeds that under JBPA. This phenomenon is verified by Fig. 2(b).

The complexity of ZFBF stems from inverting an $M \times M$ matrix to obtain the beamforming vectors for given user clusters [30]. In general, the complexity of an $M \times M$ matrix inversion is $O(M^3)$. When exhaustive search is used to find the best pairs for maximizing the sum rate, all $L!$ possible user clusterings must be examined. Therefore, the overall complexity of ZFBF is $O(M^3 L!)$, which grows exponentially (indeed factorially) in $L$. Assuming that the rank of the beamforming matrix equals one after solving (7), the complexity of JBPA is $O\big(\sqrt{LM}\,\log(1/\varepsilon)\big) \cdot O\big(L^3M^6 + \cdots\big)$, where the first factor is the total number of iterations and the second is the complexity of each iteration. The complexity of JBPA can be higher than that of ZFBF for small $M$ and $L$. For larger $L$, however, the complexity of ZFBF increases exponentially, whereas JBPA has polynomial-time complexity. Moreover, JBPA obtains the beamforming vectors and the power allocation simultaneously.

To evaluate the convergence of the proposed JBPA, we consider a downlink system with $M = 5$ antennas and different numbers of clusters. The error tolerance is $\varepsilon = 10^{-3}$. As shown in Fig. 3, Algorithm 1 generates a non-decreasing objective of problem (7) and converges within 6 iterations for two, three, four and five clusters. The objective of (7) for two clusters is larger than that for more clusters, since much less energy is consumed with two clusters.

Fig. 4: User clustering algorithms with a fixed number of antennas for different numbers of clusters, Scenario 1. [Plot: total transmit power (W) versus the number of clusters L; curves: IUCA for several values of ς and θ, the clustering algorithm in [3], random user clustering, Algorithm 3 with ς = 0.5, and exhaustive search.]

We observe that the performance gain of the proposed user clustering algorithm is influenced by user density. In the simulations, we consider two scenarios: in Scenario 1, users are randomly located with a uniform distribution; in Scenario 2, users are densely deployed. The second scenario can be regarded as a typical hotspot scenario with an ultra-dense user distribution. We first provide the simulation results of different user clustering algorithms for Scenario 1 in Fig. 4, which shows the effectiveness of the proposed IUCA and Algorithm 3 for different numbers of clusters. The number of transmit antennas is fixed to $M = 0$. The best pairs are found by exhaustive search. Because of its high computational complexity as $L$ increases, the optimal user clustering is evaluated only up to $L = 5$, and the fast fading components of the channel vectors are averaged over 00 simulation runs with $I = 000$ randomizations in this simulation.
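The exhaustive-search benchmark enumerates every way of matching the $L$ near users to the $L$ far users, which is where the $L!$ factor comes from. A minimal sketch, assuming a caller-supplied cost function: in the paper's setting this would be the total transmit power returned by JBPA for a partition, whereas the additive per-cluster costs below are hypothetical and ignore inter-cluster coupling.

```python
import itertools
import math
from typing import Callable, List, Optional, Sequence, Tuple

Partition = List[Tuple[int, int]]


def exhaustive_pairing(near: Sequence[int], far: Sequence[int],
                       cost: Callable[[Partition], float]
                       ) -> Tuple[Optional[Partition], float, int]:
    """Try all len(far)! matchings; return the best one, its cost and the count."""
    best: Optional[Partition] = None
    best_cost = math.inf
    evaluated = 0
    for perm in itertools.permutations(far):
        partition = list(zip(near, perm))  # near[i] is paired with perm[i]
        evaluated += 1
        c = cost(partition)
        if c < best_cost:
            best, best_cost = partition, c
    return best, best_cost, evaluated


# Hypothetical per-cluster costs, summed over clusters for illustration only.
pair_cost = {(0, 3): 5, (0, 4): 1, (0, 5): 4,
             (1, 3): 2, (1, 4): 6, (1, 5): 3,
             (2, 3): 4, (2, 4): 2, (2, 5): 1}
best, best_cost, evaluated = exhaustive_pairing(
    [0, 1, 2], [3, 4, 5], lambda p: sum(pair_cost[c] for c in p))
print(best, best_cost, evaluated)  # [(0, 4), (1, 3), (2, 5)] 4 6
```

The number of evaluated matchings grows as $L!$: 120 for $L = 5$ but 3,628,800 for $L = 10$, and each evaluation would be a full JBPA run, which illustrates why the exhaustive benchmark is restricted to small $L$.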
To investigate the impact of the channel gain correlation metric $\varsigma$ and the step size $\theta$, $\varsigma$ and $\theta$ are set to different values for IUCA. As shown in the figure, the total transmit power increases with the number of clusters, since more users are supported. We can also see that the required transmit power of the proposed IUCA approaches that of exhaustive search. The transmit power for $\theta = 0.05$ is almost the same as that for $\theta = 0.$ with the same metric $\varsigma = 0.7$. Moreover, for different numbers of clusters, the required total transmit power is nearly the same for $\varsigma$ between 0.3 and 0.7. Hence, the parameters $\varsigma = 0.5$ and $\theta = 0.$ generally meet the requirements of IUCA. In the figure, we also plot the curves of random user clustering and of the user clustering algorithm proposed in [3], in which the users with the strongest and the weakest channels are grouped into one cluster, the users with the second strongest and the second weakest channels are grouped into another cluster, and so on. The required transmit power of random user clustering is larger than that of the other schemes, because its randomness results in higher intra-cluster and inter-cluster interference. Since the user clustering scheme in [3] does not consider inter-user correlation, it consumes more energy than the proposed IUCA even with $\varsigma = 0.1$. We also see that Algorithm 3 achieves the best power-saving performance, because it adopts various user clustering criteria from iteration to iteration. Compared with the user clustering algorithm in [3], a performance gain of about 0% is obtained by IUCA and Algorithm 3. The simulation results of different user clustering algorithms for Scenario 2 are given in Fig. 5. Compared with the clustering algorithm in [3], power consumption is reduced by about 3% and 0% by the proposed IUCA with $\varsigma = 0.5$, $\theta = 0.$ and by Algorithm 3, respectively, for $L = 0$.
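The strongest/weakest pairing rule attributed above to [3] is simple to state in code. A minimal sketch, assuming users are identified by their index and described only by a scalar channel gain (the function name is ours, not from [3]). It never looks at channel gain correlations, which is the property the comparisons in Figs. 4 and 5 hinge on.

```python
from typing import List, Sequence, Tuple


def strongest_weakest_pairing(gains: Sequence[float]) -> List[Tuple[int, int]]:
    """Pair the k-th strongest user with the k-th weakest user."""
    order = sorted(range(len(gains)), key=lambda u: gains[u], reverse=True)
    half = len(order) // 2
    return [(order[k], order[-1 - k]) for k in range(half)]


pairs = strongest_weakest_pairing([0.9, 0.1, 0.5, 0.3, 0.7, 0.2])
print(pairs)  # [(0, 1), (4, 5), (2, 3)]
```

After the sort, a single pass forms all clusters, so the rule is cheap but fixed: the same pairing results regardless of how the users' channel directions are correlated.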
The power savings of IUCA and Algorithm 3 over the algorithm in [3] arise because the user clustering algorithm in [3] does not consider channel gain correlations, whereas IUCA and Algorithm 3 do. The proposed IUCA with $\varsigma = 0.1$, $\theta = 0.$ requires more power because it exploits channel gain correlations to a lesser extent. Algorithm 3 achieves the best performance since it considers not only the channel gain correlations among users but also their channel gain differences.

Next, we use an example, shown in Fig. 6, to reveal the underlying reason. We refer to user clustering with the largest and with the third largest channel gain differences as Criterion 1 and Criterion 2, respectively. Criterion 1 has been utilized in [6] and [9]. As shown in the figure, the power-saving performance of both Criterion 1 and Criterion 2 is superior to that of the clustering algorithm in [3]. Moreover, Criterion 2 outperforms Criterion 1 in power performance. This explains why our developed solution achieves better performance and also verifies the necessity of considering diverse user clustering criteria. According to the simulation results of Figs. 4-6, we conclude that the user clustering algorithm in [3] may achieve good performance when the channel gain differences among users are large, as in Scenario 1. Note that only one clustering strategy is considered in [3]: no matter how the scenario varies, all user clusters in that work are formed by this single criterion without optimization. As a consequence, this clustering strategy does not always lead to good performance. When the channel gain differences are small, as in Scenario 2, the performance gains of IUCA and Algorithm 3 over the user clustering algorithm in [3] become substantial, because diverse criteria are adopted in IUCA and Algorithm 3.

Assume there are $L$ clusters in the MISO-NOMA system.
Since the user clustering algorithm proposed in [3] groups the user with the strongest channel and the user with the weakest channel into one cluster, it requires one loop to form all clusters, after which JBPA is implemented. Its total complexity is $O(L) + C_{\mathrm{JBPA}}$, where $C_{\mathrm{JBPA}}$ is the computational complexity of JBPA given in Section III-B. According to the procedure of IUCA, its total complexity is $O(L + L^2) + C_{\mathrm{JBPA}}$. Hence, IUCA has slightly higher complexity than the clustering algorithm in [3]. In Algorithm 3, $B_{\max}$ iterations are required to find the minimum transmit power, and the complexity of each iteration equals that of IUCA. Thus, the total complexity of Algorithm 3 is $B_{\max}\big(O(L + L^2) + C_{\mathrm{JBPA}}\big)$. Algorithm 3 therefore has the highest computational complexity of the three; however, it achieves the best power performance, as shown in Figs. 4-6. In Algorithm 3, we proposed some simple criteria for user clustering. More sophisticated criteria may be incorporated into the proposed algorithmic framework to further improve system performance, which we leave for future work.

Fig. 5: Comparison among the clustering algorithm in [3], IUCA and Algorithm 3 for different numbers of clusters, Scenario 2. [Plot: total transmit power (W) versus the number of clusters L; curves: clustering algorithm in [3], IUCA with ς = 0.1 and ς = 0.5, and Algorithm 3 with ς = 0.5.]

Fig. 6: An example: comparison between different user clustering criteria and the clustering algorithm in [3], Scenario 2. [Plot: total transmit power (W) versus the number of clusters L; curves: clustering algorithm in [3] and the two criteria for several values of ς.]

VI. CONCLUSION

In this paper, we have considered iterative user clustering together with joint beamforming design and power allocation for MISO-NOMA downlink systems. All users are grouped into multiple clusters; each cluster consists of two users and is served by one beamforming vector. For joint power allocation and beamforming optimization, we formulated the problem subject to the users' SINR requirements. Owing to its non-convexity, the optimization problem was converted into an approximate convex problem by using a first-order Taylor series. Then, an iterative algorithm, JBPA, based on the SDR technique was proposed to solve it. For user clustering, a suboptimal, low-complexity IUCA based on channel gain correlation and difference was also proposed to further reduce energy consumption. Combining JBPA and IUCA, an algorithmic framework was developed to iteratively and jointly reduce system power consumption. Numerical results showed that the proposed iterative algorithm requires less transmit power than optimizing power allocation and beamforming separately, and that the proposed scheme outperforms ZFBF. For the considered scenarios, the proposed iterative JBPA algorithm converges within a few iterations. Simulation results also showed that the performance of the proposed IUCA approaches that of exhaustive search and outperforms random user clustering and existing user clustering approaches, which demonstrates the efficiency of the proposed IUCA. Moreover, compared with single-criterion user clustering, the proposed algorithmic framework achieves superior performance.