Joint Transmit and Receive Multi-user MIMO Decomposition Approach for the Downlink of Multi-user MIMO Systems

Ruly Lai-U Choi, Michel T. Ivrlač, Ross D. Murch, and Josef A. Nossek

Department of Electrical and Electronic Engineering, Hong Kong University of Science & Technology, Clear Water Bay, Kowloon, Hong Kong. Emails: {eeluchoi, eermurch}@ee.ust.hk

Institute for Circuit Theory and Signal Processing, Munich University of Technology, 80290 Munich, Germany. Emails: {ivrlac, nossek}@nws.ei.tum.de

Abstract
In this paper, we propose a joint transmit and receive decomposition approach for the downlink of multi-user MIMO systems. The approach assumes a structure for part of the receiver when computing the transmit processing. With this assumption, it decomposes a multi-user MIMO downlink channel into multiple parallel, independent single-user MIMO downlink channels. A new degree of freedom is obtained by making the number of data streams variable. Sample simulation results are provided, and they show the potential of the proposed scheme.

I. INTRODUCTION
Recently, the independence among user channels in a common medium has been recognized and utilized to increase system capacity. The downlink multi-user MIMO decomposition was proposed independently in [1]-[4] to increase the capacity of multi-user MIMO systems. This decomposition of a multi-user MIMO system into several independent single-user MIMO systems is of significant practical importance, since any technique developed for single-user MIMO systems can be employed without modification in a multi-user environment. However, the decomposition scheme has several drawbacks: 1) the number of users is limited, 2) the number of receive antennas is limited, 3) the decomposition is inefficient with respect to transmit power in some situations, and 4) the solution does not exist in some cases.
Motivated by these restrictions of the multi-user MIMO decomposition, we propose in this paper a joint transmit and receive multi-user MIMO decomposition scheme, which extends the multi-user MIMO decomposition scheme proposed in [1]-[4]. The scheme offers considerable improvement in all the aspects listed above. Its key properties are: 1) the number of users can be as large as the number of transmit antennas, 2) the number of receive antennas is unlimited, 3) the decomposition efficiency is considerably improved, because some interference is removed by the receiver, which lifts part of the burden from the transmitter, and 4) the joint decomposition solution usually exists even in cases where the previous multi-user MIMO decomposition solution does not. These improvements are made possible by assuming some signal processing at the receiver when designing the transmit processing. The assumed receiver processing is used only to find the solution for the transmit signal processing. It should be emphasized that the proposed scheme does not require the receivers to cooperate with the transmitter: the transmitter can carry out all computations itself. Finally, making the number of data streams transmitted to each user variable provides new degrees of freedom, which can be used to further optimize system performance. Simulation results show that the proposed approach significantly outperforms the previous decomposition approach and is fairly close to the optimal approach.

II. SYSTEM MODEL AND BACKGROUND
We consider a multi-user MIMO system with \(M\) transmit antennas and \(K\) users, as shown in Figure 1, where \(N_k\) antennas are located at the \(k\)-th user or mobile station (MS).
At the base station (BS), the data are processed before transmission; we refer to this as transmit processing, and it is our design target. Let \(\mathbf{b}_k\) denote the \(L_k \times 1\) transmit data symbol vector for user \(k\), where \(L_k\) is the number of parallel data streams transmitted simultaneously to user \(k\) (\(k = 1, \ldots, K\)). This data symbol vector is passed through the transmit processing, which is characterized by the transmit matrix \(\mathbf{T}_k\), an \(M \times L_k\) matrix that takes in \(L_k\) symbols and outputs \(M\) terms. Each of the \(M\) output terms is transmitted by one of the \(M\) transmit antennas.

Figure 1. System configuration of a multi-user MIMO system: the base station (BS) transmits \(\mathbf{b}_1, \ldots, \mathbf{b}_K\); user \(k\) (MS \(k\)) receives with \(N_k\) antennas, and its receiver (Rx) outputs \(L_k\) data streams.

We assume that the channel is flat fading and denote the MIMO channel to user \(k\) by \(\mathbf{H}_k\), an \(N_k \times M\) matrix whose \((i, j)\)-th element is the complex gain from the \(j\)-th transmit antenna at the BS to the \(i\)-th receive antenna at MS \(k\). At user \(k\), \(N_k\) receive antennas are used to receive the \(L_k\) data symbols, and the received signal vector of length \(N_k\) is

\[
\mathbf{y}_k = \mathbf{H}_k \sum_{i=1}^{K} \mathbf{T}_i \mathbf{b}_i + \mathbf{n}_k, \tag{1}
\]

where the noise \(\mathbf{n}_k\) is an \(N_k \times 1\) vector whose elements are i.i.d. zero-mean complex Gaussian random variables with variance \(\sigma_n^2\). Without loss of generality, we assume that the elements of \(\mathbf{b}_k\) (\(k = 1, \ldots, K\)) are i.i.d. zero-mean random variables with unit variance. It is also assumed that the noise and data vectors are independent.

0-7803-7954-3/03/$17.00 © 2003 IEEE.

Based on this configuration, our goal is to design proper transmit processing for the downlink of a multi-user MIMO system to increase system capacity. One approach is to use the decomposition scheme independently proposed in [1]-[4]. However, it has a number of drawbacks:

1. Number of users: Assuming all user channel matrices \(\mathbf{H}_k\) (\(k = 1, 2, \ldots, K\)) have full rank, the number of users must satisfy the inequalities
\[
\sum_{i \neq k} N_i + 1 \le M, \quad k = 1, 2, \ldots, K.
\]
In the special case \(N_1 = N_2 = \cdots = N_K = N\), the number of users is upper bounded by
\[
K \le 1 + \frac{M - 1}{N}. \tag{2}
\]
For \(N > 1\), the number of users cannot exceed \((M + 1)/2\), which is a serious limitation that becomes even worse as the number of receive antennas increases.

2. Number of receive antennas: The number of receive antennas is also limited. From (2), it follows that \(N \le M - 1\) if we want to be able to serve at least two users.

3. Transmission efficiency: Depending on the channel, realizing the MU-MIMO decomposition by transmit signal processing may become inefficient with respect to transmit power. If the null-space \(\operatorname{null}(\mathbf{H}_k)\) has a large overlap with the joint null-space \(\bigcap_{i \neq k} \operatorname{null}(\mathbf{H}_i)\), excessive transmit power is needed to achieve the decomposition.

4. Existence of solution: In the case \(\bigcap_{i \neq k} \operatorname{null}(\mathbf{H}_i) \subseteq \operatorname{null}(\mathbf{H}_k)\), the decomposition does not exist at all. This happens, for instance, in a spatially semi-correlated channel [5] where the directions of the departing waves are identical for all users.

In this paper, we devise a method that overcomes these limitations. Our idea is inspired by the observation that the decomposition scheme makes no assumption about the receivers: the transmit processing is obtained by considering only the multi-user channel. The last two drawbacks above are consequences of the properties of the multi-user channel.
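The counting condition of item 1 and the null-space argument of items 3 and 4 can be checked numerically. The Python sketch below assumes NumPy; the function names, the rank tolerance and the "semi-correlated" construction \(\mathbf{H}_k = \mathbf{G}_k \mathbf{A}^H\) with a departure matrix \(\mathbf{A}\) shared by all users are illustrative assumptions, not the channel model of [5]. It tests the inequality, builds an orthonormal basis of the joint null-space of the other users' channels by SVD, as in the standard decomposition of [1]-[4], and measures how much of that space user \(k\) can actually see.

```python
import numpy as np

def standard_decomposition_feasible(M, N):
    """Counting condition of item 1: sum_{i != k} N_i + 1 <= M for every user k."""
    total = sum(N)
    return all(total - Nk + 1 <= M for Nk in N)

def joint_null_basis(H_list, k, tol=1e-10):
    """Orthonormal basis of the joint null-space of all channels except H_k (needs K >= 2)."""
    others = np.vstack([H for i, H in enumerate(H_list) if i != k])
    _, s, Vh = np.linalg.svd(others)            # full SVD: Vh is M x M
    rank = int(np.sum(s > tol * s.max()))
    return Vh[rank:].conj().T                   # M x (M - rank), orthonormal columns

def dedicated_gain(H_list, k):
    """||H_k V_k||_F; zero means user k cannot be reached inside the joint null-space."""
    return np.linalg.norm(H_list[k] @ joint_null_basis(H_list, k))

rng = np.random.default_rng(0)

def cgauss(*shape):
    """i.i.d. zero-mean, unit-variance complex Gaussian entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Uncorrelated 5 x (3, 3): feasible, 2-dimensional joint null-space per user.
H_iid = [cgauss(3, 5) for _ in range(2)]

# Hypothetical semi-correlated 6 x (2, 2): both users see the same 2 departure directions A.
A = cgauss(6, 2)
H_semi = [cgauss(2, 2) @ A.conj().T for _ in range(2)]

print(standard_decomposition_feasible(5, [3, 3]))     # True
print(standard_decomposition_feasible(4, [2, 2, 2]))  # False
print(joint_null_basis(H_iid, 0).shape)               # (5, 2)
print(dedicated_gain(H_semi, 0) < 1e-8)               # True
```

With i.i.d. channels the dedicated gain is strictly positive, whereas in the hypothetical semi-correlated case the joint null-space of the other user lies inside \(\operatorname{null}(\mathbf{H}_k)\), so the gain is zero up to rounding: this is the situation of item 4.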
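Substantial improvement can therefore be expected if a way can be found to change the properties of the multi-user channel so that it becomes more amenable to decomposition. In the following, we assume that each user employs a pre-receiver (the front part of the receiver) that filters the signal according to the channel and the transmit processing. Denoting the pre-receiver weight for user \(k\) by \(\mathbf{R}_k\), an \(N_k \times L_k\) matrix, the pre-receiver output is a vector of length \(L_k\) given by \(\hat{\mathbf{y}}_k = \mathbf{R}_k^H \mathbf{y}_k\).

III. JOINT TRANSMIT AND RECEIVE MULTI-USER MIMO DECOMPOSITION

A. Transmitter and Receiver Structure
We solve the problem in two steps. In the first step, we assume that the pre-receiver filter matrices \(\mathbf{R}_k\) are known. Treating the pre-receivers as part of the channel, we can write the transmit weight according to the decomposition solution [1] as \(\mathbf{T}_k = \mathbf{V}_k \mathbf{A}_k\), where \(\mathbf{V}_k\) is chosen such that \(\mathbf{R}_i^H \mathbf{H}_i \mathbf{V}_k = \mathbf{0}\) for \(i \neq k\) and \(\mathbf{V}_k\) has orthonormal columns. We can compute \(\mathbf{V}_k\) from the singular value decomposition

\[
\tilde{\mathbf{H}}_k =
\begin{bmatrix}
\mathbf{R}_1^H \mathbf{H}_1 \\ \vdots \\ \mathbf{R}_{k-1}^H \mathbf{H}_{k-1} \\ \mathbf{R}_{k+1}^H \mathbf{H}_{k+1} \\ \vdots \\ \mathbf{R}_K^H \mathbf{H}_K
\end{bmatrix}
= \tilde{\mathbf{U}}_k
\begin{bmatrix} \tilde{\boldsymbol{\Sigma}}_k & \mathbf{0} \end{bmatrix}
\begin{bmatrix} \tilde{\mathbf{V}}_k^{(1)} & \tilde{\mathbf{V}}_k^{(0)} \end{bmatrix}^H,
\qquad \mathbf{V}_k = \tilde{\mathbf{V}}_k^{(0)}. \tag{3}
\]

Therefore, we obtain

\[
\hat{\mathbf{y}}_k = \mathbf{R}_k^H \mathbf{H}_k \mathbf{T}_k \mathbf{b}_k + \mathbf{R}_k^H \mathbf{n}_k = \mathbf{R}_k^H \mathbf{H}_k \mathbf{V}_k \mathbf{A}_k \mathbf{b}_k + \mathbf{R}_k^H \mathbf{n}_k. \tag{4}
\]

Note that, at the pre-receiver outputs, the multi-user MIMO system has been decoupled interference-free into \(K\) parallel single-user MIMO systems. Note also that in general \(\mathbf{H}_i \mathbf{V}_k \neq \mathbf{0}\) for \(i \neq k\), which means that some multi-user interference is still present at the receive antennas; it is removed only by the pre-receivers. A close look at (4) shows that we can regard \(\mathbf{H}_k \mathbf{V}_k\) as the equivalent single-user MIMO channel of user \(k\), with the transmit and pre-receive processing represented by \(\mathbf{A}_k\) and \(\mathbf{R}_k\), respectively. In the second step, assuming that all transmit weights \(\mathbf{T}_k\) (\(k = 1, \ldots, K\)) are set up at the transmitter according to the decomposition structure described above, the pre-receiver output in (4) can be written as

\[
\hat{\mathbf{y}}_k = \mathbf{R}_k^H \mathbf{F}_k \mathbf{A}_k \mathbf{b}_k + \mathbf{R}_k^H \mathbf{n}_k, \tag{5}
\]

where \(\mathbf{F}_k = \mathbf{H}_k \mathbf{V}_k\) is the effective channel from the receiver's point of view.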
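Computing \(\mathbf{V}_k\) in (3) is a null-space computation on the stacked matrices \(\mathbf{R}_i^H \mathbf{H}_i\), \(i \neq k\). The NumPy sketch below assumes given pre-receivers with orthonormal columns; here they are random and purely hypothetical, as is the function name `transmit_null_basis`. It uses the 4×(2,2,2) configuration with \(L = 1\), for which the standard decomposition has no solution, and checks that \(\mathbf{R}_i^H \mathbf{H}_i \mathbf{V}_k = \mathbf{0}\) even though \(\mathbf{H}_i \mathbf{V}_k\) itself is not zero.

```python
import numpy as np

def transmit_null_basis(H_list, R_list, k, tol=1e-10):
    """V_k as in (3): orthonormal basis of null([R_i^H H_i]_{i != k}) via the SVD."""
    H_tilde = np.vstack([R.conj().T @ H
                         for i, (H, R) in enumerate(zip(H_list, R_list)) if i != k])
    _, s, Vh = np.linalg.svd(H_tilde)           # rows of Vh: [V^(1) V^(0)]^H
    rank = int(np.sum(s > tol * s.max()))
    return Vh[rank:].conj().T                   # V^(0): M x (M - rank)

rng = np.random.default_rng(1)

def cgauss(*shape):
    """i.i.d. zero-mean, unit-variance complex Gaussian entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

M, N, L, K = 4, 2, 1, 3                         # 4 x (2, 2, 2), one stream per user
H_list = [cgauss(N, M) for _ in range(K)]
# Hypothetical pre-receivers: random N x L matrices with orthonormal columns.
R_list = [np.linalg.qr(cgauss(N, L))[0] for _ in range(K)]

V_list = [transmit_null_basis(H_list, R_list, k) for k in range(K)]
F_list = [H @ V for H, V in zip(H_list, V_list)]   # effective channels F_k = H_k V_k

for k in range(K):
    for i in range(K):
        if i != k:
            leak_after = np.linalg.norm(R_list[i].conj().T @ H_list[i] @ V_list[k])
            leak_before = np.linalg.norm(H_list[i] @ V_list[k])
            print(k, i, f"after pre-receiver {leak_after:.1e}", f"at antennas {leak_before:.2f}")
```

Each \(\mathbf{V}_k\) here has \(d = M - L(K-1) = 2\) columns, so one stream per user fits, while the interference printed "at antennas" is generally nonzero and only the pre-receiver removes it.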
Let us write the matrix \(\mathbf{A}_k\) in the form \(\mathbf{A}_k = \mathbf{Q}_k \mathbf{P}_k^{1/2}\), where \(\mathbf{Q}_k\) has orthonormal columns and \(\mathbf{P}_k\) is a non-negative definite diagonal matrix containing the transmit power of each data stream of user \(k\); \(P_k = \operatorname{trace}(\mathbf{P}_k)\) is the transmit power assigned to user \(k\). The overall transmit power is then \(P = \sum_{k=1}^{K} P_k\). Without loss of mutual information, the channel can be decomposed into independent sub-channels by the singular value decomposition of \(\mathbf{F}_k\):

\[
\mathbf{F}_k =
\begin{bmatrix} \mathbf{R}_k & \mathbf{U}_k' \end{bmatrix}
\begin{bmatrix} \boldsymbol{\Sigma}_k & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Sigma}_k' \end{bmatrix}
\begin{bmatrix} \mathbf{Q}_k & \mathbf{V}_k' \end{bmatrix}^H. \tag{6}
\]

Since only \(L_k\) independent data streams are transmitted to user \(k\), the receive weight \(\mathbf{R}_k\) is chosen as the \(L_k\) left-singular vectors corresponding to the \(L_k\) largest singular values of \(\mathbf{F}_k\). At the same time, \(\mathbf{Q}_k\) is chosen as the \(L_k\) right-singular vectors corresponding to the \(L_k\) largest singular values, so that

\[
\hat{\mathbf{y}}_k = \boldsymbol{\Sigma}_k \mathbf{P}_k^{1/2} \mathbf{b}_k + \mathbf{n}_k',
\]

where \(\boldsymbol{\Sigma}_k\) is an \(L_k \times L_k\) diagonal matrix containing the \(L_k\) dominant singular values of \(\mathbf{F}_k\), and \(\mathbf{n}_k' = \mathbf{R}_k^H \mathbf{n}_k\) is an \(L_k\)-dimensional noise vector whose elements are i.i.d. zero-mean complex Gaussian with variance \(\sigma_n^2\), since \(\mathbf{R}_k^H \mathbf{R}_k = \mathbf{I}\). Hence, the noise is unchanged in power and spatially white. The diagonal entries of the transmit power matrix \(\mathbf{P}_k\) can now be chosen according to the water-filling policy [6], based on the singular values in \(\boldsymbol{\Sigma}_k\) and the total power \(P_k\) assigned to this user. Since the column vectors of \(\mathbf{R}_k\) span the subspace of the arriving dedicated signal \(\mathbf{H}_k \mathbf{V}_k \mathbf{A}_k \mathbf{b}_k = \mathbf{R}_k \boldsymbol{\Sigma}_k \mathbf{P}_k^{1/2} \mathbf{b}_k\), the receive matrix captures the complete dedicated signal of user \(k\). Together with the unchanged noise, this means that \(\mathbf{R}_k\) does not change the mutual information. Therefore, the channel capacity is the same whether or not \(\mathbf{R}_k\) is present.

For the sake of simplicity, we assume that all users have the same number of receive antennas and the same number of data streams, that is, \(N_1 = \cdots = N_K = N\) and \(L_1 = \cdots = L_K = L\). The dimension of the null-space spanned by the column vectors of \(\mathbf{V}_k\) is then

\[
d = M - \operatorname{rank}\big(\tilde{\mathbf{V}}_k^{(1)}\big) = M - L(K - 1).
\]

In order to transmit \(L\) data streams, we need \(d \ge L\), i.e., \(LK \le M\). Furthermore, since \(L \le N\), the number of data streams must satisfy

\[
L \le \min\{N, M/K\}, \tag{7}
\]

and, since \(L \ge 1\), the number of users must satisfy \(K \le M\), which is independent of the number of receive antennas.

B. Joint Decomposition Algorithm
The transmit weight \(\mathbf{V}_k\) is a function of the pre-receiver weights \(\mathbf{R}_i\), and the pre-receiver weight \(\mathbf{R}_k\) is in turn a function of the transmit weight \(\mathbf{V}_k\). We therefore propose an iterative algorithm that computes both jointly. Given the numbers of data streams \(L_k\) (\(k = 1, \ldots, K\)), the iterative procedure is as follows.
Procedure for Computing the Joint Decomposition
Inputs: all channel matrices \(\mathbf{H}_k\); the numbers of data streams \(L_k\); the transmit powers \(P_k\) (\(k = 1, \ldots, K\)).
Outputs: the transmit matrices \(\mathbf{T}_k\) (\(k = 1, \ldots, K\)).
Step 1: Calculate an initial virtual receiver weight \(\mathbf{R}_k^{(0)}\) (\(k = 1, \ldots, K\)) by applying (6) under the assumption that the transmit weight matrix \(\mathbf{V}_k\) is the identity, i.e., as the \(L_k\) dominant left-singular vectors of \(\mathbf{H}_k\).
Step 2: Calculate the transmit weights \(\mathbf{V}_k^{(0)}\) (\(k = 1, \ldots, K\)) using (3) and \(\mathbf{R}_k^{(0)}\).
Step 3: Compute the virtual receiver weight \(\mathbf{R}_k^{(i+1)}\) (\(k = 1, \ldots, K\)) as the \(L_k\) dominant left-singular vectors of the matrix \(\mathbf{H}_k \mathbf{V}_k^{(i)}\) by applying (6).
Step 4: Compute the transmit weight \(\mathbf{V}_k^{(i+1)}\) (\(k = 1, \ldots, K\)) using (3) and \(\mathbf{R}_k^{(i+1)}\).
Step 5: Repeat Steps 3 and 4 until the projector onto the column space of the virtual receiver weight matrix converges, that is, until
\[
\frac{\big\| \mathbf{P}_{\mathbf{R}_k}^{(i+1)} - \mathbf{P}_{\mathbf{R}_k}^{(i)} \big\|_F^2}{\big\| \mathbf{P}_{\mathbf{R}_k}^{(i+1)} \big\|_F^2} < \varepsilon \quad \text{for all } k,
\]
where \(\mathbf{P}_{\mathbf{R}_k} = \mathbf{R}_k (\mathbf{R}_k^H \mathbf{R}_k)^{-1} \mathbf{R}_k^H\) and \(\|\cdot\|_F^2\) is the squared Frobenius norm. In our simulations, we use \(\varepsilon = 0.01\%\).
Step 6: Compute \(\mathbf{Q}_k\) (\(k = 1, \ldots, K\)) as the \(L_k\) dominant right-singular vectors of \(\mathbf{H}_k \mathbf{V}_k^{(i+1)}\).
Step 7: Compute the stream power distribution matrix \(\mathbf{P}_k\) (\(k = 1, \ldots, K\)). If mutual information is to be maximized, this is done according to the water-filling policy [6], based on the singular values of \(\mathbf{H}_k \mathbf{V}_k^{(i+1)}\) and the power \(P_k\) assigned to user \(k\).
Step 8: Finally, form the transmit weights \(\mathbf{T}_k = \mathbf{V}_k^{(i+1)} \mathbf{Q}_k \mathbf{P}_k^{1/2}\) (\(k = 1, \ldots, K\)).

This algorithm computes an equilibrium solution of the processing; there is no cooperation between the transmitter and the receivers. More specifically, in Step 4 the transmitter chooses its optimum transmit strategy after all receivers have decided upon their pre-receive weights, whereas in Step 3 each receiver chooses its optimum receive weights after the transmitter has decided upon its transmit weights. The successive repetition of Steps 3 and 4 searches for an equilibrium of this non-cooperative process. Steps 1 and 2 only provide initial values.
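The procedure above can be sketched in Python with NumPy as follows. This is an illustrative reimplementation under stated assumptions, not the authors' code: the helper names, the rank tolerance, the iteration cap `max_iter` (the paper gives neither a bound nor a convergence proof), the simple water-filling routine and the equal power split \(P_k = P/K\) in the usage example are our own choices; `eps = 1e-4` corresponds to the 0.01% threshold of Step 5. Because the virtual receivers come from an SVD, they have orthonormal columns, so the projector of Step 5 reduces to \(\mathbf{R}_k \mathbf{R}_k^H\).

```python
import numpy as np

def null_basis(A, tol=1e-10):
    """Orthonormal basis (as columns) of null(A), computed from the full SVD."""
    _, s, Vh = np.linalg.svd(A)
    rank = int(np.sum(s > tol * s.max()))
    return Vh[rank:].conj().T

def dominant_left(A, L):
    """The L dominant left-singular vectors of A."""
    return np.linalg.svd(A)[0][:, :L]

def waterfill(gains, P):
    """Water-filling of power P over sub-channels with strictly positive gains sigma_j^2 / sigma_n^2."""
    g = np.asarray(gains, dtype=float)
    order = np.argsort(g)[::-1]                 # strongest sub-channel first
    p = np.zeros_like(g)
    for m in range(g.size, 0, -1):              # try the m strongest sub-channels
        active = order[:m]
        mu = (P + np.sum(1.0 / g[active])) / m  # water level
        alloc = mu - 1.0 / g[active]
        if np.all(alloc > 0):
            p[active] = alloc
            break
    return p

def joint_decomposition(H_list, L_list, P_list, sigma2=1.0, eps=1e-4, max_iter=200):
    K = len(H_list)

    def transmit_bases(R):
        # Steps 2 and 4: V_k from (3), the null-space of the stacked R_i^H H_i, i != k.
        return [null_basis(np.vstack([R[i].conj().T @ H_list[i] for i in range(K) if i != k]))
                for k in range(K)]

    # Step 1: with V_k = I, R_k^(0) holds the L_k dominant left-singular vectors of H_k.
    R = [dominant_left(H, L) for H, L in zip(H_list, L_list)]
    V = transmit_bases(R)
    for _ in range(max_iter):                   # max_iter is an assumed safety cap
        # Step 3: R_k^(i+1) from the SVD of H_k V_k^(i), cf. (6).
        R_new = [dominant_left(H @ Vk, L) for H, Vk, L in zip(H_list, V, L_list)]
        # Step 4: V_k^(i+1) from (3) and the new receivers.
        V = transmit_bases(R_new)
        # Step 5: relative change of the projectors R_k R_k^H (orthonormal columns).
        change = 0.0
        for Rn, Ro in zip(R_new, R):
            Pn, Po = Rn @ Rn.conj().T, Ro @ Ro.conj().T
            change = max(change, np.linalg.norm(Pn - Po) ** 2 / np.linalg.norm(Pn) ** 2)
        R = R_new
        if change < eps:
            break
    T = []
    for H, Vk, L, Pk in zip(H_list, V, L_list, P_list):
        _, s, Vh = np.linalg.svd(H @ Vk)
        Q = Vh[:L].conj().T                     # Step 6: dominant right-singular vectors
        p = waterfill(s[:L] ** 2 / sigma2, Pk)  # Step 7: stream powers
        T.append(Vk @ Q @ np.diag(np.sqrt(p)))  # Step 8: T_k = V_k Q_k P_k^(1/2)
    return T, R

# Usage on a 4 x (2, 2, 2) system with one stream per user and P / sigma_n^2 = 15 dB.
rng = np.random.default_rng(2)
K, M, N, P = 3, 4, 2, 10 ** 1.5
H_list = [(rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
          for _ in range(K)]
T_list, R_list = joint_decomposition(H_list, [1] * K, [P / K] * K)  # equal split: an assumption
```

Whether or not the loop meets the threshold, the returned receivers and transmit matrices satisfy \(\mathbf{R}_i^H \mathbf{H}_i \mathbf{T}_k = \mathbf{0}\) for \(i \neq k\), because the last \(\mathbf{V}_k\) is always recomputed from the last \(\mathbf{R}_i\).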
One peculiarity of this algorithm is that the pre-receive weight matrix \(\mathbf{R}_k\) can be replaced by \(\mathbf{R}_k \mathbf{Q}\), with any invertible \(L_k \times L_k\) matrix \(\mathbf{Q}\), without changing the mutual information. Since every \(\mathbf{R}_k \mathbf{Q}\) is equally suitable as the virtual receiver matrix, and all of them share the same column space, the convergence test in Step 5 uses the projector onto the column space of \(\mathbf{R}_k\). Note that the number of data streams \(L_k\) transmitted to each user offers additional degrees of freedom, which can be used to optimize the system performance further. In our simulations, we choose \(L\) so as to maximize the minimum mutual information among the users.

This scheme offers considerable improvement in all aspects discussed in Section II. The joint decomposition scheme has the following key characteristics:

1. The number of users can be as large as the number of transmit antennas. From (7), the number of users satisfies \(K \le M\), with equality when the number of data streams is \(L_k = L = 1\) (\(k = 1, \ldots, K\)).
2. The number of receive antennas is unlimited, as can be seen from (7).
3. The transmission efficiency is considerably improved by allowing some interference to be removed by the receiver, which lifts part of the burden from the transmitter. This is supported by the performance improvement over the standard multi-user MIMO decomposition scheme.
4. The solution for the transmit processing usually exists even in the case \(\bigcap_{i \neq k} \operatorname{null}(\mathbf{H}_i) \subseteq \operatorname{null}(\mathbf{H}_k)\), which makes the new scheme suitable for the spatially semi-correlated channel with the same directions of departing waves for all users, as discussed in Section II. We support this with numerical results in Section IV.

It should be emphasized that the assumed pre-receiver structure is only a means of obtaining the transmit processing; each user can have its own receiver structure.

We are not able to prove the convergence of this algorithm theoretically, and further research is needed to characterize its behavior. So far, however, all our simulation results show that it converges, which indicates the practical potential of the algorithm. We refer to this scheme as the Joint Transmit and Receive Multi-user MIMO Decomposition Scheme.

IV. SIMULATION RESULTS
The performance of the proposed scheme is investigated by computer simulation. In our simulations, we assume that the noise variance \(\sigma_n^2\) is one. Flat-fading multi-user MIMO channels are used, including both the uncorrelated and the spatially correlated models of [5]. Two total transmit powers, \(P/\sigma_n^2 = 5\) dB and \(15\) dB, are investigated. We use the cumulative distribution function (CDF) of the capacity as our performance evaluation measure.
Since the capacity of the multi-user MIMO downlink channel is still an open question, we define the capacity in our simulations as the minimum mutual information among the users, assuming single-user decoding. This is an important representative measure of downlink capacity in a multi-user system. Provided the noise and all transmitted signals are independent, the mutual information of a particular user, say user \(k\), is given by

\[
I(\mathbf{b}_k; \mathbf{y}_k \mid \mathcal{H}) = h(\mathbf{y}_k \mid \mathcal{H}) - h(\mathbf{y}_k \mid \mathbf{b}_k, \mathcal{H}), \tag{8}
\]

where \(h(\cdot)\) denotes differential entropy and \(\mathcal{H}\) represents \((\mathbf{H}_1, \ldots, \mathbf{H}_K)\). If the noise \(\mathbf{n}_k\) and all transmitted data \(\mathbf{b}_k\) are zero-mean i.i.d. Gaussian with variance \(\sigma_n^2\) and unity, respectively, (8) yields the mutual information of user \(k\) as

\[
I(\mathbf{b}_k; \mathbf{y}_k \mid \mathcal{H}) = \log_2 \det\!\left( \mathbf{I} + \mathbf{H}_k \mathbf{T}_k \mathbf{T}_k^H \mathbf{H}_k^H \Big( \sigma_n^2 \mathbf{I} + \sum_{i \neq k} \mathbf{H}_k \mathbf{T}_i \mathbf{T}_i^H \mathbf{H}_k^H \Big)^{-1} \right). \tag{9}
\]

Hence, the capacity we define can be expressed as \(C = \min\{ I(\mathbf{b}_k; \mathbf{y}_k \mid \mathcal{H}),\ k = 1, \ldots, K \}\). Due to the randomness of the channel matrices \(\mathbf{H}_k\) (\(k = 1, \ldots, K\)), \(C\) is a random variable described by its probability density function (pdf) \(f_C(C)\), and the capacity CDF is defined as

\[
F_C(C) = \int_0^C f_C(x)\, dx.
\]

Based on this performance measure, the Joint Transmit and Receive MU-MIMO Decomposition Scheme (denoted Joint MU-MIMO Decomposition) is compared with the standard MU-MIMO Decomposition Scheme proposed in [1]-[4] (denoted MU-MIMO Decomposition), the optimal scheme, and a time division scheme (denoted TDMA). The optimal scheme is obtained by nonlinear optimization that maximizes the minimum mutual information among the users, that is,

\[
\{\mathbf{T}_k^{\mathrm{opt}}\} = \arg\max_{\mathbf{T}_1, \ldots, \mathbf{T}_K} \min\{ I(\mathbf{b}_k; \mathbf{y}_k \mid \mathcal{H}),\ k = 1, \ldots, K \} \quad \text{s.t.} \quad \sum_{k=1}^{K} \operatorname{trace}(\mathbf{T}_k \mathbf{T}_k^H) \le P, \tag{10}
\]

where \(I(\mathbf{b}_k; \mathbf{y}_k \mid \mathcal{H})\) is a function of \(\mathbf{T}_1, \ldots, \mathbf{T}_K\). This nonlinear optimization problem is solved numerically with a Sequential Quadratic Programming (SQP) method [7]. The time division scheme, on the other hand, separates the users in time, so that at any time instant only one user is served.
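The per-user mutual information (9), the minimum-over-users capacity \(C\) and its empirical CDF can be evaluated as in the following NumPy sketch. The function names are hypothetical, \(\sigma_n^2 = 1\) is the default as in our simulations, and the transmit matrices may come from any of the compared schemes. A Monte Carlo estimate of \(F_C\) amounts to calling `min_rate_capacity` once per channel realization and passing the results to `empirical_cdf`.

```python
import numpy as np

def user_mutual_information(H_list, T_list, k, sigma2=1.0):
    """I(b_k; y_k | H) in bits from (9): other users' signals are treated as noise."""
    Hk = H_list[k]
    N = Hk.shape[0]

    def received_cov(T):
        # Covariance H_k T T^H H_k^H of one user's signal at receiver k.
        HT = Hk @ T
        return HT @ HT.conj().T

    noise_plus_interf = sigma2 * np.eye(N)
    for i, T in enumerate(T_list):
        if i != k:
            noise_plus_interf = noise_plus_interf + received_cov(T)
    G = np.eye(N) + received_cov(T_list[k]) @ np.linalg.inv(noise_plus_interf)
    _, logabsdet = np.linalg.slogdet(G)         # det(G) is real and positive here
    return logabsdet / np.log(2)

def min_rate_capacity(H_list, T_list, sigma2=1.0):
    """C = min_k I(b_k; y_k | H), the capacity measure of Section IV."""
    return min(user_mutual_information(H_list, T_list, k, sigma2) for k in range(len(H_list)))

def empirical_cdf(samples):
    """Sorted samples x and the empirical estimate of F_C at those points."""
    x = np.sort(np.asarray(samples, dtype=float))
    return x, np.arange(1, x.size + 1) / x.size

# Two single-antenna users that interfere fully: each sees SINR 1 / (1 + 1).
H_demo = [np.array([[1.0]]), np.array([[1.0]])]
T_demo = [np.array([[1.0]]), np.array([[1.0]])]
print(min_rate_capacity(H_demo, T_demo))        # about 0.585 bits, i.e. log2(1.5)
```

The determinant is evaluated with `slogdet` rather than `det` to avoid overflow for large antenna numbers or high SNR; since \(\det(\mathbf{I} + \mathbf{S}\mathbf{W}^{-1}) = \det(\mathbf{W} + \mathbf{S}) / \det(\mathbf{W}) > 0\) for the Hermitian positive definite matrices involved, the log-magnitude equals the log-determinant.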
For the time division scheme, the mutual information of user \(k\) is given by

\[
I(\mathbf{b}_k; \mathbf{y}_k \mid \mathcal{H}) = \frac{1}{K} \log_2 \det\!\left( \mathbf{I} + \frac{1}{\sigma_n^2} \mathbf{H}_k \mathbf{T}_k \mathbf{T}_k^H \mathbf{H}_k^H \right), \tag{11}
\]

where \(\mathbf{T}_k\) is chosen to maximize \(I(\mathbf{b}_k; \mathbf{y}_k \mid \mathcal{H})\) subject to the total average transmit power constraint \(\operatorname{trace}(\mathbf{T}_k \mathbf{T}_k^H) \le P\). The factor \(1/K\) in (11) arises because, with time division, each user has only \(1/K\) of the time for transmission. In this scheme the optimal \(\mathbf{T}_k\) is available in closed form and is chosen according to [8].

Throughout this section, we consider a \(K\)-user system with \(M\) transmit antennas at the BS and \(N\) receive antennas at each MS, and we refer to it as an \(M \times (N, N, \ldots, N)\) system. We also assume that the number of data streams is \(L\) for each user. Finally, for the two decomposition schemes, we select the number of data streams \(L_1 = \cdots = L_K = L\) by maximizing the defined capacity \(C\).

Figures 2 to 4 present the capacity CDF for the configurations 5×(3,3), 4×(2,2,2) and 6×(2,2) in an uncorrelated channel, respectively. The proposed scheme significantly outperforms the time division scheme, and it also performs better than the standard MU-MIMO decomposition scheme. As expected, the proposed scheme is more flexible, in that it works well in all configurations, including those for which the standard MU-MIMO decomposition scheme does not work (e.g., the 4×(2,2,2) configuration, where \(\sum_{i \neq k} N_i + 1 = 5 > M = 4\)). A closer look at the 5×(3,3) configuration shows that the proposed scheme performs much better than the standard MU-MIMO decomposition scheme. This is because the standard scheme becomes inefficient with respect to transmit power, as discussed in Section II, when the number of transmit antennas is not large enough. When there are enough transmit antennas (e.g., \(M = 6\) in Figure 4), the standard MU-MIMO decomposition scheme performs close to the proposed scheme; in such a case it is preferable because of its lower complexity.

When the channel is correlated, the standard MU-MIMO decomposition scheme also becomes inefficient with respect to transmit power, as discussed in Section II. This can be observed in Figure 5, where a 6×(2,2) system in a highly spatially correlated channel with one departure wave and 5° of angle spread is investigated. The proposed scheme always outperforms the standard MU-MIMO decomposition scheme, because \(\operatorname{null}(\mathbf{H}_k)\) may have a large overlap with the joint null-space \(\bigcap_{i \neq k} \operatorname{null}(\mathbf{H}_i)\), which reduces the transmit-power efficiency of the standard decomposition scheme.

The small capacity at low outage probability observed for the schemes that separate users in space is caused by similar null spaces \(\operatorname{null}(\mathbf{H}_k)\) and \(\bigcap_{i \neq k} \operatorname{null}(\mathbf{H}_i)\). When \(\operatorname{null}(\mathbf{H}_k)\) is similar to the joint null space \(\bigcap_{i \neq k} \operatorname{null}(\mathbf{H}_i)\), or when \(\bigcap_{i \neq k} \operatorname{null}(\mathbf{H}_i) = \{\mathbf{0}\}\), the users are difficult to separate in space. In practice, such situations should be avoided by user clustering, i.e., by grouping for simultaneous transmission only those users that can be well separated in space.

Figures 2 to 5. CDF of the capacity \(C\) for \(P/\sigma_n^2 = 5\) dB and \(15\) dB. Upper left, Figure 2: 5×(3,3); upper right, Figure 3: 4×(2,2,2); lower left, Figure 4: 6×(2,2); lower right, Figure 5: 6×(2,2) in a highly correlated channel with 5° angle spread. Curves: TDMA, MU-MIMO Decomposition (absent in Figure 3) and Joint MU-MIMO Decomposition.

REFERENCES
[1] R. L. Choi and R. D. Murch, "A transmit pre-processing technique for multi-user MIMO systems: a decomposition approach," to appear in IEEE Transactions on Wireless Communications.
[2] R. L. Choi and R. D. Murch, "A downlink decomposition transmit preprocessing technique for multi-user MIMO systems," in Proc. IST Mobile & Wireless Telecommunications Summit 2002, Thessaloniki, Greece, June 2002.
[3] Q. H. Spencer and M. Haardt, "Capacity and downlink transmission algorithms for a multi-user MIMO channel," in Proc. 36th Asilomar Conf. Signals, Systems, and Computers, Pacific Grove, CA, Nov. 2002.
[4] M. Rim, "Multi-user downlink beamforming with multiple transmit and receive antennas," Electronics Letters, vol. 38, pp. 1725-1726, Dec. 2002.
[5] M. T. Ivrlač, W. Utschick, and J. A. Nossek, "Correlated fading in MIMO systems," IEEE Journal on Selected Areas in Communications, (in press).
[6] R. G. Gallager, Information Theory and Reliable Communication, Wiley, 1968.
[7] R. K. Brayton, S. W. Director, G. D. Hachtel, and L. Vidigal, "A new algorithm for statistical circuit design based on quasi-Newton methods and function splitting," IEEE Trans. Circuits and Systems, vol. CAS-26, pp. 784-794, Sept. 1979.
[8] E. Telatar, "Capacity of multi-antenna Gaussian channels," European Transactions on Telecommunications, vol. 10, no. 6, pp. 585-595, 1999 (original version: Technical Report, AT&T Bell Labs, 1995).