I. INTRODUCTION
MASSIVE MIMO [1] is a key enabling technique to achieve the large throughput required by current 5G cellular networks. In the massive MIMO concept, the base station (BS) is equipped with many more antennas than the number strictly needed to communicate with the users in its cell. Thanks to this overabundance of antennas, users communicating in the same time-frequency resources can be distinguished by means of simple linear filter techniques, such as minimum mean-square error (MMSE) in the uplink, or maximum ratio transmission (MRT) [2] in the downlink. Such filters require channel state information (CSI) for all links, in order to behave properly. CSI is typically obtained through the transmission of pilots, which are inserted within the frame at regular time intervals, depending on the channel coherence time. Thus, in order to achieve the performance promised by massive MIMO, obtaining a good CSI estimation becomes of crucial importance.
In real systems, the channel seen at the receiver is timevarying not only because of the presence of mobility, either of the user equipments (UEs) or of the surrounding environment (e.g., scatterers, reflectors, etc.), but also because of hardware nonidealities. One of the most important impairments of this type is phase noise (PN) [3]. PN arises from instability in the oscillators and typically assumes the form of a multiplicative noise, whose effect in massive MIMO systems can become particularly severe, substantially harming the performance of the receiver filters. While, in principle, pilot transmission allows the estimation of actual PN, these processes usually have a much faster time evolution than other forms of channel variations. Thus, between two pilot transmissions, the estimated PN becomes rapidly obsolete, i.e.,information aging takes place.
In this paper, we deal with the problem of reducing the impact of PN in a single-cell massive MIMO uplink. In particular, we introduce a receiver that is able to approximate joint PN detection and demodulation by iterating between a phase detector and a demodulator/decoder. The entire receiver is based on the expectation-maximization (EM) algorithm. The receiver was originally introduced in [4] for a non-massive MIMO line-of-sight (LoS) system, where the proposed structure was shown to achieve excellent performance.
A. Related Work
Several papers from the literature have faced the problem of PN in massive MIMO systems. A major distinction can be made between papers that deal with single-carrier transmission schemes and papers that deal with OFDM.
For massive MIMO-OFDM systems, the problem of PN is even more relevant than in single-carrier systems, as PN causes the loss of orthogonality of the subcarriers, giving rise to inter-carrier interference (ICI). Apart from the large bulk of literature regarding the impact of PN on non-massive MIMO-OFDM systems, the work in [5] specifically considers the uplink of a massive MIMO-OFDM system affected by PN. Two extreme cases are considered, the synchronous case where a single oscillator is feeding all the M BS antennas, and the asynchronous case where each BS antenna is fed by a different oscillator. The ergodic capacity in the asymptotic limit M → ∞ is computed and a Kalman filter is proposed to track the PN. Under a similar scenario, in [6] the authors derive lower bounds to the ergodic capacity for a finite value of M. In reference [7], a more composite PN model is introduced, in which the phase-locked loop (PLL) structure is taken into account to derive the PN statistics. It is shown that, while independent PN at each BS antenna is averaged out, a common PN component can give a substantial performance loss. In [8], the downlink of a massive MIMO-OFDM system is considered and the impact of PN on the performance of the zero-forcing and MRT precoders is characterized. The effect of imperfect CSI is also investigated. In [9], an iterative algorithm, based on variational EM, is proposed for compensation of PN in the uplink of massive MIMO-OFDM systems. Such algorithm performs joint channel and PN estimation. In a very recent paper [10], an iterative PN and channel estimation algorithm for mmWave massive MIMO-OFDM systems is presented, where PN is estimated with pattern search and channel estimation is based on compressed sensing.
For single-carrier systems, the authors in [11] consider the uplink of frequency-selective massive MIMO systems, with single-antenna users transmitting to the BS. Again, synchronous and asynchronous cases are considered for the PN at the BS. A time-reversal maximal-ratio combining is proposed, and its performance is analyzed. Asymptotically in M, the array gain is shown to be [TeX:] $$O(\sqrt{M})$$, as in the case without PN. In the non-asymptotic case, a dramatic worsening of performance due to PN is shown, with a further loss in the asynchronous case. The performance loss is also caused by the very simple receiver considered. The downlink of a frequencyflat massive MIMO system is studied in [12]. In the downlink, it is unlikely to introduce a PN estimation/tracking algorithm, due to the limited computational complexity of the UEs. In [12], the PN model at the BS is more general, supposing that an oscillator can feed several BS antennas. The effect of PN on the per-user SINR is evaluated, depending on the different linear precoder, i.e., zero-forcing (ZF), regularized ZF, or matched filter. It is shown that the performance loss increases with an increasing number of oscillators. In [13] a precoding scheme based on ZF with PN suppression is proposed for the downlink of a massive MIMO with time division duplexing (TDD). Due to the reciprocity property, the estimated uplink channel (obtained with a simple PN estimation scheme) is used in the downlink to form the ZF precoder. The work in [14] introduces an algorithm based on approximate Bayesian interference for symbol error reduction of the PN-impaired uplink of massive MIMO systems.
More recently, reference [15] investigates PN-affected massive MIMO systems in the mmWave band. The PN model is similar to the one considered in [7], with a common reference low-frequency oscillator that feeds all BS antennas, and a bank of PLLs, each feeding a subset of BS antennas, in order to raise the carrier frequency to the mmWave band. Thus, there are two contributions to PN, one from the reference oscillator, common to all antennas, the other from the PLL voltagecontrolled oscillator (VCO), which is independent for different antenna subsets. The first PN contribution is proven to be lowfrequency and thus easily estimated, while the second term has a faster dynamics and thus turns out to be more problematic. The performance is characterized for MMSE filtering, when QPSK symbols are transmitted, yielding assessment of the PN impact on the output BER. In [16], the focus is on hybrid analog-digital schemes for mmWave massive MIMO, for which the sensitivity to PN and channel estimation errors is quantitatively analyzed. Hybrid analog-digital schemes are a way to reduce the number of RF chains, which is especially important at the mmWave band. Moreover, a comparison between the hybrid and the fully-digital scheme in terms of robustness with respect to PN is performed. In [17] the impact of PN on beamforming performance for mmWave massive MIMO systems is investigated.
On the other hand, a very recent flavor that is capturing the attention of the research community is related to cellfree massive MIMO systems. The work in [18] analyzes the impact of PN on the performance of cell-free massive MIMO systems, both for their uplink and their downlink, showing that end users suffer more than access points (APs) from PN. PN-aware power control is also studied. In [19], the spectral efficiency of a frequency-selective cell-free massive MIMO system with PN and imperfect channel estimation is studied, specifically when two linear low-complexity decoders, namely, time-reversal maximum-ratio combining and timereversal large-scale fading decoding, are employed. Finally, it is worth noting that massive MIMO systems affected by PN have been studied in several particular applications, such as compressive channel estimation [20], PHY-layer authentication [21], or joint channel estimation and localization [22], among others.
In this paper, we consider the uplink of a single-carrier frequency-flat massive MIMO system. Thus, apart from the band, we are in a setup close to that of [15]. We consider independent PN processes for given antennas subsets, i.e., we neglect a common reference oscillator, whose PN contribution is proven in [15] to be easily estimated. The novelty of this work relies in a study of the performance improvement determined by an iterative receiver, which represents a viable, suboptimal implementation of a joint phase detector and demodulator/decoder. It is to be noted that some preliminary simulation results obtained with the receiver proposed in this paper were shown in [23] to assess the effect of CSI aging.
Summarizing, the main contributions of this paper are as follows:
· We describe in detail the proposed receiver under a general setting where each oscillator at the transmitter and at the receiver can feed an arbitrary number of antennas;
· We derive the Bayesian Cram´er-Rao Bound (BCRB) for this general setting, yielding a benchmark for the performance in terms of the mean-square error (MSE) of the proposed receiver;
· Through numerical simulations, we analyze the receiver performance both in terms of MSE and in terms of bit error rate (BER) in several realistic scenarios, including imperfect CSI;
· We conclude with several design rules both for the receiver and for the system, in order to achieve a given target performance.
With respect to [4], [23] and [24], the new contributions are:
· The model in [4] has been generalized, since in this paper a given oscillator can feed more antennas; moreover, we consider here a massive MIMO system, possibly in non- LoS conditions, unlike [4], where the channel matrix was assumed to be unitary.
· The step size of the steepest-descent algorithm used for phase detection has been optimized, whereas the architecture in [4] used a fixed step size.
· The effect of fading and of imperfect channel state information is taken into account and investigated.
· The performance of the proposed receiver is compared with that of the solution proposed in [14], adapted to our scenario of coded transmission and Wiener PN model.
· The Cram´er-Rao bound is extended to the more general system model of this paper, while [24] contains a derivation of the Cram´er-Rao bound for the scenario in [4].
· Many more simulation results are shown, in comparison with [23], which does not give the details of the algorithm. A thorough analysis of the algorithm complexity is given for the first time.
The remaining of this paper is structured as follows. In Section II, we describe the system under consideration, with Subsection II-B devoted to the description of the pilot transmission scheme and of the assumptions made on channel estimation. In Section III, the EM-based receiver is described, with particular emphasis on the phase estimator (Subsection III-A) and a brief description of the MIMO demodulator (Subsection III-B). Section IV derives the Bayesian Cram´er- Rao bound. Section V shows the simulation results, both in terms of MSE (Subsection V-A), BER (Subsection V-B), and average number of receiver iterations and steepest-descent steps (Subsection V-C), and it looks as well into the algorithm complexity and latency (Subsection V-D). Finally, Section VI draws some conclusions.
II. SYSTEM DESCRIPTION
A. Overall Channel Model
The model we describe below, while general in nature, is specially suitable for the uplink of a wireless network in which K users (typically, with one or few transmit antennas each) transmit to a base station equipped with many antennas and able to sustain a relevant computational burden. We assume that there is a power-control mechanism so that all users are received with the same power. Besides being true for many real systems, such assumption constitutes a worst case for phase detection, as unequal received power could arguably be of help in that respect. This situation will (implicitly or explicitly, where corresponding) constitute our specific context throughout the article.
Consider an [TeX:] $$N_t \times N_r$$ massive MIMO channel with [TeX:] $$O_t$$ oscillators at the transmitter (corresponding to all users) and [TeX:] $$O_r$$ oscillators at the receiver. Each transmit-side oscillator feeds [TeX:] $$N_{o, t}=N_t / O_t$$ antennas, while similarly [TeX:] $$N_{o, r}=N_r / O_r$$ receive antennas are fed by the same oscillator1, as shown in Fig. 1, where the general model for the MIMO channel and oscillator setup is detailed. Different oscillators introduce independent PN processes. The input-output relationship for the MIMO setup at time n = 1, 2, · · · is given by
1 Notice that an even more general scenario in which different oscillators may feed different numbers of antennas is compatible with our proposal. However, we chose not to pursue this case to avoid an unnecessary notation burden.
Overall scheme of the MIMO channel and oscillator/antenna setup for the massive MIMO system.
where
· [TeX:] $$\mathbf{H}[n]$$ is the [TeX:] $$N_r \times N_t$$ channel matrix at time n, whose statistical characterization will be described in more detail below and in Subsection V-A;
· [TeX:] $$\mathbf{\Phi}_T[n]=\operatorname{diag}\left(e^{j \phi_1[n]}, \cdots, e^{j \phi_{O_t}[n]}\right) \otimes \mathbf{I}_{N_{o, t}}$$ and, similarly, [TeX:] $$\mathbf{\Phi}_R[n]=\operatorname{diag}\left(e^{j \phi_{O_t+1}[n]}, \cdots, e^{j \phi_{O_t+O_r}[n]}\right) \otimes \mathbf{I}_{N_{o, r}}$$ are the diagonal matrices of transmit and receive phasenoise coefficients at time n, respectively, assumed to be unknown at both sides; notice that we have assumed that oscillator [TeX:] $$i, i=1, \cdots, O_t$$ feeds transmit antennas [TeX:] $$(i-1) N_{o, t}+j, j=1, \cdots, N_{o, t},$$ while oscillator [TeX:] $$O_t+i,i=1, \cdots, O_r$$ feeds receive antennas [TeX:] $$(i-1) N_{o, r}+j, j=1, \cdots, N_{o, r}$$;
· [TeX:] $$\mathrm{x}[n]$$ is the column vector of the [TeX:] $$N_t$$ modulated symbols transmitted at time n, each of them having average energy [TeX:] $$E_s$$;
· [TeX:] $$\mathrm{y}[n]$$ is the column vector of the [TeX:] $$N_r$$ received samples at time n;
· [TeX:] $$\mathrm{z}[n]$$ is a size-[TeX:] $$N_r$$ vector of zero-mean, circularly-invariant Gaussian-noise samples, with variance [TeX:] $$\sigma^2$$ per real dimension, which are assumed to be independent across time and for each receive antenna.
For the phase-noise samples, time dependency is kept into account by assuming Wiener phase-noise processes:
where [TeX:] $$\phi_1[0], \cdots, \phi_{O_r+O_t}[0]$$ are independent and uniformly distributed in [TeX:] $$[0,2 \pi) \text {, and } w_i[n], \cdots, w_{O_r+O_t}[n]$$ are independent zero-mean white Gaussian processes with power [TeX:] $$\rho^2$$ (all processes have the same power).
Before describing the proposed detector, let us notice that each tap of the MIMO channel is affected by the sum of one transmit and one receive phase-noise process. We define a sum phase-noise process as:
Unlike the [TeX:] $$\phi_i[n]$$ processes, which will be called atomic hereafter, two sum phase-noise processes are correlated if they share one index. Moreover, they can all be written as linear functions of [TeX:] $$\phi_{i 1}[n], i=1, \cdots, O_t \text { and } \phi_{1 i^{\prime}}[n], i^{\prime}=2, \cdots, O_r,$$ as:
Thus, the whole set of [TeX:] $$O_r O_t$$ sum phase-noise processes is generated from a basis with [TeX:] $$O_r+O_t-1$$ elements. Although atomic phase-noise processes have a simpler statistical characterization, they are not observable, so that phase estimation must pass through sum phase-noise process estimation.
We define for future use the size-[TeX:] $$\left(O_t+O_r\right)$$ vector [TeX:] $$\phi[n],$$ whose i-th element is [TeX:] $$\phi_i[n], i=1, \cdots, O_r+O_t .$$ Analogously, we define the size-[TeX:] $$\left(O_t O_r\right)$$ vector [TeX:] $$\phi^{\text {sum }}[n] \text {, }$$ whose element [TeX:] $$(i-1) O_r+i^{\prime} \text { is } \phi_{i i^{\prime}}[n], i=1, \cdots, O_t, i^{\prime}=1, \cdots, O_r .$$
B. Pilot Transmission Scheme and Channel Estimation
Fig. 2 shows the structure of the frame assumed in this paper, with the diverse pilot and data transmission intervals, as described below.
Frame structure for the proposed massive MIMO transmission scheme. CP: channel estimation pilots. P: phase-noise estimation pilots.
The underlying hypothesis behind the channel model in (1) is that the channel matrix [TeX:] $$\mathbf{H}[n]$$ changes in time much more slowly than the PN. Thus, channel estimation can be performed less frequently than phase detection. In particular, we will suppose that there are two kinds of pilots in the system, with the first pilot type devoted to joint channel and phase estimation (as it is not possible, in principle, to distinguish the phase of unknown channel matrix entries from PN), and the second devoted only to phase detection.
The first type of pilots, called hereafter channel pilots, will be transmitted during [TeX:] $$N_t$$ consecutive channel uses, after which a frame of L channel uses begins. We suppose orthogonal channel pilots, namely, in the i-th channel use, [TeX:] $$i=1, \cdots, N_t,$$ all transmit antennas are switched off, except the i-th one. The average energy of channel pilots will be denoted [TeX:] $$E_C$$. The overall average energy of transmitted symbols will then be
During such frame, we will make the assumption that the channel matrix stays constant (corresponding to a nonfading or block-fading scenario), so that we will simply write [TeX:] $$\mathbf{H}$$ instead of [TeX:] $$\mathbf{H}[n]$$. Moreover, different frames are treated independently, so that, for our purposes, we can consider a single frame, where channel estimation is performed at the beginning, say at step 0. As a result of channel estimation, the receiver will obtain a noisy version of [TeX:] $$\mathbf{H}$$, i.e.,2
2 Here, we neglect PN variations during the channel estimation phase.
where [TeX:] $$\mathbf{Z}_C$$ is the channel estimation error, assumed to be composed of i.i.d. zero-mean Gaussian RV’s with variance [TeX:] $$\sigma^2 / E_C$$ per real dimension. In our model, we can always embed the initial phase-noise values into the channel matrix and assume that the phase-noise processes start from zero, or, equivalently, that we estimate differential processes [TeX:] $$\phi_i[n]-\phi_i[0], i=1, \cdots, O_r+O_t .$$ With this in mind, we can redefine the channel matrix to be [TeX:] $$\widetilde{\mathbf{H}}=\boldsymbol{\Phi}_R[0] \mathbf{H} \boldsymbol{\Phi}_T[0] .$$
The second type of pilots, used within the frame only to perform coarse phase detection, consists of blocks of [TeX:] $$O_t$$ pilots, transmitted every R data symbol periods. In a given block, the i-th pilot, [TeX:] $$i=1, \cdots, O_t,$$ is obtained by switching off all transmit antennas but the ones that are fed by the i-th transmit oscillator, which send symbols known at the receiver. Such phase pilots are used to obtain a rough estimate of sum phase-noise processes at pilot positions, after which MMSE filtering is performed to yield an estimate of atomic phase-noise processes. Finally, linear interpolation allows to obtain an initial phase estimate also for data positions, which will constitute a starting point for the EM-based receiver described in the next subsection. For more details on the EM initialization, see [4].
III. THE EM-BASED RECEIVER
We assume the information transmitted by the k-th user, [TeX:] $$k=1, \cdots, K,$$ is encoded with a channel encoder. The codeword is then mapped to a stream of QAM symbols, belonging to an M-size constellation [TeX:] $$\left\{\mathbf{x}^{(p)}\right\}_{p=1}^M,$$ and transmitted to the BS. Let us collect all symbols transmitted by all users into a frame of L symbol vectors [TeX:] $$\mathbf{X}=(\mathbf{x}[1], \cdots, \mathbf{x}[L]).$$ Each symbol vector is transmitted through the channel described by equation (1). The channel output for a frame is then collected in the matrix [TeX:] $$\mathbf{Y}=(\mathbf{y}[1], \cdots, \mathbf{y}[L]).$$
Let us define [TeX:] $$f_A(\Phi)$$ as the a priori distribution of [TeX:] $$\Phi=(\phi[1], \cdots, \phi[L]).$$ The optimal joint phase detector and demodulator obtains the maximum a posteriori estimate of [TeX:] $$\Phi \text{ and } \mathbf{X}$$, i.e.,
As the solution of the above optimization problem is way too complex to be computed as is, the proposed receiver approximates it by an iterative alternate optimization, as follows. Let us start from an initial symbol distribution in which all transmitted symbols are independently and uniformly distributed on the constellation alphabet, i.e.,
Then, at iteration [TeX:] $$l=1,2, \cdots,$$ perform the following computations:
where [TeX:] $$\mathbb{E}_{\mathbf{X}}^{(l)}$$ represents the average with respect to the prior distribution [TeX:] $$\operatorname{Pr}^{(0)}\{\mathbf{X}\} \text { for } l=0,$$ and with respect to the posterior distribution [TeX:] $$\operatorname{Pr}\left\{\mathbf{X} \mid \mathbf{Y}, \widehat{\Phi}^{(l)}\right\} \text { for } l \gt 0 .$$ The first step in the above alternate optimization is also called expectationmaximization (EM) algorithm and is a typical suboptimal approach to MAP (or ML) estimation of unknown parameters such as, in our case, the phase noise.
In practice, both steps in the above alternate maximization are approximated. The block diagram of the proposed receiver can be seen in Fig. 3. Starting from an initial pilot-based phase estimate [TeX:] $$\widehat{\Phi}^{(0)},$$ the receiver obtains at the l−th iteration the phase estimate [TeX:] $$\widehat{\Phi}^{(l)}.$$ The latter is then used in the MIMO demodulator to compute log-likelihood ratios (LLRs) for the transmitted symbols. These are converted to bit LLRs and input to the FEC decoders, one for each user. The decoders’ output is then mapped into estimated transmitted symbols [TeX:] $$\widehat{\mathbf{X}}^{(l)}$$ and fed back to the phase estimator for the subsequent (l + 1)−th iteration. In the first iteration, there is no prior information about the estimated received symbols. In Subsection III-A, we will explain in detail the phase estimation algorithm, while in Subsection III-B we briefly describe the MIMO demodulator.
Structure of the iterative receiver.
A. Phase Estimator
We open this subsection with some notation definition. We will denote as [TeX:] $$\widetilde{\mathbf{H}}_{i j}, i=1, \cdots, O_r, j=1, \cdots, O_t,$$ the [TeX:] $$N_{o, r} \times N_{o, t}$$ block of [TeX:] $$\widetilde{\mathbf{H}}$$ corresponding to oscillators j and [TeX:] $$O_t+i$$ at the transmitter and receiver side, respectively. Moreover, [TeX:] $$\widetilde{\mathbf{H}}_i, i=1, \cdots, O_t$$ will be the [TeX:] $$N_r \times N_{o, t}$$ matrix obtained by stacking all matrices [TeX:] $$\widetilde{\mathbf{H}}_{j i} \text { for } j=1, \cdots, O_r .$$ Finally, we will denote as [TeX:] $$\widetilde{\mathbf{H}}_{O_{t+i}}, i=1, \cdots, O_r \text { the } N_{o, r} \times N_t$$ matrix obtained by juxtaposing all matrices [TeX:] $$\widetilde{\mathbf{H}}_{i j} \text { for } j=1, \cdots, O_t .$$ With [TeX:] $$\widehat{\mathbf{H}}_{i j}, \widehat{\mathbf{H}}_i \text { and } \widehat{\mathbf{H}}_{O_t+i}$$ we will denote the corresponding estimated submatrices.
As already said, phase detection is based on an EM approach, where, at a given iteration, the expectation is taken with respect to the current distribution of the transmitted symbols. To ease notation, let us write
For the channel model in (1), apart from an inessential additive constant, (11) becomes:
where the two terms in the denominator account for both the additive white Gaussian noise and the noisy channel estimation. It is shown in [4] that there is no substantial performance loss if we substitute the average on [TeX:] $$\mathbf{X}$$ with the hard estimate of the transmitted symbols at iteration l. Thus, we make the following approximation:
where [TeX:] $$\widehat{\mathbf{x}}^{(l)}[n]$$ is the n-th column of [TeX:] $$\widehat{\mathbf{X}}^{(l)},$$ i.e., the estimate of [TeX:] $$\mathbf{x}[n]$$ at iteration l (which is computed at iteration l − 1 for l > 1, while is taken to be zero for l = 1).
In practice, as in [25], the maximization involved in (9) is approximated through the steepest-descent algorithm, and only the gradient of function [TeX:] $$h^{(l)}(\boldsymbol{\Phi})$$ is computed in the phase estimator. Let [TeX:] $$\widehat{\boldsymbol{\Phi}}_m^{(l)}$$ be the estimate of [TeX:] $$\widehat{\boldsymbol{\Phi}}^{(l)}$$ after m steepestdescent iterations (starting from [TeX:] $$\widehat{\boldsymbol{\Phi}}_0^{(l)}=\widehat{\boldsymbol{\Phi}}^{(l-1)},$$ i.e., the initial rough estimate based on phase pilots for l = 1, the estimate at the previous iteration for l > 1). Then.
where [TeX:] $$\lambda_m^{(l)}$$ is the m-th step size of the steepest-descent algorithm at iteration l. The derivative of [TeX:] $$h^{(l)}(\boldsymbol{\Phi})$$ with respect to [TeX:] $$\phi_i[n], i=1, \cdots, O_t,$$ is readily computed as:
with [TeX:] $$\widehat{\mathbf{x}}_i^{(l)}[n]$$ the length-[TeX:] $$N_{o, t}$$ subvector of [TeX:] $$\widehat{\mathbf{x}}^{(l)}[n]$$ corresponding to transmit oscillator i. Analogously, the derivative of [TeX:] $$h^{(l)}(\boldsymbol{\Phi})$$ with respect to [TeX:] $$\phi_{O_t+i}[n], i=1, \cdots, O_r,$$ results:
with [TeX:] $$\mathbf{y}_i[n]$$ the length-[TeX:] $$N_{o, r}$$ subvector of [TeX:] $$\mathbf{y}[n]$$ corresponding to receiver oscillator [TeX:] $$O_t+i.$$ Moreover, thanks to the Wiener model, we have (with a slight abuse of notation):
so that, provided that [TeX:] $$\rho^2 \ll 2 \pi$$ (a usually realistic approximation), we obtain [25]:
where [TeX:] $$k_n \text { and } k_{n-1}$$ are signed integers that minimize the moduli of the numerators.
1) Step size optimization and stopping rule: The performance of the steepest-descent algorithm depends crucially on the choice of the step size and on the number of performed steps. In this subsection, we clarify the design solutions we propose in this regard.
Regarding the step size, when it is fixed along the steps, it may imply a very slow convergence, if it is chosen too low, or even a lack of convergence to the optimum point, if it chosen too large. In order to improve the convergence of the steepestdescent algorithm, we have then dynamically chosen the step size [TeX:] $$\lambda_m^{(l)}$$ as follows.
· In the first step (m = 1), we adopt the backtracking line search approach, which has some convergence guarantees and it is popular also in deep-learning literature [26]. We start from a maximum candidate value for [TeX:] $$\lambda_1^{(l)},$$ and we progressively reduce it until Armijo’s condition is satisfied [27], i.e.
where [TeX:] $$g(\boldsymbol{\Phi})=h^{(l)}(\boldsymbol{\Phi})+\log f_A(\boldsymbol{\Phi}) \text { and } c$$ is a constant, set to c = 0.5 in our simulations. The progressive reduction of [TeX:] $$\lambda_1^{(l)}$$ from the initial value is performed with steps of 0.5.
· In the subsequent iterations [TeX:] $$(m \gt 1),$$ to set the step size we use the Barzilai-Borwein method [28], which is simpler than backtracking line search and has also globalconvergence guarantees under mild conditions, yielding:
The steepest-descent iterations are stopped whenever the relative increment in the objective function falls below a given threshold, i.e., whenever
Setting an appropriate value for θ implies meeting a tradeoff between convergence time and final error performance. A thorough study of the impact of the threshold on performance and complexity has been carried out and the results are presented in Sections V-A and V-C, respectively.
B. MIMO demodulator
At a given receiver iteration, the block labelled “MIMO demodulator” in Fig. 3 derives LLRs on symbols, based on the current phase-noise estimates, followed by a standard extraction of LLRs on coded bits. These bit LLRs are then input to the channel decoder. Due to the large size of the massive MIMO system, we propose a suboptimal MIMO demodulator, based on linear minimum mean-square error (LMMSE) filtering. Explicitly, at iteration l and channel use n, the symbol LLRs are computed on a per-symbol basis from the LMMSE filter output
where
Notice that the above filter expression takes into account the effect of imperfect CSI.
IV. BAYESIAN CRAMER-RAO BOUND
In this section, we compute the BCRB for the system described in the previous section, under the hypothesis that the transmitted symbols are known at the receiver and assuming perfect channel state information. This hypothesis suits well the case of the EM receiver, since, if convergence eventually occurs, the output of the channel decoder after several iterations provides the phase detector with (almost) perfect knowledge of the transmitted symbols. The subject of this subsection is the generalization of the BCRB computation in [24], [29], which considers the particular case of [TeX:] $$N_{o, t}=N_{o, r}=1 .$$
The BCRB allows to lower-bound the MSE between the unknown phase-noise sample sequence and its estimate performed by the phase detector at the receiver. Let [TeX:] $$\bar{\phi}^{\text {sum }}$$ be the size-[TeX:] $$O_t O_r L$$ column vector obtained by stacking all vectors [TeX:] $$\phi^{\text {sum }}[1], \cdots, \phi^{\text {sum }}[L] \text { and let } \widehat{\bar{\phi}}^{\text{sum}}$$ be any possible estimate of [TeX:] $$\bar{\phi}^{\text {sum }} .$$ The covariance matrix [TeX:] $$\boldsymbol{\Sigma}$$ of such estimate is given by
The BCRB then states that
where [TeX:] $$\mathbf{M}$$ is the Bayesian information matrix (BIM) defined by
Conditioning on [TeX:] $$\mathbf{X}$$ in the above definition corresponds to the hypothesis of known transmitted symbols. Considering the i−th element of vector [TeX:] $$\bar{\phi}^{\text {sum }},$$ the BCRB implies that:
i.e., the BCRB provides lower bounds to the MSE for every estimator of the sum phase-noise samples.
The result is summarized in the following proposition.
Proposition 1: For the channel model in (1), if [TeX:] $$\rho \ll 2 \pi,$$ the BIM M defined in (26) is given by
where [TeX:] $$\mathbf{M}_0=\mathbf{J}^{\top} \widetilde{\mathbf{M}}_0 \mathbf{J} \text { and } \mathbf{M}_1=\mathbf{J}^{\top} \widetilde{\mathbf{M}}_1 \mathbf{J}, \mathbf{J}$$ being the Jacobian of the transformation from sum processes to atomic processes at a given time n, [TeX:] $$\widetilde{\mathbf{M}}_1=-\frac{1}{\rho^2} \mathbf{I}_{O_t+O_r} \text { and } \widetilde{\mathbf{M}}_0=\widetilde{\mathbf{M}}_0^Y+\frac{2}{\rho^2} \mathbf{I}_{O_t+O_r},$$ and
where
and [TeX:] $$\Omega$$ is an [TeX:] $$O_r \times O_t$$ matrix whose (i, j) entry is given by
Proof: The formula is a rather straightforward generalization of those in [24], [29].
V. SIMULATION RESULTS
In this section, we present comparative simulation results for some meaningful test cases. These have been performed for a scenario under the following specific conditions:
· The modulation chosen is 64-QAM, with efficiency 6 bits per symbol.
· The FEC scheme chosen is the 5G NR LDPC code of type 2 and lift factor Z = 2, with a length N = 104 bits and a rate equal to 0.8 [30]. The number of LDPC decoding iterations is set to 50.
· The PN standard deviation is &rho = 0.2 radians and the PN process model simulated is the Wiener one, unless explicitly stated.
· In general, we consider a Rician-fading channel matrix given by
where [TeX:] $$\mathbf{H}_w$$ is composed by i.i.d. zero-mean circular complex Gaussian RV’s with unit variance, and [TeX:] $$\mathbf{H}_{\mathrm{LOS}}$$ is composed by unit-modulus entries with i.i.d. phases uniformly distributed in [TeX:] $$[0,2 \pi) .$$ Unless otherwise stated, we set [TeX:] $$K_{\text {Rice }}=100 \mathrm{~dB}$$, so that [TeX:] $$\mathbf{H}=\mathbf{H}_{\mathrm{LOS}} .$$
· In the case of the simulations run to get BER statistics, a genie-aided decoding stopping rule is activated, in order to speed up the process. No such rule is activated in the case of the simulations run to get MSE statistics.
· The frame length takes value L = 1086 symbols.
· The channel coherence time is L symbol periods (block fading).
· At the transmitter side [TeX:] $$K=O_t=16 .$$ Moreover, since we are considering an uplink massive MIMO for mobile communications, [TeX:] $$N_{o, t}$$ will be set to 2 (two antennas for each user).
· At the receiver side, we set [TeX:] $$N_r=64, \text { with } O_r=4$$ (four oscillators at the RX side, feeding [TeX:] $$N_{o, r}=16$$ antennas each).
· Unless otherwise stated, the maximum number of receiver iterations is set to 10.
· Each frame is built to contain 1000 LDPC codewords.
· The simulations to obtain the BER results have been obtained for up to 100 frames in error, with a maximum number of 106 simulated frames.
· The maximum number of iterations in the steepestdescent algorithm has been set to 300.
A. MSE Simulation Results
Figs. 4–6 show the MSE of the phase estimation and the corresponding BCRB for several setups. In Fig. 4, the curves show the MSE for different number of receiver iterations and several choices of the threshold θ to stop steepest-descent steps, as per (21). The red curve shows the MSE for [TeX:] $$\theta=10^{-6}$$ and 10 receiver iterations. The benefit of the iterative receiver can be observed by comparing such curve with the green one, which corresponds to the MSE performance after one receiver iteration. Since the green curve has a lower slope, the benefit increases with the SNR, reaching about 7 dB for MSE = [TeX:] $$2 \times 10^{-4}.$$ The magenta curve shows the MSE performance for [TeX:] $$\theta=10^{-7}$$ and 10 receiver iterations. The gain with respect to the red curve ranges from 2 to 5 dB but the slope of the two curves is about the same. However, as it will be evident from the results in Subsection V-B, when [TeX:] $$\theta=10^{-7}$$, the average total number of steepest-descent steps is larger than for [TeX:] $$\theta=10^{-6}$$, being for example 565 for the former case and 263 for the latter at [TeX:] $$E_s / N_0=9 \mathrm{~dB} .$$ At an SNR large enough, the curves for 10 iterations are essentially parallel to the BCRB, with an SNR gap at MSE = [TeX:] $$10^{-4}$$ of about 10 dB for [TeX:] $$\theta = 10^{-6}$$ and 5 dB for [TeX:] $$\theta = 10^{-7}.$$
MSE results for perfect CSI. (a) Threshold [TeX:] $$\theta = 10^{-6}$$ and 10 receiver iterations. (b) Threshold [TeX:] $$\theta = 10^{-6}$$ and 1 receiver iteration. (c) Threshold [TeX:] $$\theta = 10^{-7}$$ and 10 receiver iterations. (d) Threshold [TeX:] $$\theta = 10^{-4}$$ and 10 receiver iterations.
In the following, for all simulation results, we have set a threshold [TeX:] $$\theta=10^{-6} \text { and } 10$$ receiver iterations. In Fig. 5, we show the effect of imperfect channel CSI. The red curve shows the performance for perfect CSI (same as in Fig. 4). The other curves show the performance with imperfect CSI and different channel pilot power. In particular, taking into account that [TeX:] $$E_C$$ is the energy of channel pilots, the curves from top to bottom correspond to a ratio [TeX:] $$E_C / E_s$$ equal to 5, 10, 15 and 20 dB, respectively. As it can be seen, the green curve is heavily affected at low-to-medium SNR values by the channel uncertainty. The black curve, which is obtained by supposing a power for pilots 10 times the power for payload, loses 5-6 dB with respect to perfect CSI, for [TeX:] $$\mathrm{MSE}\lt 3 \times 10^{-4}.$$
MSE results for perfect and non-perfect CSI. In all the cases, [TeX:] $$\theta = 10^{-6}$$ and we have performed 10 receiver iterations. (a) Perfect CSI. (b) [TeX:] $$E_C / E_s=5 \mathrm{~dB} .$$ (c) [TeX:] $$E_C / E_s=10 \mathrm{~dB} .$$ (d) [TeX:] $$E_C / E_s=15 \mathrm{~dB} .$$ (e) [TeX:] $$E_C / E_s=20 \mathrm{~dB} .$$
Fig. 6 shows the effect of fading. The different curves are obtained with different values of the Rice factor, ranging from [TeX:] $$K_{\text {Rice }}=100 \mathrm{~dB} \text {, }$$ i.e., pure LOS, to [TeX:] $$K_{\text {Rice }}=-100 \mathrm{~dB} \text {, }$$ i.e., Rayleigh fading. As it can be seen, above an SNR of 10 dB, the MSE curves are only marginally affected by the fading intensity, with the MSE slightly increasing with decreasing [TeX:] $$K_{\text {Rice }}.$$ It is worth noting that the BCRB is virtually independent of [TeX:] $$K_{\text {Rice }},$$ since, due to the large MIMO size and to the expression of the BIM in (28), an averaging effect takes place.
MSE results for the fading channel. BCRB is the same for AWGN and for every [TeX:] $$K_{\text {rice }}$$ factor. In all the cases, [TeX:] $$\theta = 10^{-6}$$ and we have performed 10 receiver iterations (a) [TeX:] $$K_{\text {Rice }}=100 \mathrm{~dB} .$$ (b) [TeX:] $$K_{\text {Rice }}=10 \mathrm{~dB} .$$ (c) [TeX:] $$K_{\text {Rice }}=0 \mathrm{~dB} .$$ (d) [TeX:] $$K_{\text {Rice }}=-10 \mathrm{~dB} .$$ (e) [TeX:] $$K_{\text {Rice }}=-100 \mathrm{~dB} .$$
B. BER Simulation Results
In Figs. 7 to 10, we have depicted the BER results for various parameters, keeping the number of antennas and oscillators of the basic setup used to illustrate the MSE behavior. In order to compare the BER performance of our phase detector with other alternative from the literature, we have also simulated a phase detector based on the so-called generalized expectation-consistent signal recovery (GEC-SR), as proposed in reference [14]. It is to be noted that, whereas the original detector developed in [14] is intended for symbol detection at each channel use for the case of one antenna per oscillator, we have adapted the algorithm therein to take into account the variable number of antennas per oscillator and to estimate the actual phase increments of the PN processes along the entire frame, in order to perform phase compensation prior to MIMO demodulation. Therefore, we are actually replacing the “Phase Estimator” block in Fig. 3 with the GEC-SR-based alternative, while the rest of the iterative detector remains exactly the same. This ensures a fairer and more coherent comparison of the two phase-detection options.
In Fig. 7, we show the impact of PN on the system BER. First of all, the red curve depicts the BER for the case in which the PN has standard deviation ρ = 0.2 and there is no phase detection. As it can be seen, the curve gets flatter and flatter as the SNR increases, since for high SNR PN becomes the major and constant source of disturbance. If we employ a non-iterative receiver that performs phase estimation as described in the previous sections, the BER performance gets much better, as the black curve shows. However, the major improvement comes in with iterations, as shown by the lightblue curve, which corresponds to a maximum number of 10 iterations. Once again, the proposed receiver is able to virtually achieve the BER of the case without PN, with the same number of iterations. The receiver without PN takes advantage of the iterations in the MIMO demodulator and in the symbol-tobit demapper. We can compare these results with the ones obtained with the GEC-SR phase detector (orange curve). We can see how the GEC-SR algorithm exhibits a worse performance, only approaching the results of our proposal for high SNR. This is explained by the fact that the GEC-SR algorithm, as stated in [14], relies on approximations that get more accurate as SNR grows. In any case, these results confirm the goodness of the EM-based phase estimator.
BER results for different cases with or without added PN (cases PN vs. No PN), and several receiver strategies (including no phase detection and a variable maximum number of iterations). The PN process has a standard deviation ρ = 0.2 in all the PN-affected cases.
In Fig. 8, we can see the results with up to 10 receiver iterations, and several possibilities respecting the PN process. In light blue, we have the BER curve from Fig. 7, when ρ = 0.2. In dark blue, we have the results with the same setup, excepting a higher level in PN standard deviation, namely ρ = 1.0. As it can be seen, with this level of PN power, the system is still able to approach the PN-free performance, with a slight worsening respecting ρ = 0.2. In red, we have depicted the results obtained with the same setup, but in the case the PN process is generated from a mask, with an equivalent standard deviation of approximately 0.2. The mask is characterized by slopes −3 dBc/decade, −2 dBc/decade and 0 dBc/decade in the respective intervals [2, 100) KHz, [0.1, 1) MHz and [TeX:] $$\left[1,2 f_s\right)$$ MHz, where the sampling frequency is [TeX:] $$f_s=26 \mathrm{MHz},$$ and the reference level at 100 KHz is −133 dBc/Hz. The mask captures better the PN process from a realworld practical point of view, and we can see that, for the same level of PN power, the results are compatible with the ones generated with the theoretical Winener PN model. The slight improvement could be mainly due to the fact that the actual standard deviation is a bit lower than 0.2, along with the fact that the Wiener model represents a worst-case PN process model. These results show the robustness of the phase detector and receiver proposed, as it can cope with different levels of PN power and with a more realistic PN process model.
BER results with two different levels of PN power, along with the results obtained using PN samples produced from a mask instead of theWiener model. The standard deviation of the mask-based PN process is approximately 0.2.
From now on, we set ρ = 0.2 and up to 10 receiver iterations in the subsequent tests. In Fig. 9, we show the effect on BER of imperfect CSI. Like in Fig. 5, we consider different relative power levels for channel pilots. For [TeX:] $$\mathrm{BER}=10^{-4}, E_C / E_s=5 \mathrm{~dB}$$ (green curve) entails a 11-dB loss with respect to the perfect-CSI case, while [TeX:] $$E_C / E_s=10 \mathrm{~dB}$$ (black curve) reduces the loss to about 8 dB. When [TeX:] $$E_C / E_s=20 \mathrm{~dB}$$ (light-blue curve), the system is able to yield a limited loss of about 1.5 dB. These gaps are in good accordance with those seen for the MSE, as can be stated from Fig. 5.
BER results for different cases with perfect and non-perfect CSI. In all the cases, the maximum number of receiver iterations is 10, and the threshold for the steepest-descent algorithm is [TeX:] $$\theta = 10^{-6}.$$
In Fig. 10, we depict the results with perfect CSI but different [TeX:] $$K_{\text {Rice }}$$ values, displaying the same cases as in Fig. 6 for our proposed EM-based algorithm. As it can be verified, for a given performance, we have SNR differences very close to those shown in Fig. 6 respecting the MSE. For example, for a BER of [TeX:] $$10^{-4},$$ we can see the SNR gap between [TeX:] $$K_{\text {Rice }}=100 \mathrm{~dB}$$ and [TeX:] $$K_{\text {Rice }}=10 \mathrm{~dB}$$ is less than half a dB. The same happens if we consider [TeX:] $$K_{\text {Rice }}=-10 \mathrm{~dB}$$ and [TeX:] $$K_{\text {Rice }}=-100 \mathrm{~dB}$$ For [TeX:] $$K_{\text {Rice }}=0 \mathrm{~dB}$$ the SNR needed to attain said target BER value lies between 1 dB and 2 dB apart with respect to the cases with [TeX:] $$K_{\text {Rice }}=10 \mathrm{~dB}$$ and [TeX:] $$K_{\text {Rice }}=-10 \mathrm{~dB}$$ This is fully consistent with the differences in SNR we can appreciate in Fig. 6 for an MSE of [TeX:] $$10^{-3},$$ for example. We have also depicted the performance of the GEC-SR alternative in the case of the Rayleigh channel (i.e., [TeX:] $$K_{\text {Rice }}=100 \mathrm{~dB}$$). As it may be seen, the performance is somewhat poorer than with our proposal for the same fading level. As a contrast with the behavior in the pure AWGN channel (see Fig. 7), the GECSR curve does not converge towards the curve of the EMbased system when the signal-to-noise ratio increases. This means that the GEC-SR algorithm is not so proficient when dealing with Rayleigh block fading, regardless of the AWGN level. It is to be noted that the developments in [14] consider a fast-fading model, and channel coefficients that change at each channel use. This could account for the specially bad performance under Rayleigh block fading.
BER results for the fading channel, with [TeX:] $$K_{\text {Rice }}$$ ranging from 100 to −100 dB. In all the cases with the EM algorithm, the maximum number of receiver iterations is 10, and the threshold for the steepest-descent algorithm is [TeX:] $$\theta=10^{-6} .$$ The results obtained for [TeX:] $$K_{\text {Rice }}=-100 \mathrm{~dB}$$ with the above mentioned GEC-SR alternative are also shown for comparison.
C. Average Number of Receiver Iterations and Steepestdescent Steps
In Fig. 11, we represent the average number of receiver iterations performed as a function of the SNR, in the nominal case (no fading, perfect CSI), for different thresholds θ, compared with the GEC-SR alternative, and in Fig. 12 we depict the average total number of steepest-descent steps required. These figures allow understanding the computational requirements for the phase detection and correction process in each case, and may help in establishing important trade-offs. Fig. 11 shows how the threshold for the steepest-descent algorithm does not seem to affect much the average number of receiver iterations required to trigger the stopping rule as a function of the SNR. In general, we may distinguish three zones in the plots: for low SNR, the maximum number of receiver iterations (up to 10) is reached because the stopping rule is not triggered; for an SNR range between roughly 7 and 11 dB, we have a steep descent in the number of iterations, a fact which is coherent with the activation of the stopping rule and the progressively better performance of the phase-detection algorithm and the decoding step; finally, for large SNR (from 12 dB and on), we reach a floor where basically only one receiver iteration is performed, though from time to time up to two iterations would be performed in given cases and the average does not strictly fall to 1 iteration. It is to be noted that, in the transition zone, there is a slight difference as a function of the threshold θ (for example, at 9 dB): a case with a higher threshold (i.e., [TeX:] $$\theta=10^{-4}$$) requires an average number of iterations slightly higher than for lower values, and there is a consistent trend with decreasing θ. This is logical, given that a lower threshold will provide a lower MSE (see Fig. 4) and the stopping rule will be triggered earlier. Respecting the GEC-SR case, we can see there is a 1-dB shift to the right, which means this algorithm requires a larger number of receiver iterations at equal SNR to attain a worse BER performance: for example, at [TeX:] $$E_s / N_0=9 \mathrm{~dB},$$ up to 3 more iterations are required for GEC-SR, whereas the BER achieved is one order of magnitude higher, as seen in Fig. 7.
Respecting Fig. 12, we can see a picture of how the steepestdescent algorithm performs throughout the detection process. We depict the average total number of steepest-descent steps for the same setup as in Fig. 11, excepting the GEC-SR case, whose algorithm is completely different and is not directly comparable. We depict the total number of steepest-descent steps throughout all the receiver iterations, and it is an average respecting the number of frames processed. As it may be seen, and in contrast with Fig. 11, there is always a non vanishing difference among the different thresholds, since, for a lower θ, the algorithm will perform more steepest-descent steps regardless of the SNR level. In any case, the differences are higher for low SNR values (where a large amount of receiver iterations have to be performed), while, after the descent from 7 to 11 dB (which also reflects the trend identified in Fig. 11), each case seems to stabilize around a value which decreases with higher values of θ, as expected.
Though not shown, the BER results are basically the same for [TeX:] $$\theta=10^{-7} \text { and } \theta=10^{-6} \text {, }$$ while they slightly degrade for [TeX:] $$\theta=10^{-5} \text { and } \theta=10^{-4} \text {. }$$ This fact, together with the information provided by Figs. 11 and 12, helps to identify as a convenient trade-off a threshold value of [TeX:] $$\theta=10^{-6}$$: it keeps the number of steps required as low as possible without noticeable BER degradation. It is worth noting that, for [TeX:] $$\theta=10^{-6}$$, the MSE is slightly worse with respect to the [TeX:] $$\theta=10^{-7}$$ case: this seeming contradiction probably arises from the fact that the channel decoder is a highly nonlinear, on-off device, which only works if its input has a quality exceeding a certain threshold, and this is not completely captured by the MSE. After such threshold is reached, a further improvement in the decoder input quality (as may be obtained by passing from [TeX:] $$\theta=10^{-6}$$ to [TeX:] $$\theta=10^{-7}$$) does not improve the BER anymore.
Average number of receiver iterations for several cases with different thresholds θ for the steepest-descent gradient algorithm. For comparison, the average number of receiver iterations for the GEC-SR alternative are also shown.
Average total number of steepest-descent gradient steps required as a function of threshold θ for the same cases considered in Fig. 11, excepting the GEC-SR alternative.
In Table I, we include the average number of receiver iterations, the average total number of steepest-descent steps and the SNR required for a BER of [TeX:] $$10^{-4}$$ in the case of other system setups, where there is either non-perfect CSI or a significant amount of fading (in all cases, [TeX:] $$\theta=10^{-6}$$). We have included for comparison the figures for the [TeX:] $$K_{\text {Rice }}=100 \mathrm{~dB}$$ case with [TeX:] $$\theta=10^{-6}.$$ As it can be expected, the SNR required grows as the detection or channel conditions get worse (i.e., lower [TeX:] $$E_C / E_s$$ or lower [TeX:] $$K_{\text {Rice }}$$), reflecting what we can see in the previous figures. The number of receiver iterations lies always between 4 and 5, reflecting the fact that the genie-aided decoding stopping criterion is triggered at a similar point for a given BER, regardless of other conditions. In the case of the average total number of steepest-descent steps, we have another picture. For the channel with Rician fading and perfect CSI (cases (f) through (i)), we have practically the same values regardless of the level of fading, with a very slight increase as [TeX:] $$K_{\text {Rice }}$$ evolves towards the pure Rayleigh case.
REQUIRED [TeX:] $$E_s / N_0,$$ AVERAGE NUMBER OF RECEIVER ITERATIONS AND AVERAGE TOTAL NUMBER OF STEEPEST-DESCENT STEPS AT AROUND [TeX:] $$10^{-4}$$ BER FOR THE EM-BASED DETECTOR AND THE CASES DEPICTED IN FIGS. 9 AND 10. CASE (a): PERFECT CSI AND [TeX:] $$K_{\text {Rice }}=100$$ DB. CASES (b) − (e): NO FADING AND [TeX:] $$\frac{E_C}{E_s}=20,15,10 \text { AND } 5$$ DB, RESPECTIVELY. CASES (f) − (i): PERFECT CSI AND [TeX:] $$K_{\text {Rice }}=10,0,-10,-100$$ DB, RESPECTIVELY.
If we focus on the non-fading imperfect CSI setup results (cases (b) to (e)), we see that the trend is descending with lower values of [TeX:] $$E_C / E_s .$$ This means that the number of steps required at a given BER decreases as the CSI estimation gets worse. The reason for this probably stands in the different behavior of the objective function (13) with lower [TeX:] $$E_C / E_s$$ at the corresponding SNR. A higher amount of channel estimation noise is likely to smooth [TeX:] $$h^{(l)}(\boldsymbol{\Phi}),$$ by filling the canyons and flattening the hills, thus yielding a faster convergence of the steepest-descent algorithm.
D. Algorithm complexity and latency
The iterative receiver described in this paper has its core in the steepest-descent algorithm, used to solve the optimization problem in (9), in order to perform the phase estimation. In the previous subsection, we have shown the average number of steepest-descent steps needed to trigger a convergence condition, as a function of threshold θ, in various settings.
In each step, for every time n, (15) must be computed for each PN atomic process at the transmitter, (16) for each process at the receiver. Moreover, additional computation is related to the PN a-priori distribution (see (19)). Finally, the step size has to be optimized according to (20) (except in the first iteration, where backtracking line search is performed). In Table II, we show the number of operations that have to be performed by the proposed algorithm per steepest-descent step per time slot. To keep things simple, we have considered (20) for the step size derivation. In the same table, we show the figures for the GEC-SR phase detector, for which the number of operations per iteration per time slot is considered. In particular, we show in Table II the number of sums, products, divisions, and LUT accesses, where the latter correspond to the implementation of functions such as exponentials, logarithms, sines and cosines, and so on.
NUMBER OF OPERATIONS PER STEP/ITERATION PER TIME SLOT FOR THE STEEPEST-DESCENT PHASE DETECTOR (FIRST COLUMN) AND THE GEC-SR (SECOND COLUMN). IN THE FIRST COLUMN, (20) IS TAKEN INTO ACCOUNT FOR STEP SIZE COMPUTATION.
For the simulated case, [TeX:] $$N_{o, t}=16, N_t=32, N_{o, r}=4, N_r=64, M=64.$$ Table III shows the average number of mega-operations (millions of operations) per time slot for the steepest-descent detector and the GEC-SR with [TeX:] $$E_s / N_0=10$$ dB. For this SNR, the steepest-descent detector performs an average of 74.9 steps per receiver iteration, and an average 2.7 receiver iterations. At the same SNR, the GEC-SR, which pays a performance penalty, performs 10.4 detector iterations per receiver iteration, and an average 5.6 receiver iterations. From the table, it can be seen that the steepest-descent detector requires about twice the number of operations required by the GEC-SR, with the notable exception of divisions, which for the steepest-descent are less than 1/50 of those performed by the GEC-SR. The increase in complexity is the price to pay for the performance improvement.
AVERAGE NUMBER OF MEGA-OPERATIONS PER TIME SLOT FOR THE STEEPEST-DESCENT PHASE DETECTOR (FIRST COLUMN) AND THE GEC-SR (SECOND COLUMN).
This detector complexity must be added to that of the other receiver blocks, especially of the channel decoder, which is typically the most complex block in the receiver. While this computational burden is not irrelevant, it has to be reminded that it is performed at the BS, where the hardware is usually powerful enough to bear it.
The proposed receiver requires a certain number of iterations, thus it may seem to pose some challenges regarding latency. However, considering Fig. 11, we notice that it boils down to a noniterative receiver for a large enough SNR3. Therefore, our proposal could be conceived as a tool to reduce the operative SNR, with respect to a noniterative receiver, at the price of some latency increase. In addition, if we suppose that users have different channel SNRs, we can imagine that those of them with a larger SNR would be decoded faster than the others, thus configuring a dynamical trade-off between channel quality and experienced latency.
3 While we have used a genie-aided stopping rule to stop receiver iterations, a real stopping rule for channel decoders, e.g., LDPC decoders, performs in a similar way, just requiring a few more iterations on the average.
VI. CONCLUSIONS
In this article we have proposed an EM-based algorithm for PN estimation and correction for a massive MIMO setup, with a general number of antennas per oscillator. We have also considered Rician block fading and imperfect CSI at the receiver side. The PN estimation algorithm is able to compensate for each atomic PN process stemming from each oscillator (both at the transmitter and at the receiver side). We have also characterized the BCRB for this setup, so as to be able to compare performances in terms of the MSE attained.
The core of the algorithm comprises iterative reception, demodulation and decoding, as well as a steepest-descent optimization stage with dynamic adaptation of the step and a stopping criterion based on a threshold. We have simulated a massive MIMO system with different channel conditions and different setup parameters. The MSE results have shown that the algorithm can approach the BCRB for low values of the steepest-descent threshold, but with little or no gain in BER below a given value. The results obtained using a more realistic PN model based on a mask, instead of the worstcase Wiener PN model, show that the system is robust and consistent in this respect. The MSE and BER results obtained for the imperfect CSI case show how the CSI mismatch can degrade the performance significantly, and stresses the fact that a sufficiently good channel estimation is essential in a massive MIMO framework. Rician block fading was also been shown to hinder the performance, but with far less impact than the case of imperfect CSI. We have also compared the BER results obtained with our proposed phase detector with the results obtained resorting to another phase detector from the literature (based on the so-called GEC-SR algorithm [14]), and we have shown how the EM-based alternative outperforms the GEC-SR-based system.
On the other hand, we have evaluated the number of receiver iterations and steepest-descent algorithm steps required for given channel situations and specific configuration parameters. We have shown that it is possible to establish interesting trade-offs among MSE, BER and number of iterations/steps. In fact, by limiting the number of steepest-descent algorithm steps with an appropriate choice of the threshold, it is still possible to attain the best BER performance despite slight degradations in MSE. All this shows that the proposed receiver can be proficient in counteracting the effects of PN in massive MIMO systems, with an affordable computational burden and a trade-off between latency and received SNR, as compared with the GEC-SR alternative. In any case, as shown through the results and data shown in Subsections V-C and V-D, in a potential implementation it will be critical to pay detailed attention to the resulting latency and the resource requirements for both algorithms (the EM-based and the GEC-SR based ones), and how they scale for specially large massive MIMO setups, in order to get viable systems. Future work will be focused in expanding the study of the properties of the system, for example, providing insights on how different alternatives for the massive MIMO demodulator and the channel code may impact the PN estimation and compensation process.