Channel Prediction Based on Non-Uniform Pilot Pattern for Mobile Massive MIMO Scenarios

Yi Shi, Xianling Wang and Zhiyuan Jiang

Abstract

Massive multiple-input multiple-output (MIMO) is a widely used technique that provides substantial gains in spectral efficiency. However, beamforming performance degrades when the channel state information at the transmitter side (CSIT) becomes outdated because of user mobility, and this remains a significant open problem. It is reported that system performance decreases by 50 percent even in a moderate 30 km/h scenario. Moreover, in high-mobility scenarios the CSI cannot simply be reconstructed through interpolation because of the limited pilot density, a phenomenon known as "Doppler aliasing". To address this, we propose a novel non-uniform pilot pattern that provides higher spectral resolution than the uniform pilot currently used in most communication protocols. Meanwhile, we keep the pilot density unchanged so that no payload resources are sacrificed. Based on the novel pilot setting, we propose two channel prediction schemes that use compressive sensing and matrix completion, respectively. Simulation results show that our schemes outperform deep-learning-based and auto-regressive methods by about 15 percent in average throughput on simulated channels generated from the COST2100 channel model. To further verify their applicability, we apply our schemes to real channels measured in a channel sounding campaign, where the proposed methods also achieve a 5 percent gain, which validates their advantage over conventional methods.

Keywords: Channel aging, channel prediction, compressive sensing, massive MIMO, matrix completion

I. INTRODUCTION

MASSIVE multiple-input multiple-output (MIMO) is a mature technology that has been proven [1] to greatly improve spectral efficiency. It has attracted a lot of attention from both academia and industry and has been applied in many aspects of communication systems, including interference cancellation and multiplexing. The increased degrees of freedom brought by the unprecedented number of antennas can drastically improve beamforming performance. Most beamforming techniques [2], [3] assume perfect channel state information at the transmitter side (CSIT) to guarantee system performance. In practice, however, the ideal CSIT assumption is jeopardized by many factors. User mobility is one of them, in both TDD and FDD systems, because the channel is already outdated by the time it is used for downlink beamforming. It is reported that there is a 50% performance degradation even in a 30 km/h mobility scenario [4]. Meanwhile, the need for communication in moderate- and even high-mobility scenarios, such as the Internet of vehicles and high-speed trains, keeps growing, so this problem should be tackled urgently.

Generally speaking, the channel prediction task of a pilot-assisted system can be logically divided into two parts: prediction of the CSI on future pilot symbols and on future non-pilot symbols. To the best of our knowledge, most of the proposed channel prediction schemes focus on the first part and leverage conventional interpolation methods to reconstruct the non-pilot symbols. These works can be split into two categories: methods that know the channel temporal correlation function in advance and methods that do not.

A. Methods with Known Temporal Correlation

In the first category, the temporal correlation function is assumed to be known in advance according to the channel model concerned. Among the channel models considered, the Jakes model is the most frequently used. It assumes that N equal-strength rays arrive at a moving receiver with uniformly distributed angles. In [5], the authors propose a channel predictor based on the Wiener filter with the temporal correlation function derived from the Jakes channel model. Similarly, refs [6], [7] analyze the performance of the Kalman filter in predicting the channel in the massive MIMO case. Ref [8] considers a scenario where users have different mobility, resulting in different temporal correlation functions, and proposes an adaptive Kalman filter to address this problem. The influence of pilot contamination is also considered. However, knowing the temporal correlation function in advance is a strict assumption: the Jakes model differs considerably from real-world channels, which significantly degrades performance when these methods are applied to a real channel.

B. Methods with Unknown Temporal Correlation

In the second category, the temporal correlation function is instead extrapolated from the observed channel. Ref [4] proposes an angular-delay domain prediction scheme based on the Prony method, an auto-regressive algorithm, to predict the CSI on dominant angular-delay grids. It shows good performance in pilot symbol prediction in both low- and high-mobility scenarios. Similarly, ref [9] utilizes the Burg method and the modified covariance method to calculate the temporal correlation coefficients. Other works leverage super-resolution methods to extract temporal parameters, such as the Doppler frequency of each multipath component (MPC), and then reconstruct the future channel. Such methods can predict arbitrary future CSI as long as the parameters do not change. Ref [10] proposes a two-step estimation of signal parameters via rotational invariance technique (ESPRIT) algorithm to extract the delay, Doppler shift, and complex amplitude of each MPC in an orthogonal frequency division multiplexing (OFDM) MIMO system. Ref [11] considers a doubly diverting channel and uses the EMVD method to solve the parameter mismatching problem of ESPRIT. In addition, deep learning methods have also been applied to channel prediction. Ref [12] combines a convolutional neural network (CNN) and a long short-term memory (LSTM) recurrent neural network to predict future CSI over a 5 minute interval; user mobility is not considered in that work. Ref [13] also uses LSTM to predict future CSI for vehicular beam tracking, but the acquisition of the CSI on future non-pilot symbols is not addressed.

Although the methods in the second category do not rely on a prior assumption about the channel temporal correlation, they only evaluate the prediction performance on future pilot symbols, and the observed CSI is evenly spaced in the time domain, as acquired with a uniform pilot pattern. However, if we interpolate the future non-pilot symbols and take the prediction error on them into consideration, the performance degrades drastically at moderate to high user mobility because of Doppler aliasing caused by the inadequate density of evenly spaced pilots, which is further elaborated in Section II.

C. Contribution of This Paper

A novel non-uniform pilot design is proposed to enable Doppler spectrum sensing in high-mobility scenarios without sacrificing payload resources compared with the uniform pilot case.

A model-based prediction scheme based on compressive sensing is proposed. It estimates the sparse Doppler spectrum on each angular-delay domain grid with the help of the high resolution enabled by massive MIMO. To address the cyclic prediction problem caused by the discrete Fourier transform (DFT) sparsifying matrix, we introduce a redundant dictionary and a band-exclusion method to find the dominant Doppler peaks. The necessary recovery criteria are then analyzed. Furthermore, to improve the stability of recovery, we combine the scheme with multiple measurement vectors (MMV) and propose the Rank-Aware based orthogonal matching pursuit (OMP) with band-excluded redundant dictionary (RAMBLE) algorithm.

To further mitigate the additional estimation error introduced by parameter extraction, a model-free prediction scheme is then proposed, which turns the channel prediction task into a matrix completion problem. It directly infers future CSI from past observations. We first prove that the number of MPCs equals the rank of a Hankel matrix generated from the observed CSI, and then use the iterated soft threshold (IST) method to reconstruct the future CSI under a rank-minimization constraint.

The performance of the proposed algorithms is tested both on a simulated channel generated by the COST2100 channel model and on a real channel measured in a channel-sounding campaign. The results show that both algorithms achieve good channel capacity compared with state-of-the-art methods.

Notation: We use the following notation throughout the paper. Bold letters such as X are used for matrices and vectors. Non-bold letters such as x are used for scalars. [TeX:] $$(\cdot)^H \text { and }(\cdot)^T$$ represent the conjugate transpose and the transpose. We use [TeX:] $$\|\cdot\|_p$$ to denote the p-norm. [TeX:] $$|\cdot|$$ is the absolute value of its argument or the cardinality of a set. [TeX:] $$\langle\cdot\rangle$$ denotes the inner product of two vectors. [TeX:] $$\boldsymbol{X} \otimes \boldsymbol{Y}$$ is the Kronecker product of [TeX:] $$\boldsymbol{X} \text{ and } \boldsymbol{Y}$$. [TeX:] $$\operatorname{vec}(\cdot)$$ yields a vector for a matrix argument and [TeX:] $$\mathcal{C N}\left(\mu, \sigma^2\right)$$ denotes a complex circularly symmetric Gaussian random vector with mean [TeX:] $$\mu$$ and variance [TeX:] $$\sigma^2$$. [TeX:] $$mod (\cdot) \text { and } round(\cdot)$$ are the modulo and rounding operations, respectively.

Fig. 1.
In the uplink period, the yellow strips represent pilot slots and the green strips represent non-pilot slots. CSI is measured on the pilot slots and forms an observation window of size W. The channel predictor predicts the future CSI for precoding in the downlink subframe, which is represented by red strips.

II. SYSTEM MODEL AND PROBLEM FORMULATION

We consider a TDD massive MIMO OFDM system. The base station (BS) is equipped with a planar antenna array with [TeX:] $$N_{\mathrm{t}}=m n$$ antennas, where m is the number of columns and n is the number of rows. The user equipment (UE) is equipped with a linear array with [TeX:] $$N_{\mathrm{r}}$$ antenna elements. In the uplink subframe, the UE transmits the sounding reference signal (SRS) [14] so that the BS can measure the CSI, and the BS precodes the signal in the downlink subframe with the predicted channel.

Because pilots are transmitted intermittently, the BS cannot acquire consecutive observations of the CSI in the uplink subframe. We let [TeX:] $$\mathbb{M}=\left\{\boldsymbol{H}_1, \cdots, \boldsymbol{H}_{\hat{i}}, \hat{i} \leq W\right\}$$ denote the measured CSI set, where [TeX:] $$\boldsymbol{H}_{\hat{i}} \in \mathbb{C}^{N_{\mathrm{c}} \times N_{\mathrm{r}} \times N_{\mathrm{t}}}$$ is the CSI measured at the [TeX:] $$\hat{i}$$th pilot symbol in a W-length observation window and [TeX:] $$N_{\mathrm{c}}$$ is the number of subcarriers. In the downlink subframe, the BS uses a CSI predictor, denoted as [TeX:] $$\mathbb{G}$$, to predict the downlink CSI based on past observations. The frame structure and channel prediction process are depicted in Fig. 1. It is a uniform pilot design where the pilot interval is denoted by [TeX:] $$T_{\mathrm{srs}}$$. In the uplink period, the yellow strips represent the pilot symbols, on which [TeX:] $$\boldsymbol{H}_{\hat{i}}$$ can be measured to form an observation window of size W. The green strips represent the non-pilot symbols. The channel predictor forecasts the future CSI for precoding in the downlink subframe, which is represented by red strips. A straightforward prediction method is to use the CSI measured on the last pilot symbol for beamforming. However, because of UE mobility, the channel changes rapidly over time, so holding the last measured CSI significantly degrades system performance. Hence, a more dedicated channel predictor should be designed.

Define [TeX:] $$\mathbb{G}_t(\mathbb{M})$$ as the predicted channel of the tth OFDM symbol following the observation window. Channel prediction aims to design an optimal channel predictor, denoted as [TeX:] $$\hat{\mathbb{G}},$$ to minimize the time average normalized mean squared error (TNMSE) between predicted CSI and its ground-truth value, which can be formulated as follows,

(1)
[TeX:] $$\hat{\mathbb{G}}=\underset{\mathbb{G}}{\arg \min } \lim _{T \rightarrow+\infty} \frac{1}{T} \sum_{\hat{i}=1}^T \frac{\left\|\mathbb{G}_{\hat{i}}(\mathbb{M})-\boldsymbol{H}_{\hat{i}}\right\|_2}{\left\|\boldsymbol{H}_{\hat{i}}\right\|_2} .$$
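As an illustration of the objective in (1), the following is a minimal NumPy sketch that evaluates the time-averaged normalized error over a finite horizon T instead of the limit. The function name `tnmse` and the array layout (time index first) are assumptions made for this example, and the norms are left unsquared exactly as (1) is written.

```python
import numpy as np

def tnmse(predicted, truth):
    """Finite-horizon version of the metric in eq. (1).

    predicted, truth: arrays of shape (T, ...) holding the predicted and the
    ground-truth CSI for T consecutive OFDM symbols (layout is an assumption).
    """
    predicted = np.asarray(predicted)
    truth = np.asarray(truth)
    T = truth.shape[0]
    # Flatten each symbol's CSI tensor so the 2-norm runs over all its entries.
    err = np.linalg.norm((predicted - truth).reshape(T, -1), axis=1)
    ref = np.linalg.norm(truth.reshape(T, -1), axis=1)
    return float(np.mean(err / ref))
```

A predictor that always outputs zeros scores 1 under this metric, which gives a convenient reference point when comparing predictors.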

A. Channel Model

The [TeX:] $$N_{\mathrm{r}} \times N_{\mathrm{t}}$$ spatial channel between the BS and UE on each time-frequency grid can be represented as

(2)
[TeX:] $$\begin{aligned} \boldsymbol{H}_{q, k}= \sum_{\hat{n} \in \Gamma} \sum_{\hat{j}=1}^P \alpha_{\hat{n}, \hat{j}} \boldsymbol{a}_{\mathrm{t}}\left(\theta_{\hat{n}, \hat{j}}, \psi_{\hat{n}, \hat{j}}\right) \boldsymbol{a}_{\mathrm{r}}\left(\phi_{\hat{n}, \hat{j}}\right) \\ \times e^{j 2 \pi q \nu_{\hat{n}, \hat{j}} \Delta t} e^{-j 2 \pi k \tau_{\hat{n}, \hat{j}} \Delta f} . \end{aligned}$$

We denote by [TeX:] $$\Gamma$$ the cluster set of the scattering environment; [TeX:] $$|\Gamma|$$ is the number of clusters, and each cluster contains P subpaths. q is the OFDM symbol index and k is the subcarrier index. [TeX:] $$\alpha_{\hat{n}, \hat{j}}, \theta_{\hat{n}, \hat{j}}, \psi_{\hat{n}, \hat{j}}, \phi_{\hat{n}, \hat{j}}, \nu_{\hat{n}, \hat{j}}, \tau_{\hat{n}, \hat{j}}$$ represent the fading coefficient, elevation of departure (EoD), azimuth of departure (AoD), azimuth of arrival (AoA), Doppler frequency, and delay of each path, respectively. [TeX:] $$\Delta t \text { and } \Delta f$$ are the symbol duration and the subcarrier spacing. [TeX:] $$\boldsymbol{a}_{\mathrm{t}}(\cdot) \text { and } \boldsymbol{a}_{\mathrm{r}}(\cdot)$$ denote the steering vectors at the transmitter and receiver ends, respectively, which have the following form,

(3)
[TeX:] $$\boldsymbol{a}_{\mathrm{t}}\left(\theta_{\hat{n}, \hat{j}}, \psi_{\hat{n}, \hat{j}}\right)=\boldsymbol{v}_{\mathrm{x}}\left(\theta_{\hat{n}, \hat{j}}, \psi_{\hat{n}, \hat{j}}\right) \otimes \boldsymbol{v}_{\mathrm{y}}\left(\theta_{\hat{n}, \hat{j}}, \psi_{\hat{n}, \hat{j}}\right),$$

(4)
[TeX:] $$\boldsymbol{a}_{\mathrm{r}}\left(\phi_{\hat{n}, \hat{j}}\right)=\left[1, e^{j k d \sin \phi_{\hat{n}, \hat{j}}}, \cdots, e^{j k\left(N_{\mathrm{r}}-1\right) d \sin \phi_{\hat{n}, \hat{j}}}\right]^T,$$

where [TeX:] $$\boldsymbol{v}_{\mathrm{x}}(\cdot) \text { and } \boldsymbol{v}_{\mathrm{y}}(\cdot)$$ can be viewed as the steering vectors in the horizontal and vertical directions, respectively, with

(5)
[TeX:] $$\boldsymbol{v}_{\mathrm{x}}\left(\theta_{\hat{n}, \hat{j}}, \psi_{\hat{n}, \hat{j}}\right)=\left[1, e^{j k d_{\mathrm{x}} \cos \theta_{\hat{n}, \hat{j}} \cos \psi_{\hat{n}, \hat{j}}}, \cdots, e^{j k(n-1) d_{\mathrm{x}} \cos \theta_{\hat{n}, \hat{j}} \cos \psi_{\hat{n}, \hat{j}}}\right]^T$$

and

(6)
[TeX:] $$\boldsymbol{v}_{\mathrm{y}}\left(\theta_{\hat{n}, \hat{j}}, \psi_{\hat{n}, \hat{j}}\right)=\left[1, e^{j k d_{\mathrm{y}} \cos \theta_{\hat{n}, \hat{j}} \sin \psi_{\hat{n}, \hat{j}}}, \cdots, e^{j k(m-1) d_{\mathrm{y}} \cos \theta_{\hat{n}, \hat{j}} \sin \psi_{\hat{n}, \hat{j}}}\right]^T,$$

where [TeX:] $$k=2 \pi / \lambda$$ is the wavenumber, [TeX:] $$\lambda$$ is the carrier wavelength, and [TeX:] $$d, d_{\mathrm{x}}, d_{\mathrm{y}}$$ are the inter-element spacings.

Note that the parameters of the paths within the same cluster are slightly different. This is practically reasonable because, for a given reflector, the incident angles of the rays are not exactly the same.
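The steering vectors in (3) to (6) can be sketched in a few lines of NumPy. The function names and argument order below are assumptions made for this illustration; the element counts follow the equations as written (n entries for the x-direction vector, m for the y-direction vector).

```python
import numpy as np

def ula_steering(num_rx, d, wavelength, phi):
    """Receive linear-array steering vector of eq. (4)."""
    k = 2 * np.pi / wavelength                      # wavenumber
    return np.exp(1j * k * d * np.arange(num_rx) * np.sin(phi))

def upa_steering(m, n, dx, dy, wavelength, theta, psi):
    """Transmit planar-array steering vector of eqs. (3), (5) and (6)."""
    k = 2 * np.pi / wavelength
    vx = np.exp(1j * k * dx * np.arange(n) * np.cos(theta) * np.cos(psi))
    vy = np.exp(1j * k * dy * np.arange(m) * np.cos(theta) * np.sin(psi))
    return np.kron(vx, vy)                          # length m * n
```

With half-wavelength spacing, `upa_steering(2, 3, 0.5, 0.5, 1.0, 0.0, 0.0)` returns a length-6 vector whose entries all have unit modulus.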

B. Doppler Aliasing

Most related channel prediction works focus on predicting future pilot symbols and then reconstruct the non-pilot symbols with conventional interpolation methods, such as the DFT-based method. However, conventional interpolation incurs aliasing when the uniform pilot is not dense enough, especially in high-mobility cases. We name this kind of aliasing “Doppler aliasing” and illustrate it in the angular-delay channel domain.

Define [TeX:] $$\boldsymbol{h}_{q, k, r:}$$ as the rth column of [TeX:] $$\boldsymbol{H}_{q, k},$$ which is the antenna pair response related to the rth receive antenna. We first construct [TeX:] $$\overline{\boldsymbol{H}}_{q, r} \in \mathbb{C}^{N_{\mathrm{t}} \times N_{\mathrm{c}}}$$ as

(7)
[TeX:] $$\overline{\boldsymbol{H}}_{q, r}=\left[\boldsymbol{h}_{q, 1, r:}, \boldsymbol{h}_{q, 2, r:}, \cdots, \boldsymbol{h}_{q, N_{\mathrm{c}}, r:}\right],$$

where the subscript denotes the qth symbol and the rth receive antenna. The angular-delay domain channel can then be expressed as,

(8)
[TeX:] $$\hat{\boldsymbol{H}}_{q, r}=\left(\boldsymbol{D}_m \otimes \boldsymbol{D}_n\right) \overline{\boldsymbol{H}}_{q, r} \boldsymbol{D}_{N_{\mathrm{c}}}^H,$$

where [TeX:] $$\boldsymbol{D}_{\hat{k}} \in \mathbb{C}^{\hat{k} \times \hat{k}}$$ is a normalized discrete Fourier transform (DFT) matrix of order [TeX:] $$\hat{k}$$. We denote the channel on the (i, j)th angular-delay grid as [TeX:] $$\hat{H}_{q, r}(i, j)$$.
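The transform in (8) can be sketched with NumPy as follows. The helper names `dft_matrix` and `angular_delay` are assumptions for this example, and the normalized DFT matrix is taken with the usual negative-exponent convention, which the text does not state explicitly.

```python
import numpy as np

def dft_matrix(order):
    """Normalized (unitary) DFT matrix of the given order."""
    idx = np.arange(order)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / order) / np.sqrt(order)

def angular_delay(H_bar, m, n):
    """Apply eq. (8) to an (N_t x N_c) spatial-frequency matrix, N_t = m * n."""
    n_c = H_bar.shape[1]
    D_space = np.kron(dft_matrix(m), dft_matrix(n))   # D_m (x) D_n
    return D_space @ H_bar @ dft_matrix(n_c).conj().T
```

Because all three DFT matrices are unitary, the transform preserves the total energy of the channel matrix and only redistributes it over the angular-delay grids.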

Next we define

(9)
[TeX:] $$F_{k, r}(i, j)=\sum_{q=1}^{W} \hat{H}_{q, r}(i, j) e^{-j \frac{2 \pi}{W}(k-1)(q-1)}, \quad k=1,2, \cdots, W,$$

as the Doppler spectrum component of [TeX:] $$\hat{h}_r(i, j)=\left[\hat{H}_{1, r}(i, j), \cdots, \hat{H}_{W, r}(i, j)\right]^T \in \mathbb{C}^{W \times 1}$$ and we let

(10)
[TeX:] $$f_r(i, j)=\left[F_{1, r}(i, j), \cdots, F_{W, r}(i, j)\right],$$

as the Doppler spectrum of the rth antenna on the (i, j)th angular-delay domain grid in the observation window.
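As a small illustration of (9) and (10), the Doppler spectrum of one angular-delay grid can be obtained with a W-point FFT along the time axis. This sketch assumes the standard DFT convention of `numpy.fft.fft`; the function names and the sampling interval in the usage note are hypothetical.

```python
import numpy as np

def doppler_spectrum(h_window):
    """W-point DFT along time of the CSI samples on one angular-delay grid."""
    return np.fft.fft(np.asarray(h_window, dtype=complex))

def dominant_doppler_hz(h_window, sample_interval_s):
    """Frequency in Hz of the strongest Doppler bin, mapped to [-fs/2, fs/2)."""
    spectrum = doppler_spectrum(h_window)
    freqs = np.fft.fftfreq(len(spectrum), d=sample_interval_s)
    return float(freqs[np.argmax(np.abs(spectrum))])
```

For a window of 32 samples taken every 1 ms, a single path with a 125 Hz Doppler shift falls exactly on one bin, and `dominant_doppler_hz` returns that frequency.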

According to the Nyquist sampling theorem, for a lowpass signal such as [TeX:] $$f_r(i, j)$$ with upper frequency [TeX:] $$f_H(i, j)$$, reconstructing the Doppler spectrum from evenly spaced channel measurements requires a sampling rate that satisfies

(11)
[TeX:] $$f_{\mathrm{s}}=\frac{1}{T_{\mathrm{srs}}}>2 f_{\mathrm{H}}(i, j).$$

If [TeX:] $$T_{\mathrm{srs}}$$ violates (11), the Doppler spectrum is aliased and cannot be reconstructed through a lowpass filter. The Doppler frequency of each path is [TeX:] $$\nu_{\hat{n}, \hat{j}}=v \cos \beta_{\hat{n}, \hat{j}} / \lambda,$$ where [TeX:] $$v, \beta_{\hat{n}, \hat{j}} \text { and } \lambda$$ are the velocity, the intersection angle of the path, and the wavelength. When UE mobility increases, the Doppler frequency of each path increases, and the sampling rate must increase correspondingly to satisfy the sampling theorem. This means that pilots should be denser, but in reality time-frequency resources are limited, and denser pilots come at the expense of payload resources.
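The following sketch puts numbers on condition (11). The 3.5 GHz carrier and the 5 ms SRS period used in the usage note are illustrative assumptions, not values taken from this paper, and the helper names are hypothetical.

```python
import numpy as np

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def max_doppler_hz(speed_kmh, carrier_hz):
    """Largest Doppler shift v / lambda, reached when cos(beta) = 1."""
    return (speed_kmh / 3.6) * carrier_hz / SPEED_OF_LIGHT

def aliased_frequency(f_hz, fs_hz):
    """Apparent frequency in [-fs/2, fs/2) after uniform sampling at rate fs."""
    return (f_hz + fs_hz / 2) % fs_hz - fs_hz / 2

def violates_nyquist(speed_kmh, carrier_hz, t_srs_s):
    """True when 1 / T_srs is not larger than twice the maximum Doppler shift."""
    return 1.0 / t_srs_s <= 2 * max_doppler_hz(speed_kmh, carrier_hz)
```

Under these assumed values, a 30 km/h user has a maximum Doppler shift of roughly 97 Hz, which a 200 Hz pilot rate can still resolve, whereas a 120 km/h user reaches roughly 389 Hz and the uniform pilot samples become indistinguishable from those of a much slower path.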

C. Bandpass Sinc Interpolation

To address the problem described above, we first consider a special case that is easier to tackle. In a high-mobility scenario, the Doppler frequency of each cluster becomes larger, and if all the Doppler frequencies are concentrated in a certain range, the Doppler spectrum can be viewed as a bandpass signal. Bandpass interpolation can then be applied to reconstruct it, provided the following requirement is satisfied.

For a bandpass signal whose upper and lower frequencies are [TeX:] $$f_{\mathrm{H}} \text { and } f_{\mathrm{L}} \text {, }$$ the sampling rate [TeX:] $$f_{\mathrm{s}}$$ should satisfy

(12)
[TeX:] $$\frac{2 f_{\mathrm{H}}}{\hat{m}+1} \leq f_{\mathrm{s}} \leq \frac{2 f_{\mathrm{L}}}{\hat{m}},$$

where [TeX:] $$\hat{m}\gt 0$$ is an integer that satisfies

(13)
[TeX:] $$\hat{m} \leq \frac{f_{\mathrm{L}}}{B},$$

where B is the bandwidth of the signal. Then we can apply the following method to interpolate the CSI of non-SRS symbols.

(14)
[TeX:] $$\begin{aligned} \hat{H}_{t, r}(i, j)= & \frac{2 \omega_{\mathrm{c}} \Delta T}{\pi} \sum_{\widetilde{n}=-\infty}^{+\infty} \hat{H}_{\widetilde{n} \Delta T, r}(i, j) \frac{\sin \left(\omega_{\mathrm{c}}(t-\widetilde{n} \Delta T)\right)}{\omega_{\mathrm{c}}(t-\widetilde{n} \Delta T)} \\ & \times \cos \left(\omega_{\mathrm{c}}(t-\widetilde{n} \Delta T)\right), \end{aligned}$$

where [TeX:] $$\omega_{\mathrm{c}}=2 \pi f_{\mathrm{c}}$$, [TeX:] $$f_{\mathrm{c}}$$ is the center frequency of the bandpass signal, and [TeX:] $$\Delta T$$ is the sampling interval.
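The admissible sampling rates defined by (12) and (13) can be enumerated directly. The sketch below is only a worked illustration; the function name and the example band edges are assumptions.

```python
import math

def bandpass_rate_ranges(f_low, f_high):
    """List the (m, fs_min, fs_max) triples allowed by eqs. (12) and (13)."""
    bandwidth = f_high - f_low
    m_max = math.floor(f_low / bandwidth)        # eq. (13)
    ranges = []
    for m in range(1, m_max + 1):
        fs_min = 2 * f_high / (m + 1)            # left side of eq. (12)
        fs_max = 2 * f_low / m                   # right side of eq. (12)
        if fs_min <= fs_max:
            ranges.append((m, fs_min, fs_max))
    return ranges
```

For a Doppler band from 20 Hz to 25 Hz, this yields four intervals, the lowest of which collapses to the single rate of 10 Hz. Such a narrow interval illustrates why the band edges must be known accurately for bandpass sampling to work.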

Under the bandpass signal assumption, we can reconstruct the CSI through bandpass interpolation without increasing the pilot density. However, this method faces two main problems. First, the bandwidth and the upper and lower frequencies of the Doppler spectrum are hard to estimate in practice. Second, a bandpass Doppler spectrum is an idealized hypothesis: when lowpass frequency components in the Doppler spectrum violate the assumption, bandpass interpolation no longer works because aliasing still occurs.

To overcome these problems, we introduce two methods that treat the prediction on pilot and non-pilot symbols jointly. The first is a model-based algorithm called RAMBLE, which uses compressive sensing to extract each non-zero component of the Doppler spectrum and infers the future CSI after reconstructing the channel. The second is model-free and leverages matrix completion to infer the CSI directly from past observations without extracting channel parameters.

III. COMPRESSIVE SENSING BASED PREDICTION SCHEME

A. Compressive Sensing (CS) and Non-Uniform Pilot Design

Let [TeX:] $$\boldsymbol{y} \in \mathbb{C}^{\bar{n}}$$ denote a vector signal. CS considers a compressed measurement [TeX:] $$\boldsymbol{y}_c=\boldsymbol{\Phi} \boldsymbol{y}, \text { where } \boldsymbol{\Phi} \in \mathbb{C}^{\bar{m} \times \bar{n}}$$ is a sensing matrix with [TeX:] $$\bar{m}\lt \bar{n}.$$ CS therefore solves an underdetermined system of equations, which is only possible when y is sparse. Since y is normally not sparse in its original domain, its sparse representation is obtained through a sparsifying transform. To reconstruct the signal at a sub-Nyquist rate, the sensing matrix should be designed to satisfy the restricted isometry property (RIP) [15]

(15)
[TeX:] $$(1-\delta)\|\boldsymbol{y}\|_2 \leq\|\boldsymbol{\Phi} \boldsymbol{y}\|_2 \leq(1+\delta)\|\boldsymbol{y}\|_2,$$

where [TeX:] $$0 \leq \delta \leq 1$$ is the RIP parameter. Typically, a random matrix with entries independently sampled from a subgaussian distribution can satisfy this property.

In our scenario, the sensing matrix reflects the sampling pattern of the pilots. A Gaussian random matrix is not applicable here because the BS cannot acquire the CSI on every OFDM symbol. As a substitute, we choose a random binary matrix denoted as S. If a symbol is chosen to carry a pilot, the corresponding entry is set to one; otherwise, it is zero. Because of this randomness, the pilot pattern is no longer uniform.

(16)
[TeX:] $$\boldsymbol{S}=\left.\overbrace{\left[\begin{array}{cccccc} 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 \end{array}\right]}^{W}\right\} \rho W$$

The sampling matrix is characterized by two parameters, [TeX:] $$\Delta T_{\min } \text { and } \rho \text {. }$$ [TeX:] $$\Delta T_{\min }$$ denotes the minimum spacing of the non-uniform SRS and determines the Doppler frequency resolution of the non-uniform pilot. To have enough granularity to resolve the maximum Doppler frequency, [TeX:] $$\Delta T_{\min }$$ should satisfy [TeX:] $$1 / \Delta T_{\min }\gt f_{\mathrm{H}}(i, j).$$ [TeX:] $$\rho$$ denotes the density of pilots in the observation window. For a fair comparison, we keep the non-uniform pilot density the same as in the uniform case, so that the available data resources are not compromised. Therefore, the total number of non-uniform pilots in the observation window can be expressed as [TeX:] $$\rho W=\left\lfloor W \Delta t / T_{\mathrm{srs}}\right\rfloor .$$ The non-uniform pilot pattern design is depicted in Fig. 2.
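One possible way to draw such a pattern and build the matrix S of (16) is sketched below. The sampling procedure (uniformly random positions with a minimum gap counted in OFDM symbols) and all names are assumptions made for illustration; the text does not specify how the random positions are generated.

```python
import numpy as np

def random_pilot_pattern(W, num_pilots, min_gap=1, seed=0):
    """Draw sorted pilot positions in [0, W) whose spacing is at least min_gap."""
    rng = np.random.default_rng(seed)
    # Sample on a shortened axis, then stretch it to re-insert the minimum gaps.
    reduced = W - (num_pilots - 1) * (min_gap - 1)
    if reduced < num_pilots:
        raise ValueError("window too short for this pilot count and gap")
    base = np.sort(rng.choice(reduced, size=num_pilots, replace=False))
    return base + np.arange(num_pilots) * (min_gap - 1)

def selection_matrix(positions, W):
    """Binary sensing matrix S: one row per pilot with a single 1 at its symbol."""
    S = np.zeros((len(positions), W), dtype=int)
    S[np.arange(len(positions)), positions] = 1
    return S
```

For example, a 40-symbol window with a uniform period of 5 symbols carries 8 pilots, so `random_pilot_pattern(40, 8, min_gap=3)` keeps the same pilot count while randomizing the positions.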

Fig. 2.
This figure shows the non-uniform pilot pattern design. The minimum pilot interval is denoted by [TeX:] $$\Delta T_{\min }$$ and the pilot density remains the same as in the uniform case.

To reconstruct the Doppler spectrum from measurements, we can formulate the optimization problem on the (i, j)th angular-delay domain grid (ADG) of each receive antenna.

(17)
[TeX:] $$\begin{array}{r} \hat{F}=\arg \min _{\boldsymbol{f}}\left\|\boldsymbol{f}_r(i, j)\right\|_0 \\ \text { s.t., } \boldsymbol{y}_r(i, j)=\boldsymbol{S} \boldsymbol{D}_W^H \boldsymbol{f}_r(i, j), \end{array}$$

where [TeX:] $$\boldsymbol{y}_r(i, j)$$ denotes the available observations on the selected ADG. The above problem can be solved if the Doppler spectrum of the ADG is sparse enough. Thanks to the sparsity of the angular-delay domain, each ADG is influenced by only a few clusters whose angles and delays are close to it. Hence, the number of dominant clusters in each ADG is approximately the sparsity of its Doppler spectrum. Strictly speaking, this holds only if each cluster contains a single path. In practice, each cluster contains more than one path and therefore has its own Doppler frequency distribution; the influence of this is further discussed in the next subsection. Problem (17) is both numerically unstable and NP-complete, requiring an exhaustive enumeration of all possible combinations for the locations of the nonzero entries in [TeX:] $$\boldsymbol{f}_r(i, j)$$ [16]. A classical remedy is to minimize the [TeX:] $$\ell_1$$ norm instead, so (17) is transformed to

(18)
[TeX:] $$\begin{array}{r} \hat{F}=\arg \min _{\boldsymbol{f}}\left\|\boldsymbol{f}_r(i, j)\right\|_1 \\ \text { s.t., } \boldsymbol{y}_r(i, j)=\boldsymbol{S} \boldsymbol{D}_W^H \boldsymbol{f}_r(i, j) \text {. } \end{array}$$

There are many well-known algorithms for sparse recovery problems such as (18), including basis pursuit [17], orthogonal matching pursuit (OMP) [18], and iterative thresholding [19]. We choose OMP for its simplicity. OMP is an iterative method: in each iteration, it finds the atom that has the largest inner product with the residual signal and subtracts that atom's contribution.

After extracting all the non-zero Doppler components from the observations, we use the regularized least squares method [20] to fit the amplitude of each Doppler frequency component, which mitigates the influence of processing noise during channel estimation. The CSI can then be reconstructed as the sum of all the frequency components, and the future CSI can be deduced by increasing the time index.
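The recovery procedure described above can be sketched as follows. This is a minimal textbook OMP under stated assumptions, not the authors' implementation: it uses a plain least-squares refit instead of the regularized variant of [20], and the names `doppler_atoms` and `omp` are hypothetical.

```python
import numpy as np

def doppler_atoms(num_bins, times):
    """Inverse-DFT atoms: entry (t, k) is exp(j*2*pi*k*t/num_bins)/sqrt(num_bins)."""
    k = np.arange(num_bins)
    return np.exp(2j * np.pi * np.outer(times, k) / num_bins) / np.sqrt(num_bins)

def omp(A, y, sparsity):
    """Greedy OMP: select `sparsity` columns of A, refitting by least squares."""
    unit = A / np.linalg.norm(A, axis=0)
    residual, support = y.copy(), []
    coef = np.zeros(0, dtype=complex)
    for _ in range(sparsity):
        corr = np.abs(unit.conj().T @ residual)
        if support:
            corr[support] = 0.0          # never select the same atom twice
        support.append(int(np.argmax(corr)))
        # Refit all selected amplitudes jointly, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    f = np.zeros(A.shape[1], dtype=complex)
    f[support] = coef
    return f
```

With pilot positions `pos` in a window of length W, the sensing matrix is `A = doppler_atoms(W, pos)`, the spectrum estimate is `f_hat = omp(A, y, K)`, and the CSI at any other symbol index `t` follows from `doppler_atoms(W, t) @ f_hat`.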

However, if we use an orthonormal DFT basis as the sparsifying matrix, the predicted CSI will be the duplication of the observations because of the periodic property of DFT which can be expressed as follows.

(19)
[TeX:] $$\omega_N^k=\omega_N^{k+m N},$$

where [TeX:] $$\omega=e^{-2 \pi i / N}.$$ This phenomenon is depicted in Fig. 3. The left and right figures show the temporal variation of the amplitude and the phase of the CSI on a certain ADG. The yellow frame represents the observation window and the green frame represents the prediction, which is a duplicate of the observation.

In the real world, the actual Doppler frequency will almost never lie exactly on the DFT grid, so the real channel is not periodic. To address this problem, we resort to a redundant dictionary that supplies extra resolution capability.
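The periodicity implied by (19) is easy to reproduce numerically. The sketch below, with a hypothetical helper name, extends a window by evaluating its W-point DFT atoms at later time indices and shows that the result simply repeats the window.

```python
import numpy as np

def dft_extrapolate(h_window, future):
    """Extend a length-W window by evaluating its DFT atoms after the window."""
    W = len(h_window)
    spectrum = np.fft.fft(h_window)          # exact on-grid representation
    q = W + np.arange(future)                # time indices after the window
    k = np.arange(W)
    atoms = np.exp(2j * np.pi * np.outer(q, k) / W)
    return atoms @ spectrum / W
```

For a single off-grid tone such as `np.exp(2j * np.pi * 0.13 * np.arange(16))`, the extrapolated samples equal the observed window, whereas the true continuation of the tone carries a different phase.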

Fig. 3.
The left and right figures show the variation of the amplitude and the phase of the observed and predicted CSI, respectively. The yellow frame is the observation part and the red one is the predicted part. It can be seen from the graph that the prediction is a duplicate of the observation because of the cyclic property of the DFT.

B. Redundant Dictionary Based Channel Prediction Method

Normally, a signal is assumed to be sparse on an orthonormal basis. More often than not, however, sparsity is expressed not in terms of an orthonormal basis but in terms of an overcomplete dictionary. In this setting, the dictionary need not be orthonormal or even incoherent, and it is often overcomplete, meaning that it has far more columns than rows. In our scenario, to reduce the gap between the real Doppler frequency and the DFT basis, we use an oversampled DFT matrix as the dictionary, so [TeX:] $$\boldsymbol{y}_r(i, j)$$ can be expressed with a redundant dictionary formed from the first [TeX:] $$\rho W$$ rows of the Hermitian transpose of an [TeX:] $$F W$$-order DFT matrix

(20)
[TeX:] $$\boldsymbol{y}_r(i, j)=\boldsymbol{S} \hat{\boldsymbol{D}}_{F W}^H \hat{\boldsymbol{f}}_r(i, j),$$

where [TeX:] $$\hat{\boldsymbol{D}}_{FW}$$ is shown in (21), and F is the refinement factor, which reflects the frequency resolution of the dictionary. The gap between the real frequency and the DFT basis becomes smaller as F increases. However, a large F makes neighboring columns of [TeX:] $$\hat{\boldsymbol{D}}_{FW}$$ strongly coherent, which causes OMP to choose adjacent columns in consecutive iterations. This happens often because there is more than one path in a cluster [21] and their parameters are not identical, which produces a small Doppler spread. Hence, the contribution of the selected spike cannot be fully eliminated from the residual. This has two negative effects. First, the projection computation in OMP becomes ill-conditioned [22]. Second, not all the dominant Doppler spikes can be found because several estimated Doppler indexes gather on the same spike. To circumvent this, we use the band exclusion method introduced in [23]. First, define the coherence band of the dictionary as follows.

(21)
[TeX:] $$\hat{\boldsymbol{D}}_{F W}=\frac{1}{\sqrt{F W}} \left.\overbrace{\left[\begin{array}{cccccc} 1 & 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega & \omega^2 & \omega^3 & \cdots & \omega^{W-1} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{F W-1} & \omega^{2(F W-1)} & \omega^{3(F W-1)} & \cdots & \omega^{(W-1)(F W-1)} \end{array}\right]}^{W}\right\} F W$$

(22)
[TeX:] $$\begin{aligned} \left|\boldsymbol{R}_1^H \boldsymbol{\Phi}_{S_{\max }}\right|=\mid F_{\max }+\sum_{j \neq S_{\max } \cap j \in \Upsilon_{S_{\max }}} F_j \boldsymbol{\Phi}_j^H \boldsymbol{\Phi}_{S_{\max }}+\cdots+\sum_{j \in \Upsilon_K} F_j \boldsymbol{\Phi}_j^H \boldsymbol{\Phi}_{S_{\max }}+e^H \boldsymbol{\Phi}_{S_{\max }} \mid \\ \geq F_{\max }-F_{\max }(\delta-1) \hat{\eta}-F_{\max }(K-1) \delta \eta-\|e\|_2 \end{aligned}$$

Let [TeX:] $$\eta\gt 0$$. Define the [TeX:] $$\eta$$-coherence band of column [TeX:] $$\bar{k}$$ to be the set

(23)
[TeX:] $$B_\eta(\bar{k})=\{\bar{i} \mid \mu(\bar{i}, \bar{k})>\eta\},$$

where [TeX:] $$\mu(\bar{i}, \bar{k})=\frac{\left|\Phi_{\bar{i}}^H \Phi_{\bar{k}}\right|}{\left\|\Phi_{\bar{i}}\right\|_2\left\|\Phi_{\bar{k}}\right\|_2}.$$ The [TeX:] $$\eta$$-coherence band of the index set [TeX:] $$\mathbb{S}$$ is

(24)
[TeX:] $$B_\eta(\mathbb{S})=\cup_{\bar{k} \in \mathbb{S}} B_\eta(\bar{k}),$$

and denote [TeX:] $$B_\eta^{(2)}(\bar{k})=B_\eta\left(B_\eta(\bar{k})\right).$$ Then we can modify the matching step in OMP as follows:

(25)
[TeX:] $$\bar{i}_{\max }=\underset{\bar{i}}{\arg \max }\left|\left\langle\Phi_{\bar{i}}^H, \boldsymbol{R}\right\rangle\right|, \bar{i} \notin B_\eta^{(2)}(\boldsymbol{\Omega}),$$

where [TeX:] $$\Omega$$ is the estimated index set. Equation (25) skips the neighboring zone of each estimated Doppler component, under the assumption that the Doppler frequencies of any two clusters are separated by at least the width of [TeX:] $$B_\eta^{(2)}(\bar{k})$$. This is acceptable because clusters are normally spatially uncorrelated. In addition, after the iterations end, we uniformly add a few adjacent frequencies around each estimated non-zero Doppler component to model the Doppler frequency distribution within a single cluster. The procedure is given in Algorithm 1, where the modifications relative to traditional OMP are highlighted with boxes.

Algorithm 1.
Redundant dictionary OMP-based angular-delay domain channel prediction
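Since the listing of Algorithm 1 is not reproduced here, the following is only a sketch in the spirit of band-excluded OMP as described by (20) to (25), not the authors' algorithm. It omits the final step of adding adjacent frequencies around each spike and uses plain least squares; the names `redundant_atoms` and `band_excluded_omp` and all parameter values are assumptions.

```python
import numpy as np

def redundant_atoms(W, F, times):
    """Oversampled inverse-DFT atoms: rows are time indices, columns F*W bins."""
    k = np.arange(F * W)
    return np.exp(2j * np.pi * np.outer(times, k) / (F * W)) / np.sqrt(F * W)

def band_excluded_omp(A, y, num_spikes, eta):
    """OMP whose matching step skips the double coherence band of chosen atoms."""
    unit = A / np.linalg.norm(A, axis=0)
    in_band = np.abs(unit.conj().T @ unit) > eta    # in_band[i, k]: mu(i, k) > eta
    allowed = np.ones(A.shape[1], dtype=bool)
    support, coef, residual = [], np.zeros(0, dtype=complex), y.copy()
    for _ in range(num_spikes):
        if not allowed.any():
            break
        corr = np.abs(unit.conj().T @ residual)
        corr[~allowed] = -1.0                       # matching step of eq. (25)
        best = int(np.argmax(corr))
        support.append(best)
        # Double band: union of the bands of every index in the band of `best`.
        near = in_band[:, best]
        allowed &= ~in_band[:, near].any(axis=1)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    return np.array(support, dtype=int), coef
```

With non-uniform pilot positions `pos`, the sensing matrix would be `A = redundant_atoms(W, F, pos)`, and the CSI at future symbol indices `t` would follow from `redundant_atoms(W, F, t)[:, support] @ coef`. Because the atoms are no longer periodic in W, this extrapolation does not simply repeat the observation window.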

Next, we analyze the recovery stability of Algorithm 1. First, define [TeX:] $$\Upsilon_{\bar{i}}$$ as the Doppler index set spanned by the [TeX:] $$\bar{i}$$th Doppler spike in one ADG, which can be expressed as [TeX:] $$\Upsilon_{\bar{i}}=\left\{S_{\bar{i}}-\delta / 2, \cdots, S_{\bar{i}}+\delta / 2\right\},$$ where [TeX:] $$S_{\bar{i}}$$ is the center Doppler frequency and [TeX:] $$\delta$$ is the Doppler index spread. In practice, the entries of [TeX:] $$f_r$$ outside the Doppler spikes are not strictly zero, so the range of a Doppler spike can be defined by a manually set threshold. Next, define [TeX:] $$\Pi\left[\hat{f}_r(i, j)\right]=\left\{S_1, \cdots, S_K\right\},$$ where K is the number of spikes. An ideal recovery is defined as each estimated Doppler index lying within the [TeX:] $$\eta$$-coherence band of an element of [TeX:] $$\Pi\left[\hat{f}_r(i, j)\right]$$.

Theorem 1 (stable recovery criteria for band-excluded OMP):

Let [TeX:] $$\hat{f}_r(\hat{m}, \hat{n})$$ have K non-zero spikes, each of length [TeX:] $$\delta$$. Let [TeX:] $$\eta\gt 0$$ and [TeX:] $$\hat{\eta}=\max \mu(i, j) \quad \forall i, j \in \left[S_{\bar{k}-\delta / 2}, \cdots, S_{\bar{k}+\delta / 2}\right].$$ Suppose that

(26)
[TeX:] $$B_\eta(a) \cap B_\eta^{(2)}(b)=\emptyset \quad \forall a, b \in \Pi\left[\hat{f}_r(\hat{m}, \hat{n})\right],$$

if

(27)
[TeX:] $$(2 K-1) \delta \eta+(\delta-1) \hat{\eta}+2 \frac{\|e\|_2}{F_{\max }}\lt 1,$$

and

(28)
[TeX:] $$\frac{F_{\max }(\eta \delta K-1)+2 \eta(K-1) A_{\max }+2\|e\|_2}{F_{\min }[1-\hat{\eta}(\delta-1)-\eta \delta K]} \leq 1,$$

are satisfied where

(29)
[TeX:] $$\begin{aligned} A_{\max }= \frac{1}{1-\eta(K-2)}\left[F_{\max }+F_{\max }(\delta-1) \hat{\eta}\right. \\ \left.+F_{\max }(K-1) \delta \eta+\|e\|_2\right], \end{aligned}$$

and [TeX:] $$F_{\max }=\max \hat{\boldsymbol{f}}_r(i, j) \text { and } F_{\min }=\min \hat{\boldsymbol{f}}_r(i, j),$$ then the reconstructed Doppler index will fall in [TeX:] $$B_\eta\left(\Pi\left[\hat{\boldsymbol{f}}_r(i, j)\right]\right)$$.

Proof 1: First, we prove that the first chosen index falls in [TeX:] $$\Pi\left[\hat{\boldsymbol{f}}_r(i, j)\right]$$. Let [TeX:] $$S_{\max }$$ be the index of the component of [TeX:] $$\hat{\boldsymbol{f}}_r(i, j)$$ with the largest absolute value. In the first iteration, [TeX:] $$\boldsymbol{R}_1=\sum_{i=1}^K \sum_{j \in \Upsilon_i} F_j \boldsymbol{\Phi}_j,$$ where the subscript of [TeX:] $$\boldsymbol{R}$$ denotes the iteration index and the subscript of [TeX:] $$\boldsymbol{\Phi}$$ denotes the column index. In addition, we omit the superscript of F here for simplicity, as shown in (22).

For [TeX:] $$\forall l \notin B_\eta\left(\Pi\left[\hat{\boldsymbol{f}}_r(i, j)\right]\right)$$

(30)
[TeX:] $$\begin{aligned} \left|\boldsymbol{R}_1^H \boldsymbol{\Phi}_l\right| =\left|\sum_{j \in \Upsilon_1} F_j \boldsymbol{\Phi}_j^H \boldsymbol{\Phi}_l+\cdots+\sum_{j \in \Upsilon_K} F_j \boldsymbol{\Phi}_j^H \boldsymbol{\Phi}_l+e^H \boldsymbol{\Phi}_l\right| \\ \leq F_{\max } K \delta \eta+\|\boldsymbol{e}\|_2, \end{aligned}$$

if the right-hand side of (22) is greater than the right-hand side of (30), we get

(31)
[TeX:] $$(2 K-1) \delta \eta+(\delta-1) \hat{\eta}+2 \frac{\|\boldsymbol{e}\|_2}{F_{\max }}\lt 1$$

Then the first chosen index will be [TeX:] $$S_{\max }$$. Next, suppose the first k − 1 selected indices are in [TeX:] $$B_\eta\left(S_i\right), S_i \in \Pi\left(\boldsymbol{f}_r(i, j)\right).$$ The (k − 1)th residual can then be expressed as

(32)
[TeX:] $$\boldsymbol{R}_{k-1}=\hat{\boldsymbol{f}}_r(i, j)-A_{I_1} \boldsymbol{\Phi}_{I_1}-A_{I_2} \boldsymbol{\Phi}_{I_2}-\cdots-A_{I_{k-1}} \boldsymbol{\Phi}_{I_{k-1}},$$

where [TeX:] $$\left\{A_{I_1}, A_{I_2}, \cdots, A_{I_{k-1}}\right\}$$ are the estimated coefficients. Each coefficient can be derived as

(33)
[TeX:] $$\begin{aligned} A_{I_i}= \sum_{j \in \Upsilon_1} F_j \boldsymbol{\Phi}_j^H \boldsymbol{\Phi}_{I_i}+\sum_{j \in \Upsilon_2} F_j \boldsymbol{\Phi}_j^H \boldsymbol{\Phi}_{I_i}+\cdots \\ +\sum_{j \in \Upsilon_K} F_j \boldsymbol{\Phi}_j^H \boldsymbol{\Phi}_{I_i}+e^H \boldsymbol{\Phi}_{I_i} \\ -\sum_{n \neq i} A_{I_n} \boldsymbol{\Phi}_{I_n}^H \boldsymbol{\Phi}_{I_i}, i=1, \cdots, k-1, \end{aligned}$$

which implies

(34)
[TeX:] $$\begin{aligned} \left|A_{I_i}\right| \leq F_{\max }+F_{\max }(\delta-1) \hat{\eta}+F_{\max }(K-1) \delta \eta+\|e\|_2 \\ +\eta \sum_{n \neq i}\left|A_{I_n}\right|, i=1, \cdots, k-1 \end{aligned}$$

Let [TeX:] $$A_{\max }=\max _{j=1, \cdots, k-1}\left|A_{I_j}\right|;$$ then (34) can be written as

(35)
[TeX:] $$\begin{aligned} A_{\max } \leq F_{\max }+F_{\max }(\delta-1) \hat{\eta}+F_{\max }(K-1) \delta \eta+\|\boldsymbol{e}\|_2 \\ +\eta(k-2) A_{\max }. \end{aligned}$$

hence,

(36)
[TeX:] $$\begin{aligned} A_{\max } \leq \frac{1}{1-\eta(k-2)}\left[F_{\max }+F_{\max }(\delta-1) \hat{\eta}\right. \\ \left.+F_{\max }(K-1) \delta \eta+\|\boldsymbol{e}\|_2\right], \end{aligned}$$

when k = K, the right-hand side of (36) will get its maximum value.

Now we show that the kth selected index will fall in [TeX:] $$B_\eta\left(\Pi\left(\hat{\boldsymbol{f}}_r(i, j)\right)\right).$$ For the kth residual, we have (37). For [TeX:] $$\forall l \notin B_\eta^{(2)}\left(S_1, \cdots, S_{k-1}\right) \cup B_\eta\left(S_k, \cdots, S_K\right)$$

(39)
[TeX:] $$\begin{aligned} \left|\boldsymbol{R}_{k-1}^H \boldsymbol{\Phi}_l\right| =\left|\sum_{j \in \Upsilon_1} F_j \boldsymbol{\Phi}_j^H \boldsymbol{\Phi}_l+\cdots+\sum_{j \in \Upsilon_K} F_j \boldsymbol{\Phi}_j^H \boldsymbol{\Phi}_l+\boldsymbol{e}^H \boldsymbol{\Phi}_l\right. \\ \left.-A_{I_1} \boldsymbol{\Phi}_{I_1}^H \boldsymbol{\Phi}_l-\cdots-A_{I_{k-1}} \boldsymbol{\Phi}_{I_{k-1}}^H \boldsymbol{\Phi}_l\right| \\ \leq \eta F_{\min } K \delta+\|\boldsymbol{e}\|_2+\eta(k-1) A_{\max } \end{aligned}$$

If the right-hand side of (37) is greater than the right-hand side of (39), we get

(40)
[TeX:] $$\frac{F_{\max }(\eta \delta K-1)+2 \eta(K-1) A_{\max }+2\|e\|_2}{F_{\min }[1-\hat{\eta}(\delta-1)-\eta K \delta]} \leq 1,$$

which completes the proof.

Remark 1: The above stable recovery criteria reveal a trade-off among [TeX:] $$\eta, \delta, K,$$ and [TeX:] $$F_{\max } / F_{\min }.$$ To recover a Doppler spectrum with more spikes under a fixed refinement factor, [TeX:] $$\eta$$ must be smaller, which is equivalent to using more time domain samples. The recovery probability decreases if one spike contains more distinct Doppler components or if the dynamic range [TeX:] $$F_{\max } / F_{\min }$$ is large. In addition, (26) is not a strict requirement because clusters are normally located separately.
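To make the trade-off in Remark 1 concrete, the sketch below simply evaluates conditions (27) and (28), with A_max taken from (29), exactly as they are printed above. The function name and the numerical values are hypothetical, and treating a non-positive denominator as a failed condition is an assumption of this sketch rather than part of Theorem 1.

```python
def stable_recovery_check(K, delta, eta, eta_hat, F_max, F_min, e_norm):
    """Evaluate (27) and (28) as printed; returns (cond_27, cond_28)."""
    cond_27 = (2 * K - 1) * delta * eta + (delta - 1) * eta_hat + 2 * e_norm / F_max < 1

    denom_a = 1 - eta * (K - 2)                      # denominator of (29)
    denom_28 = F_min * (1 - eta_hat * (delta - 1) - eta * delta * K)
    if denom_a <= 0 or denom_28 <= 0:                # assumption: treat as "not satisfied"
        return cond_27, False

    a_max = (F_max + F_max * (delta - 1) * eta_hat
             + F_max * (K - 1) * delta * eta + e_norm) / denom_a          # (29)
    lhs_28 = (F_max * (eta * delta * K - 1)
              + 2 * eta * (K - 1) * a_max + 2 * e_norm) / denom_28        # (28)
    return cond_27, lhs_28 <= 1

# Hypothetical numbers: same coherence level, few spikes versus many spikes.
few_spikes = stable_recovery_check(K=2, delta=1, eta=0.1, eta_hat=0.0,
                                   F_max=1.0, F_min=0.5, e_norm=0.0)
many_spikes = stable_recovery_check(K=6, delta=1, eta=0.1, eta_hat=0.0,
                                    F_max=1.0, F_min=0.5, e_norm=0.0)
```

With η held at 0.1, the two-spike case passes both checks while the six-spike case does not, which is the behavior Remark 1 describes: more spikes require a smaller η.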

(37)
[TeX:] $$\begin{aligned} \left|\boldsymbol{R}_{k-1}^H \boldsymbol{\Phi}_{S_k}\right|= \left|\sum_{j \in \Upsilon_1} F_j \boldsymbol{\Phi}_j^H \boldsymbol{\Phi}_{S_k}+\cdots+\sum_{j \in \Upsilon_K} F_j \boldsymbol{\Phi}_j^H \boldsymbol{\Phi}_{S_k}+\boldsymbol{e}^H \boldsymbol{\Phi}_{S_k}-A_{I_1} \boldsymbol{\Phi}_{I_1}^H \boldsymbol{\Phi}_{S_k}-\cdots-A_{I_{k-1}} \boldsymbol{\Phi}_{I_{k-1}}^H \boldsymbol{\Phi}_{S_k}\right| \\ \geq F_{\min }-\hat{\eta}(\delta-1) F_{\min }-\eta(K-1) \delta F_{\max }-\|\boldsymbol{e}\|_2-\eta(k-1) A_{\max } \end{aligned}$$

(38)
[TeX:] $$\begin{aligned} {\left[\operatorname{vec}\left(\hat{\boldsymbol{H}}_{q, r}+n\right)\right]_i } =\left[\left(\boldsymbol{D}_{N_{\mathrm{c}}} \otimes \boldsymbol{D}_m \otimes \boldsymbol{D}_n\right) \operatorname{vec}\left(\overline{\boldsymbol{H}}_{q, r}+n\right)\right]_i \\ =\sum_{n \in \Gamma} \sum_{j=1}^P \alpha_{k_1, k_2, k_3} m n N_{\mathrm{c}} \operatorname{Sa}\left(2 \pi\left(\hat{\theta}_{k_1}-\theta_{n, j}\right) \frac{m}{2}\right) \operatorname{Sa}\left(2 \pi\left(\hat{\psi}_{k_2}-\psi_{n, j}\right) \frac{n}{2}\right) \operatorname{Sa}\left(2 \pi\left(\hat{\tau}_{k_3}-\tau_{n, j}\right) \frac{N_{\mathrm{c}}}{2}\right)+\frac{\sigma^2}{m n N_{\mathrm{c}}} \end{aligned}$$

C. Multiple Measurement Vector

To further improve the recovery stability in the presence of noise and a growing number of non-zero Doppler spikes, the multiple measurement vector (MMV) method is leveraged to exploit the multiple antennas at the receiver end.

First define the support of a collection of vectors [TeX:] $$X=\left[x_1, \cdots, x_l\right]$$ as the union over all individual supports,

(41)
[TeX:] $$\operatorname{supp}(\boldsymbol{X})=\bigcup_i \operatorname{supp}\left(\boldsymbol{x}_i\right)$$

In other words, a jointly sparse matrix contains several sparse vectors whose non-zero components share the same positions. Given this definition of X, a noiseless MMV problem can be stated as follows:

Given [TeX:] $$\boldsymbol{Y} \in \mathbb{C}^{m \times l} \text { and } \hat{\Phi} \in \mathbb{C}^{m \times n} \text { with } m\lt n$$

(42)
[TeX:] $$\hat{\boldsymbol{X}}=\underset{\boldsymbol{X}}{\arg \min }|\operatorname{supp}(\boldsymbol{X})| \quad \text { s.t. } \boldsymbol{Y}=\hat{\Phi} \boldsymbol{X} \text {. }$$

The special case l = 1, in which X reduces to a vector, is known as the single measurement vector (SMV) problem. It can be solved with the previous reconstruction method, because Algorithm 1 treats each receive antenna individually.

For the SMV problem, it is well known that a necessary and sufficient condition for the measurements [TeX:] $$y=\Phi x$$ to uniquely determine each k-sparse vector x is given by

(43)
[TeX:] $$k\lt \frac{\operatorname{spark}(\Phi)}{2},$$

where the spark of [TeX:] $$\Phi$$ is defined as the smallest number of columns of [TeX:] $$\Phi$$ that are linearly dependent.

Similarly, a sufficient condition for the measurements [TeX:] $$Y=\hat{\Phi} X$$ to uniquely determine the jointly sparse matrix X has been proposed in [24],

(44)
[TeX:] $$|\operatorname{supp}(\boldsymbol{X})|\lt \frac{\operatorname{spark}(\Phi)-1+\operatorname{rank}(\boldsymbol{Y})}{2},$$

which implies that the number of measurements can be reduced if the rank of Y is greater than 1.
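The gap between the SMV bound (43) and the MMV bound (44) can be checked numerically on a toy dictionary. The sketch below is illustrative only: `spark` is a brute-force helper that is feasible only for very small matrices, the 3 × 5 Vandermonde-type dictionary is a hypothetical example, and returning n + 1 when all columns are independent is a convention chosen here.

```python
import itertools
import math
import numpy as np

def spark(Phi, tol=1e-10):
    """Smallest number of linearly dependent columns of Phi (brute force)."""
    n = Phi.shape[1]
    for size in range(1, n + 1):
        for cols in itertools.combinations(range(n), size):
            if np.linalg.matrix_rank(Phi[:, list(cols)], tol=tol) < size:
                return size
    return n + 1   # convention of this sketch: no dependent subset exists

def largest_int_below(x):
    """Largest integer strictly smaller than x."""
    return math.ceil(x) - 1

# Hypothetical 3 x 5 dictionary: any 3 columns are independent, any 4 are dependent.
Phi = np.vander([1.0, 2.0, 3.0, 4.0, 5.0], 3, increasing=True).T
s = spark(Phi)

smv_max_sparsity = largest_int_below(s / 2)              # bound (43)
mmv_max_support = largest_int_below((s - 1 + 2) / 2)     # bound (44) with rank(Y) = 2
```

In this toy case a single measurement vector guarantees unique recovery only for 1-sparse signals, whereas two linearly independent measurement vectors raise the guaranteed joint support size to 2.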

In our scenario, the receive antennas can be regarded as multiple measurements. We can thus combine [TeX:] $$\hat{f}_r(i, j), r=1, \cdots, N_{\mathrm{r}}$$ into a matrix [TeX:] $$\boldsymbol{F}(i, j)$$

(45)
[TeX:] $$\boldsymbol{F}(i, j)=\left[\hat{\boldsymbol{f}}_1(i, j), \hat{\boldsymbol{f}}_2(i, j), \cdots, \hat{\boldsymbol{f}}_{N_{\mathrm{r}}}(i, j)\right].$$

To implement the MMV algorithm on F(i, j), one should first guarantee that F(i, j) satisfies the joint sparse property.

Theorem 2 (asymptotic joint sparse property):

For [TeX:] $$\forall i, j=1, \cdots, N_{\mathrm{r}}, i \neq j$$

(46)
[TeX:] $$\lim _{N_{\mathrm{c}}, N_t \rightarrow+\infty} \operatorname{supp}\left(\hat{f}_i(k, q)\right)=\operatorname{supp}\left(\hat{f}_j(k, q)\right) .$$

Proof 2:

Without loss of generality, consider an additive noise induced by channel measurement, denoted by [TeX:] $$n \sim \mathcal{C N}(0,1).$$ For the rth receive antenna, the observed channel in each ADG can then be expressed as in (38), with [TeX:] $$[\cdot]_i$$ denoting the ith entry of a vector,

where [TeX:] $$k_1=mod (i, n), k_2=\operatorname{round}(i / n), k_3=$$ are the equivalent estimated values on each grid. As [TeX:] $$m, n, N_{\mathrm{c}}$$ approach infinity, the gap between the estimated and true values shrinks because of the higher resolution, and [TeX:] $$S a(\cdot)$$ tends to a Dirac function. This implies that the Doppler spectrum of each receive antenna on an ADG contains the same single path and is not influenced by the power of other ADGs, which completes the proof.

Remark 2: The above property implies that when the numbers of antennas and subcarriers go to infinity, the observation becomes a rank-deficient matrix. In that case MMV loses its ability to reduce the number of measurements but can still mitigate the influence of channel noise. To implement MMV in our scheme, we adopt the rank-aware method proposed in [25]: it decomposes the measurement matrix into an orthogonal basis and computes the projection power of each column onto it. We still apply the band exclusion strategy when selecting the columns. The details are given in Algorithm 2. We name the rank-aware OMP with band-excluded redundant dictionary the RAMBLE algorithm.

Rank-aware based OMP channel prediction with band-excluded redundant dictionary
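The rank-aware selection described in Remark 2 can be sketched as follows. This is a hedged illustration rather than Algorithm 2: it assumes that the orthogonal basis is obtained from an SVD of the current residual matrix, that the projection power of a column is the norm of its projection onto that basis, and that the coherence band is B_η(k) = {i : μ(i, k) > η}. All names, the demo dictionary, and the antenna gains are hypothetical.

```python
import numpy as np

def rank_aware_bomp(Phi, Y, n_spikes, eta, rank_tol=1e-8):
    """Joint-sparse greedy recovery with rank-aware matching and band exclusion."""
    Phi_n = Phi / np.linalg.norm(Phi, axis=0, keepdims=True)
    mu = np.abs(Phi_n.conj().T @ Phi_n)               # pairwise column coherence
    excluded = np.zeros(Phi.shape[1], dtype=bool)
    support = []
    R = Y.copy()
    for _ in range(n_spikes):
        # Orthonormal basis of the residual's column space.
        U, s, _ = np.linalg.svd(R, full_matrices=False)
        U = U[:, s > rank_tol * s[0]]
        # Projection power of every dictionary column onto that basis.
        power = np.linalg.norm(Phi_n.conj().T @ U, axis=1)
        power[excluded] = -np.inf                     # band exclusion
        k = int(np.argmax(power))
        support.append(k)
        near = mu[:, k] > eta                         # B_eta(k)
        excluded |= (mu[:, near] > eta).any(axis=1)   # add B_eta^(2)(k)
        X_s, *_ = np.linalg.lstsq(Phi[:, support], Y, rcond=None)
        R = Y - Phi[:, support] @ X_s
    return np.array(support), X_s

# Hypothetical demo: 4 receive antennas sharing the Doppler support {40, 150}.
t = np.arange(64)
Phi = np.exp(2j * np.pi * np.outer(t, np.arange(256)) / 256)
gains = np.array([[1.0, 1.0j, -1.0, 0.5],
                  [0.7, 0.7, 0.7j, -0.7]])
Y = Phi[:, [40, 150]] @ gains
support, X_s = rank_aware_bomp(Phi, Y, n_spikes=2, eta=0.25)
```

Because the selection depends only on the subspace spanned by the residual, it does not depend on the per-antenna gains in this noise-free example, which is the motivation for rank-aware matching.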

IV. MATRIX COMPLETION PREDICTION SCHEME

The previously proposed algorithm relies on a model of the real-world channel: it estimates the Doppler parameters from the observed channel and reconstructs the future channel from a known channel function. In this section, a model-free channel prediction scheme is proposed that does not rely on channel modeling and instead infers the future channel solely from past observations.

The problem is still considered in the angular-delay domain. For the (i, j)th grid of the rth receive antenna, construct a Hankel matrix from the past observations as follows (the receive antenna subscript of B is omitted for simplicity):

(47)
[TeX:] $$\boldsymbol{B}=\left[\begin{array}{cccc} \hat{H}_{1, r}(i, j) & \hat{H}_{2, r}(i, j) & \cdots & \hat{H}_{\frac{W+1}{2}, r}(i, j) \\ \hat{H}_{2, r}(i, j) & \hat{H}_{3, r}(i, j) & \cdots & \hat{H}_{\frac{W+3}{2}, r}(i, j) \\ \vdots & \vdots & \ddots & \vdots \\ \hat{H}_{\frac{W+1}{2}, r}(i, j) & \hat{H}_{\frac{W+3}{2}, r}(i, j) & \cdots & \hat{H}_{W, r}(i, j) \end{array}\right] .$$

Since not all the CSI in the observation window is available, the actual B is a sparse matrix with many zero entries. To account for the CSI to be predicted, B is padded with a lower triangular block of zeros that represents the future CSI. Denote the padded matrix as [TeX:] $$\hat{B}$$

(48)
[TeX:] $$\hat{B}=\left[\begin{array}{cccc} \hat{H}_{1, r}(i, j) & 0 & \cdots & \hat{H}_{\frac{W+N+1}{2}, r}(i, j) \\ 0 & \hat{H}_{3, r}(i, j) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ \hat{H}_{\frac{W+N+1}{2}, r}(i, j) & 0 & \cdots & 0_{N \times N} \end{array}\right],$$

where [TeX:] $$0_{N \times N}$$ is a lower triangular block of zeros.
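The construction of the padded matrix in (48) can be sketched in a few lines. The helper below is a hypothetical illustration: it uses 0-based symbol indices (the equations above are 1-based), assumes W + N is odd so that the matrix is square, and returns an explicit boolean mask of known entries instead of inferring them from non-zero values.

```python
import numpy as np

def padded_hankel(h_obs, pilot_idx, n_future):
    """Build the zero-padded Hankel matrix and the mask of known entries.

    h_obs     : length-W sequence of one angular-delay grid over the window
    pilot_idx : 0-based positions inside the window where CSI was measured
    n_future  : number N of future symbols appended as unknowns
    """
    total = len(h_obs) + n_future
    if total % 2 == 0:
        raise ValueError("W + N must be odd to obtain a square Hankel matrix")
    seq = np.zeros(total, dtype=complex)
    known = np.zeros(total, dtype=bool)
    seq[pilot_idx] = np.asarray(h_obs)[pilot_idx]    # non-pilot and future symbols stay zero
    known[pilot_idx] = True
    size = (total + 1) // 2
    idx = np.arange(size)[:, None] + np.arange(size)[None, :]   # entry (m, n) holds symbol m + n
    return seq[idx], known[idx]

# Hypothetical example: W = 7 observed symbols, pilots on symbols 0, 2, 3, 6, N = 2.
h = np.arange(1, 8, dtype=complex)
B_hat, mask = padded_hankel(h, pilot_idx=[0, 2, 3, 6], n_future=2)
```

The unknown future symbols occupy the lower-right anti-diagonals of `B_hat`, matching the zero block in (48).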

Therefore, predicting the future CSI is now equivalent to completing the zero entries of [TeX:] $$\hat{B}$$. A criterion is also needed to guide the reconstruction. First, we prove that if [TeX:] $$\hat{H}_{q, r}(i, j)$$ can be expressed as a sum of exponentials, the rank of [TeX:] $$\hat{B}$$ constructed from [TeX:] $$\hat{H}_{q, r}(i, j)$$ equals the number of exponential components in [TeX:] $$\hat{H}_{q, r}(i, j)$$.

Proof 3: Suppose a signal is a sum of exponentials, which can be expressed as

(49)
[TeX:] $$S_n=\sum_{k=1}^K a_k e^{j 2 \pi f_k n} \quad n=0, \cdots, L,$$

where L is the total length of the signal, then one can construct a Hankel matrix from [TeX:] $$S_n$$

(50)
[TeX:] $$M_s=\left[\begin{array}{cccc} S_0 & S_1 & \cdots & S_{N-1} \\ S_1 & S_2 & \cdots & S_N \\ \vdots & \vdots & \ddots & \vdots \\ S_M & S_{M+1} & \cdots & S_{M+N-1} \end{array}\right] \text {, }$$

where M and N are integers satisfying L = M+N −1, and M and N should be equal so that [TeX:] $$M_s$$ is a square matrix. According to [26], [TeX:] $$M_s$$ can then be decomposed as

(51)
[TeX:] $$M_s=\Omega_M A \Omega_N^T.$$

A is a diagonal matrix whose diagonal entries are the amplitudes of the exponentials.

(52)
[TeX:] $$A=\operatorname{diag}\left(a_1, a_2, \cdots, a_K\right)$$

[TeX:] $$\Omega_x \text { is an } x \times K$$ Vandermonde matrix defined as follows.

(53)
[TeX:] $$\boldsymbol{\Omega}_{\boldsymbol{x}}=\left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ e^{j 2 \pi f_1} & e^{j 2 \pi f_2} & \cdots & e^{j 2 \pi f_K} \\ \vdots & \vdots & \ddots & \vdots \\ e^{j 2 \pi f_1(x-1)} & e^{j 2 \pi f_2(x-1)} & \cdots & e^{j 2 \pi f_K(x-1)} \end{array}\right].$$

Equation (51) implies that if [TeX:] $$f_i \neq f_j$$ for [TeX:] $$\forall i, j=1, \cdots, K, \text { and } i \neq j$$, then the rank of [TeX:] $$M_s$$ equals the number of exponential components in [TeX:] $$S_n$$. Based on our channel model, the entries of [TeX:] $$\hat{B}$$ satisfy (49); therefore, the rank of [TeX:] $$\hat{B}$$ is approximately equal to the number of Doppler spikes in the corresponding ADG. To respect the sparsity of the scattering environment, the reconstruction should fit the observations with the fewest components, which is equivalent to minimizing the rank of [TeX:] $$\hat{B}$$.
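The rank property used here is easy to verify numerically. The sketch below builds a square Hankel matrix from a sum of three complex exponentials following (49) and checks that its rank is 3. The frequencies, amplitudes, and helper name are hypothetical. The factorization is checked with a plain transpose on the right-hand Vandermonde factor, since entry (m, n) of a Hankel matrix equals S_{m+n}.

```python
import numpy as np

def hankel_from_signal(s, n_rows):
    """Hankel matrix whose (m, n) entry is s[m + n]."""
    n_cols = len(s) - n_rows + 1
    idx = np.arange(n_rows)[:, None] + np.arange(n_cols)[None, :]
    return s[idx]

# Hypothetical signal: K = 3 exponentials with distinct normalised frequencies.
freqs = np.array([0.05, 0.12, 0.31])
amps = np.array([1.0, 0.6 + 0.2j, 0.3])
n = np.arange(21)
signal = (amps[None, :] * np.exp(2j * np.pi * np.outer(n, freqs))).sum(axis=1)

M_s = hankel_from_signal(signal, n_rows=11)           # 11 x 11 square Hankel matrix
rank = np.linalg.matrix_rank(M_s, tol=1e-8)

# Vandermonde factorisation: M_s = Omega A Omega^T with A = diag(amps).
Omega = np.exp(2j * np.pi * np.outer(np.arange(11), freqs))
factorisation_ok = np.allclose(M_s, Omega @ np.diag(amps) @ Omega.T)
```

Since the rank stays at K regardless of the matrix size, a low-rank completion of the padded Hankel matrix amounts to fitting the observations with as few exponentials as possible.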

The matrix can now be reconstructed under a rank minimization objective, and the optimization problem is formulated as follows.

(54)
[TeX:] $$\begin{aligned} \bar{B} =\arg \min \operatorname{rank}(\hat{B}) \\ \text { s.t., } \bar{B}_{i, j} =\hat{B}_{i, j} \quad \hat{B}_{i, j} \neq 0, \end{aligned}$$

where the subscript (i, j) indexes the non-zero entry in the ith column and the jth row, and the set of index tuples of the non-zero entries is denoted by [TeX:] $$\kappa$$. However, (54) is an NP-hard problem. An alternative is to minimize the nuclear norm of [TeX:] $$\hat{B}$$, so that (54) becomes

(55)
[TeX:] $$\begin{array}{r} \bar{B}=\arg \min \|\hat{B}\|_* \\ \text { s.t., } \bar{B}_{i, j}=\hat{B}_{i, j} \quad \hat{B}_{i, j} \neq 0, \end{array}$$

where [TeX:] $$\|\hat{B}\|_*$$ is the sum of singular values of [TeX:] $$\hat{B}$$.

To solve the above problem, the soft iterative thresholding (SIT) method proposed in [27] is leveraged. It is an iterative algorithm: in the kth iteration, it decomposes [TeX:] $$\hat{B}$$ through singular value decomposition (SVD) as [TeX:] $$\hat{B}=U \Sigma V^H,$$ where U and V are unitary matrices and [TeX:] $$\Sigma$$ is a non-negative diagonal matrix. Then, for each [TeX:] $$\tau\gt 0$$, define the singular value shrinkage operator [TeX:] $$\mathbb{D}_\tau$$ as

(56)
[TeX:] $$\mathbb{D}_\tau(\boldsymbol{\Sigma})=\operatorname{diag}\left(\left\{\sigma_i-\tau\right\}_{+}\right),$$

where [TeX:] $$t_{+}$$ is the positive part of t, namely [TeX:] $$t_{+}=\max (0, t)$$. This operator effectively shrinks the singular values toward zero. Applying this operator to [TeX:] $$\hat{B}$$ gives

(57)
[TeX:] $$\mathbb{D}_\tau(\hat{\boldsymbol{B}})=\boldsymbol{U} \mathbb{D}_\tau(\boldsymbol{\Sigma}) \boldsymbol{V}^H.$$

To keep the non-zero entries equal to their initial values in each iteration, the algorithm takes the following steps for a fixed [TeX:] $$\tau\gt 0$$ and a sequence [TeX:] $$\left\{\delta_k\right\}$$ of positive step sizes.

(58)
[TeX:] $$\tilde{\boldsymbol{B}}^{k-1}=\mathbb{D}_\tau\left(\hat{\boldsymbol{B}}^{k-1}\right),$$

(59)
[TeX:] $$\hat{\boldsymbol{B}}^k=\hat{\boldsymbol{B}}^{k-1}+\delta_k\left(\hat{\boldsymbol{B}}^0-\tilde{\boldsymbol{B}}_{\boldsymbol{\kappa}}^{k-1}\right),$$

where [TeX:] $$\tilde{\boldsymbol{B}}_{\boldsymbol{\kappa}}^{k-1}$$ is a matrix with its non-zero entry index the same as [TeX:] $$\hat{B}^0$$ and its non-zero entry value the same as [TeX:] $$\tilde{\boldsymbol{B}}^{k-1}$$.
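The shrinkage operator (56)-(57) and the update (58)-(59) can be written compactly with NumPy. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, a constant step size is used instead of a sequence, the known-entry set κ is passed as an explicit boolean mask, and the last shrunk matrix is returned as the completed estimate.

```python
import numpy as np

def shrink(B, tau):
    """Singular value shrinkage: U diag((sigma_i - tau)_+) V^H, cf. (56)-(57)."""
    U, s, Vh = np.linalg.svd(B, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vh

def svt_complete(B0, known, tau, step, n_iter):
    """Iterate (58)-(59) starting from the zero-padded observation B0.

    B0    : observed matrix with zeros at unknown entries
    known : boolean mask of the entries in kappa
    """
    B0 = np.asarray(B0, dtype=complex)
    B_hat = B0.copy()
    B_tilde = B_hat
    for _ in range(n_iter):
        B_tilde = shrink(B_hat, tau)                               # (58)
        # Correct only on the known entries; unknown entries are left free.
        B_hat = B_hat + step * np.where(known, B0 - B_tilde, 0.0)  # (59)
    return B_tilde

# Sanity check on a fully observed matrix: the iteration reproduces the data.
demo = svt_complete(np.diag([3.0, 1.0]), np.ones((2, 2), dtype=bool),
                    tau=1.0, step=1.0, n_iter=5)
```

For channel prediction, `B0` and `known` would be the padded Hankel matrix and its mask, and the predicted future CSI would be read from the anti-diagonals of the returned matrix that correspond to future symbols.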

The neighboring entries of the Hankel matrix are equally spaced, with a spacing set by the minimum interval of the non-uniform pilot. If this minimum interval exceeds the interval between two OFDM symbols, then after the iterations reach their maximum number we apply the lowpass DFT interpolation method mentioned before to reconstruct all the future CSI from the predicted output of MC. The details are listed in Algorithm 3.

Matrix completion based channel prediction method

V. SIMULATION RESULTS

A symbol-level simulation is conducted in which the BS applies beamforming to the transmitted signal according to the predicted channel. The eigen-based method [28] is used for precoding, and MMSE equalization is used at the user end.

The proposed algorithms are applied to both simulated channels and real channels acquired from channel sounding. Firstly, the COST2100 channel model [21] is used to generate a simulated channel with the settings listed in Table I. We choose semi-urban-300M as our channel scenario and set the carrier frequency and bandwidth to 3.5 GHz and 20 MHz, respectively, which is a default setting in the long term evolution (LTE) system. The scattering environment consists of 25 clusters, each containing 5 MPCs. The BS is equipped with a 4 × 8 planar array, and each antenna is unipolar and omni-directional. Both single-user and multi-user cases are considered, with the number of user antennas set to 4 and 2, respectively. Two velocities, 60 km/h and 120 km/h, are simulated to cover moderate and high-speed scenarios.

TABLE I
COST2100 SETTINGS.

The relationship between channel prediction error and user mobility is first shown for the uniform and non-uniform pilot cases with the proposed RAMBLE method. RAMBLE predicts the CSI on the adjacent future symbol after the observation window; the window then slides forward, which represents the acquisition of newly updated CSI from channel estimation, and the prediction is performed again. The window size is set to 400 and the interval between symbols is 0.5 ms. In addition, the pilot symbol interval in the uniform pilot case is set to 5 ms, which is a default SRS interval in the LTE protocol. For the non-uniform pilot design, two strategies are tested: a non-uniform pilot that meets the minimum interval criterion, and a random pilot that ignores this criterion so that the minimum spacing is arbitrary. Both non-uniform patterns keep the same pilot density as the uniform case, so the density equals 0.025. Table II shows that the non-uniform pilot outperforms the uniform pilot at both 60 km/h and 120 km/h, which implies that the non-uniform pilot provides more resolution to distinguish higher Doppler frequencies. It also performs better than the random pilot, because the latter provides more resolution than needed, which makes the pilot distribution in the observation window imbalanced.
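A short calculation with the simulation parameters above shows why a uniform 5 ms pilot interval aliases at these speeds. The sketch assumes the usual maximum Doppler shift f_d = v f_c / c and a two-sided Doppler spectrum, so that a sampling interval T resolves Doppler shifts only up to 1/(2T); the function names are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def max_doppler_hz(speed_kmh, carrier_hz):
    """Maximum Doppler shift f_d = v * f_c / c."""
    return speed_kmh / 3.6 * carrier_hz / SPEED_OF_LIGHT

def unambiguous_doppler_hz(sample_interval_s):
    """Largest Doppler shift resolvable without aliasing for a given sampling interval."""
    return 1.0 / (2.0 * sample_interval_s)

f_d_60 = max_doppler_hz(60, 3.5e9)           # roughly 195 Hz
f_d_120 = max_doppler_hz(120, 3.5e9)         # roughly 389 Hz
limit_pilot = unambiguous_doppler_hz(5e-3)   # uniform 5 ms pilot interval: 100 Hz
limit_symbol = unambiguous_doppler_hz(0.5e-3)  # 0.5 ms symbol interval: 1000 Hz
```

Both Doppler shifts exceed the 100 Hz limit of the uniform pilot but stay well below the 1000 Hz limit of the symbol grid, so a pattern containing some closely spaced pilots can resolve them at the same overall pilot density.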

TABLE II
COMPARISON OF TNMSE BETWEEN UNIFORM AND NON-UNIFORM PILOTS AT DIFFERENT VELOCITIES.

Secondly, the advantage of MMV over SMV is depicted in Fig. 4, which shows TNMSE versus channel measurement noise at 60 km/h and 120 km/h user velocity. Fig. 4 shows that MMV outperforms SMV in both low and high SNR regimes. This implies that MMV not only combats channel measurement noise but also provides extra measurements: SMV treats each receive antenna individually, whereas MMV leverages the extra information brought by multiple measurements.

Fig. 4.
TNMSE vs. channel measurement noise level in 60 km/h and 120 km/h user speed.

Next, to demonstrate the superiority of our algorithms, we take the algorithm proposed in [4] and the Bayesian regularized neural network (BRNN) [29] as baselines, both of which use uniform pilot patterns. The former uses the vector Prony method to predict the channel on the future pilot symbol in the angular-delay domain, with the order of the Prony method set to 3. The BRNN is a neural network using Bayesian regularization; the complex CSI is separated into real and imaginary parts, which are stacked together as the network input. The number of hidden layer neurons is set to 50 and the tangent sigmoid is used as the activation function. For both baselines, we then use the lowpass/bandpass DFT interpolation method to predict the CSI on non-pilot symbols based on the profile of the Doppler spectrum. We assume here that the profile, center frequency, and bandwidth of the Doppler spectrum are available as prior knowledge, although they are not obtainable in practice. The window size of both baselines is set to 400, the same as for RAMBLE and MC. The average throughput loss in the single-user case is depicted in Fig. 5. The performance of the Prony+interpolation and BRNN+interpolation methods is quite limited due to the aliasing of the Doppler spectrum. With the increased resolution brought by the non-uniform pilot, RAMBLE and MC perform well in both low and high SNR regimes.

Fig. 5.
Average throughput loss vs. SNR, noise-free channel measurements, [TeX:] $$N_t=32$$.

In addition, a non-ideal characteristic of practical systems is considered, namely the channel estimation delay: RAMBLE and MC cannot obtain the instantaneous channel measurement when the observation window slides forward. Therefore, the most recently updated CSI is excluded from the observation window, and the result is shown in Fig. 6. Both proposed algorithms still outperform the traditional methods to a great extent.

Fig. 6.
Average throughput loss vs. SNR, noise-free channel measurements with channel processing delay, [TeX:] $$N_t=32$$.

The average throughput loss versus SNR in the multiuser case is depicted in Fig. 8, where the number of users is set to 4. The figure shows that the proposed algorithms reduce the average throughput loss by about 10 percent and that MC performs better than RAMBLE.

To further show that the proposed algorithms can be applied to real measured channels, a channel-sounding campaign is conducted on campus. A 3.5 GHz channel sounder is adopted, equipped with a 3 × 8 bipolar antenna panel at the transmitter end and a two-element array at the receiver end. The channel is measured at fixed positions spaced 3 m apart along the route depicted in Fig. 7, where the red star marks the location of the transmitter and the red line represents the channel sounding route. To generate mobility from stationary channel data, we uniformly extract points from the route to emulate a mobile scenario; the equivalent speed increases as the spacing between the extracted points increases. The result is shown in Fig. 9: both MC and RAMBLE outperform the Prony method by about 5 percent, which verifies the applicability of the proposed algorithms. In addition, MC performs better than RAMBLE on the real channel, because it uses the channel observations directly and thus avoids the extra noise incurred when estimating intermediate parameters based on the channel model.

Fig. 7.
Channel sounding route and transmitter location.
Fig. 8.
Average throughput loss vs. SNR in different velocities, noise-free channel measurements, 4 UEs, [TeX:] $$N_t=32$$.
Fig. 9.
Average throughput loss versus SNR in real measured channel case.

VI. CONCLUSION AND FUTURE WORK

In this work, a problem of current channel prediction schemes is first pointed out: Doppler aliasing in moderate to high mobility scenarios, caused by the density limitation of the uniform pilot pattern, which drastically degrades the reconstruction performance on non-pilot symbols. To solve this, a non-uniform pilot pattern is proposed that differs from the uniform pilot pattern used in most communication protocols. Two channel prediction methods are then proposed. The first uses compressive sensing to reconstruct the Doppler spectrum of each angular-delay domain grid. To address the repetition problem induced by the orthonormal DFT transformation matrix, a redundant dictionary and band exclusion are introduced, which mitigate the basis mismatch problem. To further combat the channel measurement noise and increase recovery stability, multiple measurements from different receive antennas are leveraged. To bridge the gap between the real channel and the channel model, the second method is data-driven and treats the channel prediction task as a matrix completion problem constrained by the rank of the Hankel matrix constructed from measurements.

Simulation results show that the proposed methods outperform the state-of-the-art algorithms in terms of throughput on both the simulated channel and the real measured channel. In the simulated multiuser channel setting, the proposed methods reduce the average throughput loss by about 15 percent at both 60 and 120 km/h user speed. The practical applicability of the proposed algorithms is also verified on a measured channel, where they achieve a gain of about 5 percent in throughput loss compared with conventional algorithms.

This paper makes the latent assumption that the support of the Doppler spectrum in the observation window is time-invariant, which implies that the Doppler frequency of each cluster should not change greatly within a certain period of time. Therefore, a smaller window size is chosen as the mobility increases, which reduces the available channel measurements and degrades the recovery performance. However, if “Doppler stationarity” is extended to a broader sense, in which the variation of the Doppler support is a fixed function of time, this kind of long-term stationarity could enlarge the window size and further improve the performance. Secondly, the discussion of the non-uniform pilot pattern in this paper only considers the minimum spacing; the optimum non-uniform pilot pattern is still an open problem [30].

Biography

Yi Shi

Yi Shi [S’21] (yishi1996@shu.edu.cn) received the B.E. degree from the School of Communication and Information Engineering, Shanghai University, in 2019, where he is currently pursuing the Ph.D. degree with the School of Communication and Information Engineering. His main research interests include channel prediction and application specific integrated circuit.

Biography

Xianling Wang

Xianling Wang [S’22] (lance_wang@shu.edu.cn) received the B.E. degree from the School of Communication and Information Engineering, Shanghai University, in 2020, where he is currently pursuing the Ph.D. degree with the School of Communication and Information Engineering. His main research interests include channel measurement, estimation and prediction.

Biography

Zhiyuan Jiang

Zhiyuan Jiang [S’12-M’15] (jiangzhiyuan@shu.edu.cn) received the B.S. and Ph.D. degrees from the Electronic Engineering Department, Tsinghua University, China, in 2010 and 2015, respectively. He is currently a Professor at the School of Communication and Information Engineering, Shanghai University, Shanghai, China. He visited the WiDeS Group, University of Southern California, Los Angeles, CA, USA, from 2013 to 2014. He worked as an experienced researcher at Ericsson from 2015 to 2016. He visited ARNG at the University of Southern California, Los Angeles, CA, USA, from 2017 to 2018. He worked as a Wireless Signal Processing Scientist at Intel Labs, Hillsboro, OR, USA, in 2018. His current research interests include URLLC in wireless networked control systems and signal processing in MIMO systems. He serves as a TPC member for IEEE INFOCOM, ICC, GLOBECOM, and WCNC. He received the ITC Rising Scholar Award in 2020, the Best Paper Award at the IEEE ICC 2020, the best In-Session Presentation Award of IEEE INFOCOM 2019, and the Exemplary Reviewer Award of IEEE WCL in 2019. He serves as an Associate Editor for the IEEE/KICS Journal of Communications and Networks, and a Guest Editor for the IEEE IoT Journal.

References

[1] D. Tse and P. Viswanath, Fundamentals of Wireless Communication, Cambridge University Press, 2005. doi: 10.1017/cbo9780511807213
[2] M. Vu and A. Paulraj, "MIMO wireless linear precoding," IEEE Signal Process. Mag., vol. 24, no. 5, pp. 86-105, 2007. doi: 10.1109/msp.2007.904811
[3] M.-C. Lee, W.-H. Chung, and T.-S. Lee, "Generalized precoder design formulation and iterative algorithm for spatial modulation in MIMO systems with CSIT," IEEE Trans. Commun., vol. 63, no. 4, pp. 1230-1244, 2015. doi: 10.1109/tcomm.2015.2396521
[4] H. Yin, H. Wang, Y. Liu, and D. Gesbert, "Addressing the curse of mobility in massive MIMO with Prony-based angular-delay domain channel predictions," IEEE J. Sel. Areas Commun., vol. 38, no. 12, pp. 2903-2917, 2020. doi: 10.1109/jsac.2020.3005473
[5] K. T. Truong and R. W. Heath, "Effects of channel aging in massive MIMO systems," J. Commun. Netw., vol. 15, no. 4, pp. 338-351, 2013. doi: 10.1109/jcn.2013.000065
[6] S. Kashyap, C. Mollén, E. Björnson, and E. G. Larsson, "Performance analysis of (TDD) massive MIMO with Kalman channel prediction," in Proc. IEEE ICASSP, 2017. doi: 10.1109/icassp.2017.7952818
[7] A. Khrwat, B. Sharif, C. Tsimenidis, and S. Boussakta, "Channel prediction for precoded spatial multiplexing multiple-input multiple-output systems in time-varying fading channels," IET Signal Process., vol. 3, no. 6, pp. 459-466, 2009. doi: 10.1049/iet-spr.2009.0019
[8] V. Arya and K. Appaiah, "Kalman filter based tracking for channel aging in massive MIMO systems," in Proc. IEEE SPCOM, 2018, pp. 362-366. doi: 10.1109/spcom.2018.8724442
[9] K. E. Baddour and N. C. Beaulieu, "Improved pilot-assisted prediction of unknown time-selective Rayleigh channels," in Proc. IEEE ICC, 2006. doi: 10.1109/icc.2006.255490
[10] I. C. Wong and B. L. Evans, "WLC43-5: Low-complexity adaptive high-resolution channel prediction for OFDM systems," in Proc. IEEE GLOBECOM, 2006. doi: 10.1109/glocom.2006.869
[11] R. Adeogun, P. Teal, and P. Dmochowski, "Parametric schemes for prediction of wideband MIMO wireless channels," arXiv preprint arXiv:1408.0581, 2014. [Online]. Available: https://arxiv.org/abs/1408.0581
[12] C. Luo, J. Ji, Q. Wang, X. Chen, and P. Li, "Channel state information prediction for 5G wireless communications: A deep learning approach," IEEE Trans. Netw. Sci. Eng., vol. 7, no. 1, pp. 227-236, 2018. doi: 10.1109/tnse.2018.2848960
[13] S. Moon, H. Kim, and I. Hwang, "Deep learning-based channel estimation and tracking for millimeter-wave vehicular communications," J. Commun. Netw., vol. 22, no. 3, pp. 177-184, 2020. doi: 10.1109/jcn.2020.000012
[14] Evolved Universal Terrestrial Radio Access (E-UTRA), "Physical channels and modulation, 3GPP TS 36.211," V10, vol. 2, 2011. [Online]. Available: https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=2425
[15] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, "A simple proof of the restricted isometry property for random matrices," Constructive Approx., vol. 28, no. 3, pp. 253-263, 2008. doi: 10.1007/s00365-007-9003-x
[16] B. K. Natarajan, "Sparse approximate solutions to linear systems," SIAM J. Comput., vol. 24, no. 2, pp. 227-234, 1995. doi: 10.1137/s0097539792240406
[17] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM Rev., vol. 43, no. 1, pp. 129-159, 2001. doi: 10.1137/s1064827596304010
[18] J. A. Tropp and A. C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655-4666, 2007. doi: 10.1109/tit.2007.909108
[19] T. Blumensath and M. E. Davies, "Iterative hard thresholding for compressed sensing," Appl. Comput. Harmonic Analysis, vol. 27, no. 3, pp. 265-274, 2009. doi: 10.1016/j.acha.2009.04.002
[20] D. Calvetti and L. Reichel, "Tikhonov regularization of large linear problems," BIT Numer. Math., vol. 43, no. 2, pp. 263-283, 2003. doi: 10.1023/A:1026083619097
[21] L. Liu et al., "The COST 2100 MIMO channel model," IEEE Wireless Commun., vol. 19, no. 6, pp. 92-99, 2012. doi: 10.1109/mwc.2012.6393523
[22] V. Michel and R. Telschow, "The regularized orthogonal functional matching pursuit for ill-posed inverse problems," SIAM J. Numer. Analysis, vol. 54, no. 1, pp. 262-287, 2016. doi: 10.1137/141000695
[23] A. Fannjiang and W. Liao, "Coherence pattern-guided compressive sensing with unresolved grids," SIAM J. Imaging Sci., vol. 5, no. 1, pp. 179-202, 2012. [Online]. Available: https://arxiv.org/abs/1106.5177
[24] J. Chen and X. Huo, "Theoretical results on sparse representations of multiple-measurement vectors," IEEE Trans. Signal Process., vol. 54, no. 12, pp. 4634-4643, 2006. doi: 10.1109/tsp.2006.881263
[25] M. E. Davies and Y. C. Eldar, "Rank awareness in joint sparse recovery," IEEE Trans. Inf. Theory, vol. 58, no. 2, pp. 1135-1146, 2012. doi: 10.1109/tit.2011.2173722
[26] J. Ying et al., "Vandermonde factorization of Hankel matrix for complex exponential signal recovery—application in fast NMR spectroscopy," IEEE Trans. Signal Process., vol. 66, no. 21, pp. 5520-5533, 2018. doi: 10.1109/tsp.2018.2869122
[27] J.-F. Cai, E. J. Candès, and Z. Shen, "A singular value thresholding algorithm for matrix completion," SIAM J. Optim., vol. 20, no. 4, pp. 1956-1982, 2010. doi: 10.1137/080738970
[28] T. Chunchang, Y. Wenqian, F. Liang, W. Zhijie, and W. Yafeng, "On the performance of eigen based beamforming in LTE-advanced," in Proc. IEEE PIMRC, 2009. doi: 10.1109/pimrc.2009.5450334
[29] D. T. Mirikitani and N. Nikolaev, "Recursive Bayesian recurrent neural networks for time-series modeling," IEEE Trans. Neural Netw., vol. 21, no. 2, pp. 262-274, 2009. doi: 10.1109/tnn.2009.2036174
[30] R. Venkataramani and Y. Bresler, "Optimal sub-Nyquist nonuniform sampling and reconstruction for multiband signals," IEEE Trans. Signal Process., vol. 49, no. 10, pp. 2301-2313, 2001. doi: 10.1109/78.950786

TABLE I

COST2100 SETTINGS.
| Parameter | Value |
| --- | --- |
| Channel type | Semi-urban-300M |
| Center frequency | 3.5 GHz |
| [TeX:] $$\lvert\Gamma\rvert$$ | 25 |
| [TeX:] $$P$$ | 5 |
| Bandwidth | 20 MHz |
| [TeX:] $$N_t$$ | 32 (4 × 8 UPA) |
| [TeX:] $$N_{\mathrm{r}}$$ | 4 ULA (single-user) or 2 ULA (multi-user) |
| BS position | [0, 0, 10] m |
| User velocity | 60 km/h or 120 km/h |
| User number | 1 or 4 |
| Single-user position | [200, 400, 0] m |
| Multi-user positions | [200, −400, 0] m; [100, −500, 0] m; [300, −300, 0] m; [400, −200, 0] m |

TABLE II

COMPARISON OF TNMSE BETWEEN UNIFORM AND NON-UNIFORM PILOTS AT DIFFERENT VELOCITIES.
| Pilot pattern | TNMSE at 60 km/h | TNMSE at 120 km/h |
| --- | --- | --- |
| Uniform (even) pilot | 0.8234 | 1.0082 |
| Non-uniform pilot | 0.5298 | 0.6313 |
| Random pilot | 0.5844 | 0.6983 |
In the uplink period, yellow strips represent pilot slots and green strips represent non-pilot slots. CSI is measured on the pilot slots, which form an observation window of size W. The channel predictor then predicts future CSI for precoding in the downlink subframe, which is represented by red strips.
Non-uniform pilot pattern design. The minimum pilot interval is denoted by [TeX:] $$\Delta T_{\min }$$, and the pilot density remains the same as in the uniform case.
The left and right panels show the amplitude and phase, respectively, of the observed and predicted CSI. The yellow frame marks the observed part and the red frame marks the predicted part. The prediction is merely a duplicate of the observation because of the cyclic property of the DFT.
Redundant dictionary OMP-based angular delay domain channel prediction
Rank-aware based OMP channel prediction with band-excluded redundant dictionary
Matrix completion based channel prediction method
TNMSE vs. channel measurement noise level at user speeds of 60 km/h and 120 km/h.
Average throughput loss vs. SNR, noise-free channel measurements, [TeX:] $$N_t=32$$.
Average throughput loss vs. SNR, noise-free channel measurements with channel processing delay, [TeX:] $$N_t=32$$.
Channel sounding route and transmitter location.
Average throughput loss vs. SNR at different velocities, noise-free channel measurements, 4 UEs, [TeX:] $$N_t=32$$.
Average throughput loss vs. SNR for the real measured channels.