Sampling for Remote Estimation of an Ornstein-Uhlenbeck Process through a Channel with Unknown Delay Statistics

Yuchao Chen*, Haoyue Tang*, Jintao Wang, Pengkun Yang, and Leandros Tassiulas

Abstract: In this paper, we consider sampling an Ornstein-Uhlenbeck (OU) process through a channel for remote estimation. The goal is to minimize the mean square error (MSE) at the estimator under a sampling frequency constraint when the channel delay statistics are unknown. Sampling for MSE minimization is reformulated into an optimal stopping problem. By revisiting the threshold structure of the optimal stopping policy when the delay statistics are known, we propose an online sampling algorithm that learns the optimum threshold using stochastic approximation and the virtual queue method. We prove that, with probability 1, the MSE of the proposed online algorithm converges to the minimum MSE that is achieved when the channel delay statistics are known. The cumulative MSE gap of our proposed algorithm compared with the minimum MSE up to the (k+1)th sample grows at rate at most $\mathcal{O}(\ln k)$. We also prove that the proposed online algorithm satisfies the sampling frequency constraint. Finally, simulation results are provided to demonstrate the performance of the proposed algorithm.

Keywords: Online learning, Ornstein-Uhlenbeck process, stochastic approximation

I. INTRODUCTION

With the rapid development of autonomous vehicles [1] and intelligent machine communications [2], status update information (e.g., the speed of a vehicle) is becoming a major part of future communication networks [3]. This status information is delivered to the destination through communication channels, and to guarantee system safety and efficient control, it is necessary to ensure that the controller has an accurate estimate of the system state. To measure information freshness at the destination, the age of information (AoI) metric has been proposed in [4]. By definition, AoI measures the difference between the current time and the generation time of the latest information received at the destination. Previous works [5], [6] have shown that AoI minimization differs from traditional throughput and delay optimization. Specifically, in the data generation procedure, a new sample should be taken only when the data stored at the destination is old. Numerous studies have been conducted to minimize the AoI in various networks [4]–[11]. Average AoI optimization in queueing systems is studied in [4], [7]. Age-optimal scheduling policies in multi-user wireless networks are investigated in [9]–[12]. For minimizing more general nonlinear age functions, [6], [8] design optimal sampling strategies. However, when the signal model is known, AoI itself cannot reflect the signal evolution. A better metric to capture information freshness at the destination is the mean square error (MSE) [13]–[21]. The sampling strategy that minimizes the estimation MSE of a Wiener process is studied in [14], [15], [20], and sampling to minimize the estimation MSE of an Ornstein-Uhlenbeck (OU) process is investigated in [14], [21]. It is revealed that the optimum sampling threshold depends on both the signal evolution and the channel delay statistics. When the channel delay statistics are known, the aforementioned optimum sampling thresholds can be computed numerically by fixed-point iteration [19] or bisection search [20], [21].
When the channel statistics of the communication link are unknown, finding the optimum policy (i.e., the optimum AoI [6] or signal difference threshold [20], [21]) is challenging. Designing an adaptive sampling and transmission strategy for data freshness optimization under unknown channel statistics can be formulated as a sequential decision-making process [22]–[29]. Based on the stochastic multi-armed bandit framework, [22]–[24] design online channel selection algorithms to minimize the average AoI for ON-OFF channels with unknown transition probabilities. For channels with more efficient communication protocols, [30]–[32] use reinforcement learning to minimize the AoI under unknown channel statistics. For communication channels with random delay, [28], [29], [33] apply the stochastic approximation method to design adaptive sampling algorithms that optimize AoI. The stochastic approximation method can also be extended to online estimation of signals with a simple evolution model, i.e., the Wiener process [34]. Since the Wiener process is the simplest time-varying signal model, we are interested in extending these results to more general and complex signal models.

In this paper, we consider a point-to-point link in which a sensor samples an OU process and transmits the sampled packets to the destination for remote estimation through a channel with random delay. Our goal is to design an online sampling policy that minimizes the average MSE under a sampling frequency constraint when the channel statistics are unknown. The main contributions of this work are as follows:

· We reformulate the MSE minimization problem under unknown channel statistics as an optimal stopping problem by providing a novel frame division approach that differs from [21]. This frame division enables us to propose an online sampling algorithm that learns the optimal threshold adaptively through stochastic approximation and the virtual queue method.

· When there is no sampling frequency constraint, we prove that the time-average MSE of the proposed algorithm converges to the minimum MSE almost surely. Specifically, we first utilize properties of the OU process to bound the threshold parameter (Lemma 2 and Lemma 6), and then prove that the cumulative MSE regret grows at rate $\mathcal{O}(\ln K)$, where K is the number of samples taken (Theorem 2).

· When there is a sampling frequency constraint, by viewing the sampling frequency debt as a virtual queue, we prove that the constraint is satisfied in the sense that the virtual queue is stable (Theorem 3).

The rest of the paper is organized as follows. In Section II, we introduce the system model and formulate the MSE minimization problem. In Section III, we reformulate the problem as an optimal stopping problem and propose an online sampling algorithm. Theoretical analysis of the proposed algorithm is provided in Section IV. In Section V, we present simulation results. Finally, conclusions are drawn in Section VI.

II. PROBLEM FORMULATION

A. System Model

As depicted in Fig. 1, we study a status update system similar to [21], where a sensor observes a time-varying process and sends sampled data to a remote estimator through a channel. Let $X_t \in \mathbb{R}, \forall t \geq 0$ denote the value of the time-varying process at time t.
To model such a time-varying first-order auto-regressive process, we assume $X_t$ to be an OU process, which is the only nontrivial continuous-time process that is stationary, Gaussian, and Markovian [35]. The OU process parameterized by $\mu, \theta, \sigma \in \mathbb{R}^{+}$ can be modeled by the following stochastic differential equation (SDE) [35]:

$$\mathrm{d} X_t=\theta\left(\mu-X_t\right) \mathrm{d} t+\sigma \mathrm{d} W_t,$$
where $W_t$ is a Wiener process. Suppose the sensor can sample the process at any time $t \in \mathbb{R}^{+}$ at will. Let $S_k$ be the sampling time-stamp of the kth sample. Once sample k is transmitted over the channel, it experiences a random delay $D_k \in[0, \infty)$ before reaching the destination. We assume the transmission delays are independent and identically distributed (i.i.d.) following a probability measure $\mathbb{P}_D$. Due to the interference constraint, only one sample can be transmitted over the channel at a time. Once the transmission of an update finishes, an ACK signal is sent back to the sensor immediately and without error. Let $R_k$ be the reception time of the kth sample. Then $R_k$ can be computed iteratively by

(1)
$$R_k=\max \left\{S_k, R_{k-1}\right\}+D_k.$$
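To make the system model concrete, the following minimal Python sketch (our illustration, not part of the original paper) simulates an OU sample path using the exact Gaussian transition implied by (2) below and computes the reception times via the recursion (1); the parameter values and the log-normal delay model are assumptions borrowed from the simulation section.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
theta, mu, sigma = 0.2, 3.0, 1.0      # OU parameters (values from Section V)

def ou_step(x, dt):
    """Exact OU transition over dt: given X_t = x, X_{t+dt} is Gaussian
    with the mean and variance below (cf. the solution form in (2))."""
    decay = np.exp(-theta * dt)
    mean = x * decay + mu * (1.0 - decay)
    var = sigma**2 / (2.0 * theta) * (1.0 - decay**2)
    return mean + np.sqrt(var) * rng.standard_normal()

# Simulate a sample path on a uniform grid.
T, dt = 100.0, 0.01
X = np.empty(int(T / dt) + 1)
X[0] = mu
for i in range(1, len(X)):
    X[i] = ou_step(X[i - 1], dt)

# Reception times R_k = max{S_k, R_{k-1}} + D_k for i.i.d. delays, as in (1).
S = np.arange(1.0, T, 5.0)                 # example sampling instants (assumed)
D = rng.lognormal(mean=1.0, sigma=1.0, size=S.size)
R = np.empty_like(S)
r_prev = 0.0
for k, (s, d) in enumerate(zip(S, D)):
    r_prev = max(s, r_prev) + d
    R[k] = r_prev
```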
B. Minimum Mean Squared Error (MMSE) Estimation

The receiver estimates the value of $X_t$ based on the packets received and the transmission results before time t. Let $i(t)=\max _{k \in \mathbb{N}}\left\{k \mid R_k \leq t\right\}$ be the index of the latest received sample at time t. The evolution of $X_t$ can be rewritten using the strong Markov property of the OU process [21, equation (8)] as follows:
(2)
$$\begin{aligned} X_t= & X_{S_{i(t)}} e^{-\theta\left(t-S_{i(t)}\right)}+\mu\left[1-e^{-\theta\left(t-S_{i(t)}\right)}\right] \\ & +\frac{\sigma}{\sqrt{2 \theta}} e^{-\theta\left(t-S_{i(t)}\right)} W_{e^{2 \theta\left(t-S_{i(t)}\right)}-1}. \end{aligned}$$

Let $\mathcal{H}_t:=\left(\left\{S_k, D_k, X_{S_k}\right\}_{k=1}^{i(t)}, t\right)$ be the historical information up to time t. Then, the MMSE estimator at the destination is the conditional expectation [36]:
(3)
$$\hat{X}_t=\mathbb{E}\left[X_t \mid \mathcal{H}_t\right]=X_{S_{i(t)}} e^{-\theta\left(t-S_{i(t)}\right)}+\mu\left[1-e^{-\theta\left(t-S_{i(t)}\right)}\right].$$

Combined with (2), the instant estimation error at time t, denoted by $\Delta_t$, can be computed as
(4)
$$\Delta_t=X_t-\hat{X}_t=\frac{\sigma}{\sqrt{2 \theta}} e^{-\theta\left(t-S_{i(t)}\right)} W_{e^{2 \theta\left(t-S_{i(t)}\right)}-1},$$

which can be viewed as an OU process restarted at time $t=S_{i(t)}$. Fig. 2 illustrates the MMSE estimation: the blue line is a sample path of an OU process, and the orange line is the MMSE estimate computed by (3). The difference between these two lines, i.e., the shaded area, is the cumulative estimation error between two samples.

C. Optimization Problem

The goal of the sampler is to find a sampling policy, represented by a sequence of sampling times $\pi:=\left\{S_1, S_2, \cdots\right\}$, that minimizes the estimation MSE of the OU process at the destination. We assume that the sampler knows the statistical parameters of the OU process, i.e., $\theta, \mu, \sigma$, while the channel delay distribution $\mathbb{P}_D$ is unknown. We focus on the set Π of causal sampling policies: the sampling time $S_k$ selected by each policy $\pi \in \Pi$ is determined only by the historical information, and no future information can be used for the sampling decision. Moreover, due to hardware constraints and energy conservation, the average sampling frequency should be below a threshold $f_{\max }$. The optimization problem can then be formulated as follows. Problem 1 (MSE minimization):
(5a)
$$\text{mmse} \triangleq \inf _{\pi \in \Pi} \limsup _{T \rightarrow \infty} \frac{1}{T} \mathbb{E}\left[\int_0^T\left(X_t-\hat{X}_t\right)^2 \mathrm{~d} t\right],$$

(5b)
$$\text { s.t. } \liminf _{K \rightarrow \infty} \frac{1}{K} \sum_{k=1}^K \mathbb{E}\left[S_{k+1}-S_k\right] \geq \frac{1}{f_{\max }}.$$
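As a companion to (3), (4), and (5a), the sketch below (ours; the function names and the fallback to the prior mean before the first reception are assumptions) evaluates the MMSE estimate from the freshest received sample and approximates the time-average MSE on a discrete grid.

```python
import numpy as np

theta, mu = 0.2, 3.0   # OU parameters known to the estimator

def mmse_estimate(x_s, elapsed):
    """MMSE estimate (3) given the freshest received sample value x_s,
    where `elapsed` = t - S_{i(t)} is the age of that sample."""
    decay = np.exp(-theta * elapsed)
    return x_s * decay + mu * (1.0 - decay)

def empirical_mse(times, X, S, R, X_S):
    """Grid approximation of the time-average MSE objective in (5a).
    S, R, X_S: sampling times, (increasing) reception times, sampled values."""
    total, k = 0.0, -1
    for t, x in zip(times, X):
        while k + 1 < len(R) and R[k + 1] <= t:
            k += 1                       # advance i(t), the freshest received sample
        # Before any reception, fall back to the prior mean mu (an assumption).
        x_hat = mu if k < 0 else mmse_estimate(X_S[k], t - S[k])
        total += (x - x_hat) ** 2
    return total / len(times)
```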
III. PROBLEM RESOLUTION

In this section, we first reformulate Problem 1 as an optimal stopping problem, and then propose an online sampling algorithm to approach the optimal mmse.

A. Optimal Stopping Problem Reformulation

Notice that Problem 1 is a constrained continuous-time Markov decision process (MDP) with a continuous state space. It has been proven in [21, Lemma 6] that it is sub-optimal to take a new sample before the last packet is received. In other words, to achieve the optimal mmse, the sampling time-stamp $S_k$ should be no earlier than $R_{k-1}$, in which case (1) simplifies to $R_k=S_k+D_k$. Let $W_k=S_{k+1}-R_k$ be the waiting time before taking the (k+1)th sample. Then, designing a sampling policy $\pi=\left\{S_1, S_2, \cdots\right\}$ is equivalent to choosing a sequence of waiting times $\left\{W_1, W_2, \cdots\right\}$. To facilitate the analysis, define frame k to be the time interval between $S_k$ and $S_{k+1}$. We then introduce the following lemma to reformulate Problem 1 as a packet-level MDP.

Lemma 1: Define $\mathcal{I}_k=\left(D_k,\left\{X_t\right\}_{t \geq S_k}\right)$ to be the information in frame k, and let $\Pi_r$ be the set of stationary sampling policies whose $W_k$ depends only on $\mathcal{I}_k$. Let D be the random delay following distribution $\mathbb{P}_D$. Then Problem 1 can be reformulated as the following MDP. Problem 2 (packet-level MDP reformulation):
(6a)
$$\alpha^{\star} \triangleq \sup _{\pi \in \Pi_r}\left(\lim _{K \rightarrow \infty} \frac{\sum_{k=1}^K \mathbb{E}\left[O_{D_k+W_k}^2\right]}{\sum_{k=1}^K \mathbb{E}\left[D_k+W_k\right]}\right),$$
(6b)
$$\text { s.t. } \liminf _{K \rightarrow \infty} \frac{1}{K} \sum_{k=1}^K \mathbb{E}\left[W_k+D_k\right] \geq \frac{1}{f_{\max }},$$

where $O_t$ is an OU process with initial state $O_0=0$ and parameter μ = 0, i.e., the solution to the SDE:

(7)
$$\mathrm{d} O_t=-\theta O_t \mathrm{~d} t+\sigma \mathrm{d} W_t.$$
Moreover, the optimum value $\alpha^{\star}$ satisfies:
(8)
$$\alpha^{\star}=\left(\frac{\sigma^2}{2 \theta}-\text { mmse }\right) \frac{2 \theta}{\mathbb{E}\left[e^{-2 \theta D}\right]} \geq 0.$$

The proof of Lemma 1 is provided in Appendix B.

Assumption 1: The expectation of the delay $D_k$ is bounded and known to the transmitter, i.e.,
(9)
$$0\lt D_{\mathrm{lb}} \leq \bar{D} \triangleq \mathbb{E}_{\mathbb{P}_D}\left[D_k\right] \leq D_{\mathrm{ub}}\lt \infty.$$

Lemma 2: Define $\hat{W}=1 / f_{\max }+c$, where c > 0 is an arbitrary constant. If Assumption 1 is satisfied, then $\alpha^{\star}$ can be bounded as

(10)
$$\alpha_{\mathrm{lb}} \leq \alpha^{\star} \leq \alpha_{\mathrm{ub}},$$
where $\alpha_{\mathrm{lb}}$ and $\alpha_{\mathrm{ub}}$ can be chosen as
(11)
$$\alpha_{\mathrm{lb}}=\frac{\sigma^2\left(1-e^{-2 \theta \hat{W}}\right)}{2 \theta\left(D_{\mathrm{ub}}+\hat{W}\right)}\gt 0,$$

(12)
$$\alpha_{\mathrm{ub}}=\sigma^2.$$
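For concreteness, here is a small helper (ours) that evaluates the bounds of Lemma 2; note that (12) is reconstructed from (8) together with $\text{mmse} \geq \frac{\sigma^2}{2\theta}\mathbb{E}[1-e^{-2\theta D}]$, which yields $\alpha_{\mathrm{ub}}=\sigma^2$, and the numerical inputs are assumptions.

```python
import numpy as np

def alpha_bounds(sigma, theta, f_max, D_ub, c=1.0):
    """Bounds on alpha* from Lemma 2: (11) for the lower bound and the
    reconstructed (12) for the upper bound. c > 0 is an arbitrary constant."""
    W_hat = 1.0 / f_max + c       # waiting time of the constant feasible policy
    a_lb = sigma**2 * (1.0 - np.exp(-2.0 * theta * W_hat)) \
           / (2.0 * theta * (D_ub + W_hat))
    a_ub = sigma**2               # from (8) and mmse >= (sigma^2/2theta) E[1-e^{-2theta D}]
    return a_lb, a_ub

# Example with the Section V parameters and an assumed delay upper bound:
print(alpha_bounds(sigma=1.0, theta=0.2, f_max=0.02, D_ub=20.0))
```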
The proof of Lemma 2 is provided in Appendix C. The lower bound is obtained by constructing a feasible sampling policy whose waiting time is always $\hat{W}$ and then applying (6a); the constant c is introduced to ensure $\hat{W} \gt 0$ when there is no frequency constraint. The upper bound follows from (8) and the fact that $\text{mmse} \geq \frac{\sigma^2}{2 \theta} \mathbb{E}\left[1-e^{-2 \theta D}\right]$.

B. Optimal Sampling with Known $\mathbb{P}_D$

In the sequel, we derive the optimum policy $\pi^{\star}$ that achieves the optimal mmse when $\mathbb{P}_D$ is known. The structure of the optimal policy guides the algorithm design under unknown channel statistics, and the average MSE obtained by $\pi^{\star}$ will be used to measure the performance of the proposed online learning algorithm in Section III-C. According to (6a), the ratio obtained by any policy π that satisfies the sampling constraint (6b) is less than or equal to $\alpha^{\star}$. In other words, we have
(13)
$$-\lim _{K \rightarrow \infty} \frac{\frac{1}{K} \sum_{k=1}^K \mathbb{E}\left[O_{D_k+W_k}^2\right]}{\frac{1}{K} \sum_{k=1}^K \mathbb{E}\left[D_k+W_k\right]} \geq-\alpha^{\star}.$$

Multiplying both sides of (13) by $\frac{1}{K} \sum_{k=1}^K \mathbb{E}\left[D_k+W_k\right]$ and then adding $\alpha^{\star} \lim _{K \rightarrow \infty} \frac{1}{K} \sum_{k=1}^K \mathbb{E}\left[D_k+W_k\right]$ to both sides, we can solve Problem 2 by minimizing the following objective. Problem 3:
(14a)
$$\begin{aligned} \rho^{\star} \triangleq & \inf _{\pi \in \Pi_r} \limsup _{K \rightarrow \infty} \frac{1}{K} \sum_{k=1}^K\left(-\mathbb{E}\left[O_{D_k+W_k}^2\right]\right. \\ & \left.+\alpha^{\star} \mathbb{E}\left[D_k+W_k\right]\right), \end{aligned}$$
(14b)
$$\text { s.t. } \liminf _{K \rightarrow \infty} \frac{1}{K} \sum_{k=1}^K \mathbb{E}\left[W_k+D_k\right] \geq \frac{1}{f_{\max }}.$$

Similar to Dinkelbach's method [37] for nonlinear fractional programming, one can deduce that the optimal value $\rho^{\star}$ of Problem 3 equals 0, and that the optimum policies achieving mmse in Problem 1 and $\rho^{\star}$ in Problem 3 are identical. We therefore solve Problem 3 using the Lagrange multiplier approach. Let $\lambda \geq 0$ be the Lagrange multiplier of the sampling frequency constraint (14b); the Lagrangian of Problem 3 is
(15)
$$\begin{aligned} \mathcal{L}(\pi, \lambda)= & \limsup _{K \rightarrow \infty} \frac{1}{K} \sum_{k=1}^K\left(-\mathbb{E}\left[O_{D_k+W_k}^2\right]\right. \\ & \left.+\left(\alpha^{\star}-\lambda\right) \mathbb{E}\left[D_k+W_k\right]+\lambda \frac{1}{f_{\max }}\right). \end{aligned}$$

Notice that the transmission delays $D_k$ are i.i.d., and $O_t$ is an OU process starting at time t = 0. Hence, for fixed λ, selecting the optimum waiting time $W_k$ to minimize (15) reduces to a per-sample optimal stopping problem: find the optimum stopping time w that minimizes the following expectation:
(16)
$$\min _w \mathbb{E}\left[-O_{D_k+w}^2+\left(\alpha^{\star}-\lambda\right) w \mid O_{D_k}, D_k\right].$$

For simplicity, let $V_w=O_{D_k+w}$ be the value of the OU process at time $D_k+w$, with $V_0=O_{D_k}$ by definition. Then problem (16) is an instance of the following optimal stopping problem with $\beta=\alpha^{\star}-\lambda$:

(17)
$$\min _w \mathbb{E}_{v_0}\left[-V_w^2+\beta w\right],$$
where $\mathbb{E}_{v_0}$ is the conditional expectation given $V_0=v_0$. The optimum policy for (17) is given in the following lemma.

Lemma 3: If $0\lt \beta \leq \sigma^2$, then the solution minimizing (17) has a threshold structure, i.e.,
(18)
$$W_k=w\left(O_{D_k} ; \beta\right):=\inf \left\{t \geq 0:\left|O_{D_k+t}\right| \geq v(\beta)\right\},$$

where

(19)
$$v(\beta)=\frac{\sigma}{\sqrt{\theta}} G^{-1}\left(\frac{\sigma^2}{\beta}\right),$$
and $G^{-1}(\cdot)$ is the inverse function of

(20)
$$G(x)=\frac{e^{x^2}}{x} \int_0^x e^{-t^2} \mathrm{~d} t, \quad x\gt 0.$$
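Since G is increasing on $(0, \infty)$ with $\lim_{x \rightarrow 0^{+}} G(x)=1$, the threshold (19) can be evaluated numerically by bracketing and root-finding. Below is a minimal sketch (ours), assuming the form of G reconstructed in (20), using SciPy's erf and brentq.

```python
import numpy as np
from scipy.special import erf
from scipy.optimize import brentq

def G(x):
    """G(x) = (e^{x^2}/x) int_0^x e^{-t^2} dt, per the reconstructed (20);
    in closed form this equals (sqrt(pi)/2) e^{x^2} erf(x) / x."""
    return 0.5 * np.sqrt(np.pi) * np.exp(x**2) * erf(x) / x

def v(beta, sigma, theta):
    """Sampling threshold (19): v(beta) = (sigma/sqrt(theta)) G^{-1}(sigma^2/beta),
    for 0 < beta < sigma^2 so that the argument exceeds G(0+) = 1."""
    target = sigma**2 / beta
    hi = 1.0
    while G(hi) < target:        # G is increasing, so grow the bracket
        hi *= 2.0
    x_star = brentq(lambda x: G(x) - target, 1e-8, hi)
    return sigma / np.sqrt(theta) * x_star

print(v(beta=0.5, sigma=1.0, theta=0.2))
```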
The proof of Lemma 3 is provided in Appendix D. Moreover, [21, Theorem 6] has established strong duality for Problem 3, i.e., $\rho^{\star}=\max _\lambda \min _{\pi} \mathcal{L}(\pi, \lambda)$. For notational simplicity, let $o(\beta)$ and $l(\beta)$ denote the expected squared estimation error and the expected frame length under threshold β, i.e.,

(21)
$$o(\beta)=\mathbb{E}\left[O_{D+w\left(O_D ; \beta\right)}^2\right], \quad l(\beta)=\mathbb{E}\left[D+w\left(O_D ; \beta\right)\right].$$
By substituting $O_{D_k+w}$ with $X_{R_k+w}-\hat{X}_{R_k+w}$ in (18), the optimal sampling time $S_{k+1}=R_k+W_k$ for Problem 3 is obtained as follows.

Lemma 4 ([21, Theorem 2], restated): The optimal solution to Problem 1 is
$$S_{k+1}=\inf \left\{t \geq R_k:\left|X_t-\hat{X}_t\right| \geq v\left(\alpha^{\star}-\lambda^{\star}\right)\right\},$$

where $v(\cdot)$ is defined in (19), $\lambda^{\star}=\arg \max _\lambda \min _\pi \mathcal{L}(\pi, \lambda)$ is the dual optimizer, and $\alpha^{\star}$ is the solution to the following equation:
(22)
$$0=g_{\lambda^{\star}}(\alpha):=o\left(\alpha-\lambda^{\star}\right)-\alpha l\left(\alpha-\lambda^{\star}\right),$$

where we recall from (21) that $o(\beta)=\mathbb{E}\left[O_{D+w\left(O_D ; \beta\right)}^2\right]=\mathbb{E}\left[\left(X_{S_{k+1}}-\hat{X}_{S_{k+1}}\right)^2\right]$ is the expected squared estimation error and $l(\beta)=\mathbb{E}\left[D+w\left(O_D ; \beta\right)\right]$ is the expected frame length under threshold β.

Remark 1: If the frequency constraint is inactive, then by complementary slackness $\lambda^{\star}=0$ and the threshold becomes $v\left(\alpha^{\star}\right)$. Otherwise, $\alpha^{\star}-\lambda^{\star}\lt \alpha^{\star}$, and according to (19) the sampling threshold is larger than $v\left(\alpha^{\star}\right)$ so as to satisfy the sampling frequency constraint.

Remark 2: In [21, Theorem 2], the optimum sampling threshold for MSE minimization is
(23)
$$v\left(\beta^{\prime}\right)=\frac{\sigma}{\sqrt{\theta}} G^{-1}\left(\frac{\operatorname{mse}_{\infty}-\operatorname{mse}_D}{\operatorname{mse}_{\infty}-\beta^{\prime}}\right),$$

where
(24a)
$$\operatorname{mse}_{\infty}=\mathbb{E}\left[O_{\infty}^2\right]=\frac{\sigma^2}{2 \theta};$$
(24b)
$$\operatorname{mse}_D=\mathbb{E}\left[O_{D_k}^2\right]=\frac{\sigma^2}{2 \theta} \mathbb{E}\left[1-e^{-2 \theta D}\right].$$

The optimum sampling threshold is attained at $\beta^{\prime}=\text{mmse}+\lambda^{\prime}$, i.e.,
(25)
$$\begin{gathered} v\left(\beta^{\prime}\right)=\frac{\sigma}{\sqrt{\theta}} G^{-1}\left(\frac{\sigma^2}{\left(\frac{\sigma^2}{2 \theta}-\text { mmse }\right) \frac{2 \theta}{\mathbb{E}\left[e^{-2 \theta D}\right]}-\lambda^{\prime} \frac{2 \theta}{\mathbb{E}\left[e^{-2 \theta D}\right]}}\right) \\ \stackrel{(a)}{=} \frac{\sigma}{\sqrt{\theta}} G^{-1}\left(\frac{\sigma^2}{\alpha^{\star}-\lambda^{\prime} \frac{2 \theta}{\mathbb{E}\left[e^{-2 \theta D}\right]}}\right), \end{gathered}$$

where (a) holds by (8). Comparing (25) with (19), the two conclusions coincide.

C. Online Algorithm

The optimal sampling rule in Section III-B is determined by $\alpha^{\star}-\lambda^{\star}$ through (19). However, when the channel statistics $\mathbb{P}_D$ are unknown, $\alpha^{\star}$ and $\lambda^{\star}$ are unknown, making direct computation of $v\left(\alpha^{\star}-\lambda^{\star}\right)$ impossible. To overcome this challenge, we propose an online learning algorithm that approximates the two parameters $\alpha^{\star}$ and $\lambda^{\star}$ separately. Since $\alpha^{\star}$ is the solution to (22) when $\lambda=\lambda^{\star}$, we approximate $\alpha^{\star}$ using the Robbins-Monro algorithm [38] for stochastic approximation. For $\lambda^{\star}$, we construct a virtual queue $U_k$ that records the cumulative sampling constraint violation up to frame k. As summarized in Algorithm 1, the proposed algorithm consists of two parts: sampling (step 5) and updating (steps 6 and 7). In the sampling step, the algorithm uses the current estimates $\alpha_k$ and $\lambda_k$ to compute the threshold, i.e.,
(28)
$$W_k=\inf \left\{w \geq 0:\left|X_{R_k+w}-\hat{X}_{R_k+w}\right| \geq v\left(\left(\alpha_k-\lambda_k\right)^{+}\right)\right\},$$

where $(\cdot)^{+}=\max \{\cdot, 0\}$. After sample (k+1) is taken at time $R_k+W_k$, we compute the instant estimation error $O_{L_k}:=X_{S_{k+1}}-\hat{X}_{S_{k+1}}$ and the frame length $L_k:=D_k+W_k$. According to (4), $O_{L_k}$ is an instance of $O_{D+w\left(O_D ; \alpha-\lambda\right)}$ with $\lambda=\lambda_k$ and $\alpha=\alpha_k$. We then update $\alpha_{k+1}$ according to the Robbins-Monro iteration:
(29)
$$\alpha_{k+1}=\left(\alpha_k+\eta_k\left(O_{L_k}^2-\alpha_k L_k\right)\right)_{\alpha_{\mathrm{lb}}}^{\alpha_{\mathrm{ub}}},$$

where $(x)_a^b$ denotes the projection of x onto the interval [a, b]; $\alpha_{\mathrm{lb}}$ and $\alpha_{\mathrm{ub}}$ are the lower and upper bounds of $\alpha^{\star}$ defined in (11) and (12); and $\eta_k$ is the step size, chosen as
$$\eta_k= \begin{cases}\frac{1}{2 D_{\mathrm{lb}}}, & k=1, \\ \frac{1}{(k+2) D_{\mathrm{lb}}}, & k \geq 2.\end{cases}$$

For estimating $\lambda^{\star}$, we construct a virtual queue $U_k$ that evolves as

$$U_{k+1}=\max \left\{U_k+\frac{1}{f_{\max }}-L_k,\ 0\right\},$$

and set $\lambda_k=U_k / V$, where V > 0 is a hyper-parameter.
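Putting the two updates together, one updating iteration of the proposed algorithm can be sketched as follows (our rendering of (29), the step-size rule, and the virtual-queue recursion; the function and variable names are ours, not from Algorithm 1's listing).

```python
def online_update(alpha_k, U_k, O_Lk, L_k, k,
                  a_lb, a_ub, D_lb, f_max, V):
    """One updating step (steps 6 and 7 of Algorithm 1, as a sketch):
    the Robbins-Monro iteration (29) for alpha and the virtual-queue
    recursion for the sampling-frequency debt. O_Lk is the estimation
    error at the (k+1)th sampling time and L_k = D_k + W_k."""
    eta_k = 1.0 / (2.0 * D_lb) if k == 1 else 1.0 / ((k + 2.0) * D_lb)
    alpha_next = alpha_k + eta_k * (O_Lk**2 - alpha_k * L_k)
    alpha_next = min(max(alpha_next, a_lb), a_ub)   # project onto [a_lb, a_ub]
    U_next = max(U_k + 1.0 / f_max - L_k, 0.0)      # cumulative violation queue
    lambda_next = U_next / V                        # multiplier estimate
    return alpha_next, U_next, lambda_next
```

The sampling step (28) then waits until $|X_t-\hat{X}_t|$ crosses $v\left(\left(\alpha_{k+1}-\lambda_{k+1}\right)^{+}\right)$ before taking the next sample.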
Notice that $1 / f_{\max }-L_k$ is the violation of the sampling constraint in frame k, so $U_k$ can be interpreted as the cumulative violation up to frame k. Algorithm 1 attempts to stabilize $U_k$ so as to satisfy the sampling frequency constraint.

Remark 3: In (28), we take $\left(\alpha_k-\lambda_k\right)^{+}$ to ensure a non-negative input to $v(\cdot)$. We should also prevent the estimate $\alpha_k-\lambda_k$ from reaching zero, which would make the threshold v infinite; this requires that V not be chosen too small. In practice, one can also impose an arbitrarily small lower bound η > 0 on $\alpha_k-\lambda_k$ to avoid an infinite threshold.

IV. THEORETICAL ANALYSIS

In this section, we analyze the convergence and optimality of Algorithm 1.

Assumption 2: The second moment of the delay $D_k$ is bounded (this assumption is needed only for the theoretical analysis; the algorithm in Section III-C does not require it), i.e.,
(30a)
$$0\lt M_{\mathrm{lb}} \leq \mathbb{E}_{\mathbb{P}_D}\left[D_k^2\right] \leq M_{\mathrm{ub}}\lt \infty.$$

We first assume that there is no sampling frequency constraint, i.e., $f_{\max }=\infty$ and thus λ = 0. We then prove that, in the general case $f_{\max }\lt \infty$, Algorithm 1 still satisfies the constraint.

Theorem 1: The time-average MSE $\frac{\int_0^{S_{k+1}}\left(X_t-\hat{X}_t\right)^2 \mathrm{~d} t}{S_{k+1}}$ of the proposed online learning algorithm converges to mmse with probability 1, i.e.,
(31)
$$\lim _{k \rightarrow \infty} \frac{\int_0^{S_{k+1}}\left(X_t-\hat{X}_t\right)^2 \mathrm{~d} t}{S_{k+1}} \stackrel{\text { a.s. }}{=} \text { mmse. }$$

Theorem 2: Let $\mathcal{R}_k:=\mathbb{E}\left[\int_0^{S_{k+1}}\left(X_t-\hat{X}_t\right)^2 \mathrm{~d} t\right]-\text{mmse} \cdot \mathbb{E}\left[S_{k+1}\right]$ denote the expected cumulative MSE regret up to the (k+1)th sample. Then $\mathcal{R}_k$ can be upper bounded as follows:
(32)
$$\mathcal{R}_k \leq \max _{\alpha \in\left[\alpha_{\mathrm{lb}}, \alpha_{\mathrm{ub}}\right]}\left|R_1^{\prime}(v(\alpha)) v^{\prime}(\alpha)\right| \frac{\mathbb{E}\left[e^{-2 \theta D}\right]}{2 \theta} \frac{C}{D_{\mathrm{lb}}^2} \ln k,$$

where C is a constant independent of k, defined in (42). The proofs of Theorem 1 and Theorem 2 are provided in Appendix E and Appendix F, respectively.

We now consider the sampling frequency constraint, which we assume to be feasible.

Assumption 3: There exist a constant $\epsilon\gt 0$ and a stationary sampling policy $\pi_\epsilon$ satisfying

(33)
$$\mathbb{E}\left[D_k+W_k\right] \geq \frac{1}{f_{\max }}+\epsilon,$$
where the expectation is taken over the channel statistics and the policy $\pi_\epsilon$.

Theorem 3: Under Algorithm 1, the sampling frequency constraint is satisfied, i.e.,
(34)
$$\liminf _{K \rightarrow \infty} \mathbb{E}\left[\frac{1}{K} \sum_{k=1}^K\left(D_k+W_k\right)\right] \geq \frac{1}{f_{\max }}.$$

The proof of Theorem 3 is provided in Appendix H.

V. SIMULATION RESULTS

In this section, we provide simulation results to demonstrate the performance of the proposed algorithm. The parameters of the monitored OU process are σ = 1, θ = 0.2, and μ = 3. The channel delay follows a log-normal distribution with $\mu_D=\sigma_D=1$. The expected MSE is computed by averaging over 100 simulation runs, each with $K=10^4$ packet transmission frames.

A. Without a Sampling Frequency Constraint

First, we consider the case with no frequency constraint, i.e., $f_{\max }=\infty$. We compare the MSE performance of the following policies:

· Zero-wait policy $\pi_{\mathrm{ZW}}$: take a new sample immediately upon receiving the ACK of the last sample, i.e., $W_k=0$.

· Signal-aware MSE-optimum policy $\pi^{\star}$: the signal-aware MSE-optimum policy when $\mathbb{P}_D$ is known [21].

· Signal-agnostic AoI-minimum policy $\pi_{\mathrm{AoI}}$: the signal-agnostic sampling policy for AoI minimization [6].

· Proposed online policy $\pi_{\text {online }}$: described in Algorithm 1.

The estimation performance is depicted in Fig. 3, which verifies that the expected MSE of the proposed policy $\pi_{\text {online }}$ converges to that of the optimum policy $\pi^{\star}$ and is smaller than that of the signal-agnostic AoI-minimum and zero-wait policies. Previous work [21] has shown that the zero-wait policy is far from optimal when the channel delay is heavy-tailed. As for the AoI-optimal policy, although [20] reveals the relationship between the average AoI and the estimation error for the Wiener process, it is sub-optimal for MSE minimization of the OU process, performing even worse than the zero-wait policy here.

Next, we consider the estimation of the threshold $v\left(\alpha^{\star}-\lambda^{\star}\right)$. Fast and accurate estimation of the threshold is a necessary condition for the convergence of the MSE. As depicted in Fig. 4, the proposed algorithm approximates the optimal threshold as time goes to infinity. Moreover, the variance of the threshold estimate becomes small, which supports the convergence of the MSE.

B. With a Sampling Frequency Constraint

In this part, we present simulation results when a sampling constraint exists. The system parameters are the same as in Fig. 3, and we set $f_{\max }=0.02$; in other words, the minimum average frame length is $1 / f_{\max }=50$. Notice that the zero-wait policy no longer satisfies the sampling constraint. We therefore consider a frequency-conservative policy $\pi_{\text {freq }}$, which selects $W_k$ as
$$W_k=\max \left\{\frac{k}{f_{\max }}-\sum_{k^{\prime}=1}^{k-1} L_{k^{\prime}}-D_k,\ 0\right\}.$$

We set the parameter V = 500 and depict the MSE performance and the average frame length in Fig. 5 and Fig. 6, respectively. These two figures verify that the proposed algorithm approaches the lower bound while satisfying the frequency constraint.

Finally, we investigate the impact of V on the MSE performance and the average frame length. We choose V ∈ {300, 500, 800} and compare the MSE performance and the average frame length in Figs. 7(a) and 7(b), respectively. The MSE performance of the proposed algorithm converges to the optimal mmse for all three values of V, and the average inter-update interval remains near the frequency constraint. Recall that V is a hyper-parameter controlling the estimation of the Lagrange multiplier: a larger V places less emphasis on the frequency constraint. With the larger V = 800, the algorithm takes longer to converge to the sampling frequency constraint; since for t < 8000 its sampling frequency slightly violates the constraint, its MSE is smaller.

VI. CONCLUSION

In this work, we studied sampling policies for remote estimation of an OU process through a channel with random transmission delay. We aimed to design an online sampling policy that minimizes the mean square error when the delay distribution is unknown. Since finding the MSE-minimum sampling policy can be reformulated as an optimal stopping problem, we proposed a stochastic approximation algorithm that learns the optimum stopping threshold adaptively. We proved that, after taking k samples, the cumulative MSE regret of the proposed algorithm grows at rate $\mathcal{O}(\ln k)$, and that the time-average MSE converges to the minimum MSE almost surely. Numerical simulations validate the superiority and the convergence of the proposed algorithm.

Biographies

Jintao Wang (SM'12) received the B.Eng. and Ph.D. degrees in electrical engineering from Tsinghua University, Beijing, China, in 2001 and 2006, respectively. From 2006 to 2009, he was an Assistant Professor in the Department of Electronic Engineering at Tsinghua University, where he has been an Associate Professor and Ph.D. supervisor since 2009. He is a Standard Committee Member for the Chinese national digital terrestrial television broadcasting standard. His current research interests include space-time coding, MIMO, and OFDM systems. He has published more than 100 journal and conference papers and holds more than 40 national invention patents.

Pengkun Yang received the B.E. degree from the Department of Electronic Engineering, Tsinghua University, in 2013, and the M.S. and Ph.D. degrees from the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign. He is currently an Assistant Professor with the Center for Statistical Science, Tsinghua University. His research interests include statistical inference, learning, optimization, and systems. He received the Jack Keil Wolf ISIT Student Paper Award at the 2015 IEEE International Symposium on Information Theory.

Leandros Tassiulas (Fellow, IEEE) received the Ph.D.
degree in electrical engineering from the University of Maryland, College Park, MD, USA, in 1991, and the Diploma degree in electrical engineering from the Aristotle University of Thessaloniki, Greece. He was a faculty member at Polytechnic University, New York, NY, USA, the University of Maryland, and the University of Thessaly, Greece. He is currently the John C. Malone Professor of Electrical Engineering at Yale University, New Haven, CT, USA. His most notable contributions include the max-weight scheduling algorithm and the backpressure network control policy, opportunistic scheduling in wireless networks, the maximum lifetime approach for wireless network energy management, and the consideration of joint access control and antenna transmission management in multiple antenna wireless systems. He has worked in the field of computer and communication networks with emphasis on fundamental mathematical models and algorithms of complex networks, wireless systems, and sensor networks. His current research interests include intelligent services and architectures at the edge of next-generation networks, including the Internet of Things and sensing and actuation in terrestrial and non-terrestrial environments. His research has been recognized by several awards, including the IEEE Koji Kobayashi Computers and Communications Award in 2016, the ACM SIGMETRICS Achievement Award in 2020, the inaugural INFOCOM 2007 Achievement Award for fundamental contributions to resource allocation in communication networks, the INFOCOM 1994 and 2017 Best Paper Awards and the MobiHoc 2016 Best Paper Award, the National Science Foundation (NSF) Research Initiation Award in 1992, the NSF CAREER Award in 1995, the Office of Naval Research Young Investigator Award in 1997, and the Bodossaki Foundation Award in 1999. He became a Fellow of the ACM in 2020.

References