Asymptotically Optimal Delay-aware Scheduling in Queueing Systems

Saad Kriouile , Mohamad Assaad and Maialen Larranaga

Abstract

Abstract: In this paper, we investigate a delay-aware channel allocation problem where the number of channels is less than that of users. Due to the proliferation of delay sensitive applications, the objective of our problem is chosen to be the minimization of the total average queuing delay of the network in question. First, we show that our problem falls in the framework of restless bandit problems (RBP), for which obtaining the optimal solution is known to be out of reach. To circumvent this difficulty, we tackle the problem by adopting a Whittle index approach. To that extent, we employ a Lagrangian relaxation for the original problem and prove it to be decomposable into multiple one-dimensional independent subproblems. Afterwards, we provide structural results on the optimal policy of each of the subproblems. More specifically, we prove that a threshold policy is able to achieve the optimal operating point of the considered subproblem. Armed with that, we show the indexability of the subproblems and characterize the Whittle’s indices which are the basis of our proposed heuristic. We then provide a rigorous mathematical proof that our policy is optimal in the infinitely many users regime. Finally, we provide numerical results that showcase the remarkable good performance of our proposed policy and that corroborate the theoretical findings.

I. INTRODUCTION

THIS paper deals with user and channel scheduling, which has been widely recognized as a means to improve the network performance and to meet the service demands of the users. This problem has been widely studied in the past and several allocation policies have been developed for various contexts (e.g., see [2]–[8] and the references therein). In 5G networks, the problem of channel and user scheduling will be receiving particular interest due to the increase in the number of devices and users. Furthermore, the applications nowadays do not need high data rates only but are also more delay-sensitive, which implies that minimizing the delay is considered as a main design metric in future networks.

In this paper, we consider the problem of scheduling and channel allocation in a discrete time system composed of one central scheduler serving multiple users or queues. We consider that the traffic arriving to each queue is time varying, and that the number of users is higher than the number of channels, which is a quite realistic assumption especially with the growth in density of users in today’s networks. At each time slot, the central scheduler decides to allocate the channels to users, where a channel can be seen as a server in wired networks or a frequency bandwidth in wireless networks. Throughout this paper, we will use the terms “channel” and “server” interchangeably to designate a resource to allocate to users. Furthermore, we assume that the number of channels is limited and each channel can only allocate to one user at each time slot. The objective in this case is to find an allocation policy that minimizes the long-run average queuing delay of the users, as a mean to minimize the average delay in the network. Although it is a quite standard scheduling, we provide in this paper a rigorous mathematical analysis, leading to a novel scheduling algorithm of which we prove optimality in the many users regime. In fact, we show in this paper that the considered scheduling problem can be cast as a restless bandit problem (RBP), which is a particular Markov decision process (MDP). However, RBPs are PSPACE-Hard (see Papadimitriou et al. [9]), and hence their optimal solution is out of reach. One should therefore propose sub-optimal policies when dealing with such problems. In this paper, we approach the considered RBP problem using the Lagrangian relaxation technique, which consists of relaxing the constraint on the available resources. In other words, instead of having the constraint on the number of available channels satisfied in every time slot, we consider that it has to be satisfied on average. This allows us to decompose the large relaxed optimization problem into much simpler one-dimensional problems. Based on the optimal solution of the individual relaxed problems, we develop a heuristic for the original (i.e., non-relaxed) optimization problem. This heuristic is known as the Whittle’s index policy (WIP) and we will show that for our particular model, an explicit expression of the Whittle’s index can be found. WIP has been proposed as a suboptimal policy for many problems in the literature, see for instance [10], [11]. It has also been shown to perform near optimally in many scenarios and in the particular case of multiclass M/M/1 queues, WIP which simplifies to the [TeX:] $$c \mu$$-rule is optimal, see Buyukkoc et al. [12], and Larranaga [13]. In this paper, we will prove that the developed WIP is asymptotically optimal in the many users regime. To that extent, we summarize in the following the key contributions of this paper:

· We provide an analysis of the relaxed optimization problem, which let us obtain the structure of the optimal solution of its dual problem. The optimal solution is shown to be a threshold-based policy by proving that the value function of the Bellman equation that resolves each individual dual problem satisfies the increasing property.

· We resolve the full balance equations verified by the stationary distribution of the Markov chain representing the evolution of the queue state under a general threshold policy n. This step is very crucial to derive the Whittle’s indices expressions.

· We reformulate the individual dual problem of the relaxed problem using the steady state distribution. Afterwards, we provide a general algorithm that allows us to obtain the Whittle’s index. To reduce even further the complexity, we provide a rigorous proof of the indexability of the classes, along with lemmas that allow us to derive simple expressions of the Whittle’s index.

· We provide further characterization of the thresholdbased optimal solution of the relaxed optimization problem. The structure of this solution helps us to prove the local asymptotic optimality of our proposed policy as we just need to compare the average cost under the Whittle’s index policy with the optimal cost of the relaxed problem. The reason behind that is the fact that the latter is always less than the optimal cost of the original problem.

· We show that the Whittle’s index policy is asymptotically optimal in the infinitely many users regime, that is, when the number of users in the system as well as the available channels grow large.

· Finally, we provide numerical performance results of the Whittle’s index policy that corroborate our claims.

A. Related Work

The problem of resource allocation and scheduling in wireless networks has been widely studied in the literature. In [2]–[6], throughput optimal schedulers have been derived for single channel, multi-channel and multi-user MIMO contexts. The aforementioned set of work focuses on developing strategies that stabilize the queues of the users using the max weight rule. The classical max weight rule is however known to be not delay optimal. To overcome this issue, many works have been developed in the past to take into account the average delay of the traffic of the users (e.g., see [14] and the references therein). Most of the existing works use MDP frameworks and develop allocation strategies using Bellman equation (e.g., by using value iteration, policy iteration, etc.). However, MDP frameworks and Bellman equation suffer from the curse of dimensionality, which leads to complex resource allocation strategies. In [15], [16], the authors try to minimize the average delay of the users’ queues using MDP and stochastic learning tools. The complexity of the developed solutions is however much higher than the Whittle’s index policy. Stochastic learning is also used in [17] to deal with the problem of power allocation in an OFDM (Orthgonal Frequency Division Multiplexing) system with the goal being to minimize the average delay of the users’ packets in the queues. The developed solution requires high memory and computational complexity as compared to the Whittle’s index policy.

On the other hands, [18]–[20] study a flow-level scheduling problem with time-variant Markovian channels. They propose well-performing policies, but they don’t tackle their optimality. For instance, in [18], the authors derive a well-performing policy called “potential improvement rule” in the case where there is no arrivals and provide results regarding its optimality without giving analytical proofs and under several conditions specifically, when only one user can be served.

Whittle’s index based policies have been used/developed in wireless networks to deal with the problem of pilot allocation over Markovian channel models. If a pilot is allocated to a user, its CSI can be estimated correctly and the user can hence transmit at a given rate. In [10], [21], a Gilbert-Elliot channel model is considered and the Whittle’s index is derived. It has been shown in [21], [22] that a policy based on Whittle’s index is asymptotically optimal for their specific problem. The authors in [23] extended the problem of pilot allocation to the case where the channel evolves according to a Markovian process between K states instead of two states as in the Gilbert- Elliot model. In the aforementioned papers, the queues of the users were not considered. In fact, the focus was on the channel allocation such that the long term total throughput (or equivalent objective function) is maximized without taking into account the dynamic traffic of the users. In this paper, the objective of the user/channel allocation is to minimize the long term average queuing delay of the users.

In [11], a derivation of the Whittle’s index values for a simple multiclass M/M/1 model has been considered (where only one user can be served). However, the optimality of the obtained Whittle’s index policy has not been proved in [11] and the time was assumed to be continuous in their model. The authors in [24] considered the problem of project/job scheduling in which an effort is allocated to a fixed number of projects. The performance of a Whittle’s index based policy was analyzed under a continuous time model. In contrast to these two papers, we consider that the time is slotted and that several users can be scheduled at a given time slot and not only one user. We provide an explicit characterization of the Whittle’s indices, develop a Whittle’s index channel allocation policy for our problem. On the other hand, in contrast to previous work in [1], in this paper, we consider that the buffer size is very tight. The motivation behind that is that in the framework of IoT (Internet of things), the resources of the connected devices are very limited, especially the energy available and the backlog queue. To that extent, we suppose that the queue length of a given node does not exceed a certain constant denoted by L, which is in its turn less than the departure rate. Besides that, unlike our work in [1], we further prove the asymptotic optimality of our developed policy in the many user’s regime.

The remainder of the paper is organized as follows: In Section II, we formulate the problem under investigation and we introduce the Lagrangian relaxation. In Section III, we prove the optimality of threshold/monotone policies for the relaxed problem. In Section IV, we compute the steady-state distribution of the system under a general threshold policy. In Section V, we characterize the Whittle’s indices explicitly and we lay out our proposed Whittle’s index based policy. Section VI provides further characterization of the optimal solution of the relaxed problem. In Sections VII and VIII, we prove the local and global asymptotic optimality of our proposed scheme respectively. In Section IX, we evaluate the performance of the Whittle’s index policy numerically. Lastly, the mathematical proofs are provided in the appendices.

II. SYSTEM MODEL AND PROBLEM FORMULATION

A. System Model Description

We consider a time-slotted system with one central scheduler, N users/queues and M uncorrelated channels (or servers) with (N > M). The terms “server” and “channel” will be used interchangeably throughout this paper, as well as the terms “user” and “queue”. A channel can be allocated to at most one user, hence only M users will be able to transmit (i.e., send packets) at time slot t. We consider K different classes of users and we assume that for each user i in class-k, the number of arrival packet denoted by [TeX:] $$A_i^k(t)$$ follows a uniform distribution in [TeX:] $$\left\{0, \cdots, R_k-1\right\}$$ at each time slot t. Moreover, we consider that the buffer size of each user in the system is very tight. Accordingly, for each user in any class, if scheduled, transmits all the packets in its buffer. We denote by [TeX:] $$\gamma_k$$ the proportion of class-k users in the system. We also let [TeX:] $$q_i^{k, \phi}(t)$$ denote the number of packets in queue i in class k. Furthermore, [TeX:] $$s_i^{k, \phi}\left(\boldsymbol{q}^\phi(t)\right)$$ will denote the transmission action under a decision policy [TeX:] $$\phi$$ for user i in class k and [TeX:] $$\boldsymbol{q}^\phi(t)$$ the vector of all queue lengths [TeX:] $$\left(q_1^{1, \phi}(t), \cdots, q_{N \gamma_1}^{1, \phi}(t), \cdots, q_1^{K, \phi}(t), \cdots, q_{N \gamma_K}^{K, \phi}(t)\right).$$ For the sake of clarity, we define [TeX:] $$s_i^{k, \phi}(t):=s_i^{k, \phi}\left(q^\phi(t)\right).$$ If policy [TeX:] $$\phi$$ prescribes to schedule user i in class k at time t, then [TeX:] $$s_i^{k, \phi}(t)=1,$$ and [TeX:] $$s_i^{k, \phi}(t)=0$$ otherwise. We denote by L the buffer capacity, which is considered to be the same for all queues and less than [TeX:] $$R_k$$ for all k. The general system model is presented in Fig. 1. Based on our system model, the number of packets in queue i of class k evolves as follows:

(1)
[TeX:] $$q_i^{k, \phi}(t+1)=\min \left\{\left(q_i^{k, \phi}(t)\left(1-s_i^{k, \phi}(t)\right)+A_i^k(t), L\right\},\right.$$

where [TeX:] $$(x)^{+}=\max \{x, 0\} .$$

Fig. 1.
System model.
B. Penalty Function Dynamics

In this paper, we are interested in minimizing the average delay incurred in the users’ queues of the system. To that end, we should provide the expression of the queuing delay in function of the queue length. Then, we give the average cost function that we should minimize.

1) Queuing Delay metric: Given a user i, and class k, the average delay of each packet in this queue during the period [0, T] is the delay’s sum of all arrived packets between 0 and T which is, [TeX:] $$\sum_{t=0}^T q_i^k(t)$$ over the average number of the arrived packet between 0 and T. If there is no constraint on the buffer size as considered in [1], the average number of arrived packets between the time 0 and T is [TeX:] $$T\left(R_k-1\right) / 2.$$ Therefore, in this case, the number of arrived packets per time unit, or equivalently, the packet arrival rate is constant over time. To that extent, in our paper, where we consider that the buffer size is bounded by L, in order to have a fixed rate of the packet arrival as in [1], we take into consideration all the arrived packets, even those which are dropped due to the buffer constraint to give a simple expression of the delay metric. Since the surplus packets are immediately dropped, then we associate for this type of packets a delay that equals to 0. However, since the packets are dropped, which is an undesired event, a penalty should be incurred in the cost function that we will see later in the next section. As a consequence, the average delay according to the Little’s Law is [TeX:] $$2 \sum_{t=0}^T q_i^k(t) / T\left(R_k-1\right).$$ Denoting [TeX:] $$\left(2 /\left(R_k-1\right)\right) \beta_k$$ by [TeX:] $$a_k$$ where [TeX:] $$\beta_k$$ is a weight factor, then the metric of interest is [TeX:] $$\limsup _{T \rightarrow \infty}(1 / T) \sum_{t=0}^T a_k q_i^k(t) / T \text {. }$$

2) Penalty function: As mentioned in the section above, a penalty should be incurred when the packets are dropped. More precisely, a penalty should be paid when the number of arrived packets plus those in the queue exceeds L. To that extent, we should find for which state of the queue, the queue overflow will occur. Indeed, the only state that we can presume that there is an overflow at time t is when the queue state is equal to L. Bearing that in mind, we define for a given user i in class k, [TeX:] $$b\left(q_i^k(t)\right)$$ that equals to 0 if [TeX:] $$q_i^k(t)\lt L$$ (zero penalty paid) and [TeX:] $$b\left(l_i^k(t)\right)=C_d\gt 0$$ (penalty is being paid). To that extent, we define the penalty function as follows:

(2)
[TeX:] $$d(q)=\left\{\begin{array}{ll} q & \text { if } 0 \leq q \leq L-1 \\ L+C_d & \text { if } q=L \end{array}\right. \text {. }$$

Consequently, the objective of the present work is to find a scheduling policy [TeX:] $$\phi$$ that minimizes the average weighted penalty function [TeX:] $$\limsup _{T \rightarrow \infty}(1 / T) \sum_{t=0}^T a_k d\left(q_i^k(t)\right) / T .$$

C. Problem Formulation

The cost incurred by user i in class k, at time t is equal to [TeX:] $$a_k d\left(q_i^{k, \phi}(t)\right)$$ for all [TeX:] $$i \in\left\{1, \cdots, \gamma_k N\right\}.$$ One can see that the model described in Section II belongs to the family of restless bandit problems (RBP). We consider the broad class [TeX:] $$\Phi$$ of scheduling policies in which a scheduling decision depends on the history of observed queue states and scheduling actions. Our user and channel allocation problem therefore consists of identifying the policy [TeX:] $$\phi \in \Phi$$ that minimizes the infinite horizon expected average cost functions of different users, subject to the constraint on the number of users selected at each time slot. Given the initial state [TeX:] $$\boldsymbol{q}(0)=\left(q_1^1(0), \cdots, q_{N \gamma_1}^1(0), \cdots, q_1^K(0), \cdots, q_{N \gamma_K}^K(0)\right),$$ the problem can be formulated as follows:

(3)
[TeX:] $$\begin{aligned} & \min _{\phi \in \Phi} \limsup _{T \rightarrow \infty} \frac{1}{T} \mathbb{E}\left[\sum_{t=0}^{T-1} \sum_{k=1}^K \sum_{i=1}^{\gamma_k N} a_k d\left(q_i^{k, \phi}(t)\right) \mid \boldsymbol{q}(0)\right], \\ & \text { s.t. } \sum_{k=1}^K \sum_{i=1}^{\gamma_k N} s_i^{k, \phi}(t) \leq \alpha N, \text { for all } t, \end{aligned}$$

where [TeX:] $$\alpha=M / N$$ is the fraction of users that can be scheduled.

III. RELAXED PROBLEM AND THRESHOLD-BASED POLICY

As has been discussed in the introduction of this paper, RBPs are PSPACE-Hard (see Papadimitriou et al. [9]) and therefore one should develop well performing sub-optimal policies to solve these problems. In this paper, the development of our policy is done through several steps. First, we consider a Lagrangian relaxation of our problem and show that it can be decomposed into several one-dimensional problems. We then prove that the optimal solution to each of these relaxed problems is a threshold-based policy. We then compute the stationary distribution of the states of the system under the aforementioned threshold policy. This allows us to obtain a closed form expression of the Whittle’s index values of the relaxed problem and develop a Whittle index-based scheduling policy for the original RBP.

In this section, we first formulate the relaxed problem and prove that its optimal policy is a threshold-based one.

A. Relaxed Problem and Dual Problem

The Lagrangian relaxation consists of relaxing the constraint on the available resources. Namely, we consider that the constraint in (3), has to be satisfied on average and not in every decision epoch, that is,

(4)
[TeX:] $$\limsup _{T \rightarrow \infty} \frac{1}{T} \mathbb{E}\left[\sum_{k=1}^K \sum_{i=1}^{\gamma_k N} s_i^{k, \phi}(t)\right] \leq \alpha N .$$

Note that, contrary to the strict constraint in (3), the relaxed constraint allows the activation of more than α fraction of users at each time slot. If we note W the Lagrangian multiplier for the constrained problem, then the Lagrange function equals to:

(5)
[TeX:] $$\begin{aligned} & f(W, \phi) \\ & =\limsup _{T \rightarrow \infty} \frac{1}{T} \mathbb{E}\left[\sum_{t=0}^{T-1} \sum_{k=1}^K \sum_{i=1}^{\gamma_k N}\left(a_k d\left(q_i^{k, \phi}(t)\right)+W s_i^{k, \phi}(t)\right) \mid \mathbf{q}_0\right] \\ & \quad-W \alpha N, \end{aligned}$$

where W can be seen as a subsidy for not transmitting. Therefore, the dual problem for a given W is

(6)
[TeX:] $$\min _{\phi \in \Phi} f(W, \phi) .$$

B. Problem Decomposition and Threshold-based Policy

In this section, we show that the relaxed problem can be decomposed into N one-dimensional subproblems, for which the optimal solution is a threshold-based policy. To do that, we first get rid of the constants that do not depend on [TeX:] $$\phi$$ and reformulate the problem as follows,

(7)
[TeX:] $$\min _{\phi \in \Phi} \limsup _{T \rightarrow \infty} \frac{1}{T} \mathbb{E}\left[\sum_{t=0}^{T-1} \sum_{k=1}^K \sum_{i=1}^{\gamma_k N}\left(a_k d\left(q_i^{k, \phi}(t)\right)+W s_i^{k, \phi}(t)\right) \mid \mathbf{q}_0\right].$$

One can see that the solution of this problem can be deduced from the well known Bellman equation (see Ross [25]). More specifically:

(8)
[TeX:] $$\overline{\boldsymbol{V}}(\mathbf{q})+\theta=\min _{\mathbf{s}}\left\{\sum_{k=1}^K \sum_{i=1}^{\gamma_k N} C_k\left(q_i^k, s_i^k\right)+\sum_{\mathbf{q}^{\prime}} \operatorname{Pr}\left(\mathbf{q}^{\prime} \mid \mathbf{q}, \mathbf{s}\right) \overline{\boldsymbol{V}}\left(\mathbf{q}^{\prime}\right)\right\},$$

for all [TeX:] $$\boldsymbol q=\left(q_1^1, \cdots, q_{\gamma_1 N}^1, \cdots, q_1^K, \cdots, q_{\gamma_k N}^K\right),$$ with [TeX:] $$q_i^k \in \{0, \cdots, L\}$$ being the queue length of class-k user i, and [TeX:] $$\boldsymbol s=\left(s_1^1, \cdots, s_{\gamma_1 N}^1, \cdots, s_1^K, \cdots, s_{\gamma_k N}^K\right),$$ with [TeX:] $$s_i^k \in\{0,1\}$$ being the action taken with respect to user i in class k. In equation (8), [TeX:] $$\overline{\boldsymbol{V}}(\cdot)$$ represents the Value Function, θ is the optimal average cost and [TeX:] $$C_k\left(q_i^k, s_i^k\right)$$ is the holding cost [TeX:] $$a_k d\left(q_i^k\right)+W s_i^k.$$ The optimal decision for each state [TeX:] $$\boldsymbol{q}$$ can be obtained by minimizing the right hand side of (8). We now show that the problem can be decomposed into N independent subproblems by decomposing [TeX:] $$\overline{\boldsymbol{V}}(\cdot)$$ into separate Value Functions for each user i in class k, i.e., [TeX:] $$V_i^k(\cdot) .$$ In other words, the optimal decision [TeX:] $$\boldsymbol{s}$$ to (8) is a vector composed of elements [TeX:] $$s_i^k$$, where each [TeX:] $$s_i^k$$ is nothing but the optimal decision that solves the individual Bellman equations.

(9)
[TeX:] $$V_i^k\left(q_i^k\right)+\theta_i^k=\min _{s_i^k}\left\{C_k\left(q_i^k, s_i^k\right)+\sum_{q_i^{\prime k}} \operatorname{Pr}\left(q_i^{\prime k} \mid q_i^k, s_i^k\right) V_i^k\left(q_i^{\prime k}\right)\right\} .$$

Proposition 1. Let [TeX:] $$V_i^k(\cdot)$$ be the optimal value function that solves (9), and let [TeX:] $$\overline{\boldsymbol{V}}(\cdot)$$ be the optimal value function that solves (8) then:

(10)
[TeX:] $$\overline{\boldsymbol{V}}(\cdot)=\sum_{k=1}^K \sum_{i=1}^{\gamma_k N} V_i^k(\cdot).$$

Proof. See appendix A.

In this section, we show that the solution to each individual problem (for each user i) follows the structure of a threshold policy. For ease of notation, we drop the indices k and i and consider that [TeX:] $$V(\cdot)$$ is the value function for a given user. We first provide the definition of threshold policies.

Definition 1. A threshold policy is a policy [TeX:] $$\phi \in \Phi$$ for which there exists [TeX:] $$n \in\{-1,0, \cdots, L\}$$ such that when the queue of user i is in state [TeX:] $$q \leq n,$$ the prescribed action is [TeX:] $$s^{-} \in\{0,1\},$$ and when [TeX:] $$q \gt n,$$ the prescribed action is [TeX:] $$s^{+} \in\{0,1\}$$ while bearing in mind that [TeX:] $$s^{-} \neq s^{+} .$$

Since we only have two possible actions, a policy is of the form threshold policy if and only if it is monotone with q.

The solution of the Bellman equation (9) [TeX:] $$V(\cdot)$$ can be obtained by the well known value iteration algorithm, which consists of updating [TeX:] $$V_t(\cdot)$$ using the following equation:

(11)
[TeX:] $$V_{t+1}(q)=\min _s\left\{C(q, s)+\sum_{q^{\prime}} \operatorname{Pr}\left(q^{\prime} \mid q, s\right) V_t\left(q^{\prime}\right)\right\}-\theta .$$

We consider that the initial value function [TeX:] $$V_0$$ is equal to 0 for any q, (i.e., for all q, [TeX:] $$V_0(q)=0$$). After many iterations, [TeX:] $$V_t(\cdot)$$ will converge to the unique fixed point of the (9) called [TeX:] $$V(\cdot)$$ (see [13, Chapter 1.3.3]). However, the value iteration algorithm is known to have high complexity and can take a long time to converge. Therefore, we will give some structural properties of the value function [TeX:] $$V_t(\cdot)$$ for any t and conclude that the optimal policy is a threshold-based one.

Remark 1. It is worth to emphasize that if the arrival packets plus the current queue length overflow on the buffer capacity, the user retain only the L packets and get rid of the surplus of the packets. Subsequently, from the state q, we can reach the queue length L, when the number of arrival packets A can be either L - q or plus. Having said that, [TeX:] $$\operatorname{Pr}(L \mid q, 1)=\sum_{j=L}^{R-1} \operatorname{Pr}(A=j)$$ and [TeX:] $$\operatorname{Pr}(L \mid q, 0)=\sum_{j=L-q}^{R-1} \operatorname{Pr}(A=j).$$

To establish our desired result, we proceed with these following steps:

· We prove that [TeX:] $$V(\cdot)$$ is increasing with q.

· We establish that [TeX:] $$V^1(q)-V^0(q)$$ is decreasing with q where [TeX:] $$V^0(q)$$ and [TeX:] $$V^1(q)$$ are the value functions when the action prescribed at state q is s = 0 and s = 1, respectively.

· Finally, we show that the optimal solution is an increasing threshold policy.

Regarding the first point, we show that [TeX:] $$V_t(\cdot)$$ is increasing with q for all t by induction. Precisely, we establish that:

· [TeX:] $$V_0(\cdot)$$ increases with q.

· If [TeX:] $$V_t(\cdot)$$ is increasing with q, [TeX:] $$V_{t+1}(\cdot)$$ is also increasing with q.

Given that [TeX:] $$V_0(\cdot)=0,$$ then [TeX:] $$V_0(\cdot)$$ is increasing with q.

Considering [TeX:] $$V_t(\cdot)$$ is increasing with q, then [TeX:] $$\sum_{q^{\prime}} \operatorname{Pr}\left(q^{\prime} \mid \cdot, s\right) V_t\left(q^{\prime}\right)$$ grows with q (see Puterman [26]). We have by construction, [TeX:] $$C(\cdot, s)$$ increases with q. Since θ is just a constant, [TeX:] $$V_{t+1}(\cdot)$$ will be as well an increasing function with q.

As a consequence, we show that [TeX:] $$V_t(\cdot)$$ is increasing with q for all t. Leveraging the fact that [TeX:] $$V(\cdot)$$ is the limit of [TeX:] $$V_t(\cdot)$$ when t grows, then [TeX:] $$V_t(\cdot)$$ when t grows, then [TeX:] $$V(\cdot)$$ is also increasing with q.

As for the second point, one can see that the next state before the arrival of the packets will be q = 0 if the action prescribed is the active action since all the packets in the buffer will be transmitted. Consequently, the probability to transit to the state [TeX:] $$q^{\prime}$$ from a given state q under the active action is the probability to have [TeX:] $$A=q^{\prime}$$ if [TeX:] $$q^{\prime} \lt L.$$ or [TeX:] $$L \leq A \leq R-1$$ if [TeX:] $$q^{\prime}=L$$ (according to the remark 1). Hence [TeX:] $$\operatorname{Pr}\left(q^{\prime} \mid q, 1\right)$$ doesn’t depend on q. Likewise for [TeX:] $$\sum_{q^{\prime}=0}^L \operatorname{Pr}\left(q^{\prime} \mid q, 1\right) V\left(q^{\prime}\right).$$

We have that:

[TeX:] $$\begin{aligned} & V^1(q)-V^0(q) \\ & \quad=W+\sum_{q^{\prime}} \operatorname{Pr}\left(q^{\prime} \mid q, 1\right) V\left(q^{\prime}\right)-\sum_{q^{\prime}} \operatorname{Pr}\left(q^{\prime} \mid q, 0\right) V\left(q^{\prime}\right) . \end{aligned}$$

Bearing in mind that [TeX:] $$V(\cdot)$$ is increasing with q, then again according to Puterman [26], [TeX:] $$\sum_{q^{\prime}} \operatorname{Pr}\left(q^{\prime} \mid q, 0\right) V\left(q^{\prime}\right)$$ is increasing with q. Leveraging the above result, [TeX:] $$\sum_{q^{\prime}=0}^L \operatorname{Pr}\left(q^{\prime} \mid q, 1\right) V\left(q^{\prime}\right)$$ is constant with respect to q. Consequently, [TeX:] $$V^1(q)-V^0(q)$$ decreases with q.

To prove the last point, we recall that the optimal action s(q) at state q according to (8), is the one that minimizes [TeX:] $$V^s(q).$$ Explicitly, [TeX:] $$s(q)=\operatorname{argmin}\left\{V^0(q), V^1(q)\right\} .$$ Moreover, exploiting the fact that [TeX:] $$V^1(q)-V^0(q)$$ is decreasing with q, then there exists [TeX:] $$q_0 \in\{-1, \cdots, L\}$$ such that for all [TeX:] $$q \leq q_0, V^1(q) \geq V^0(q)$$ and for all [TeX:] $$q \gt q_0, V^1(q) \leq V^0(q).$$ Consequently, we deduce that for all [TeX:] $$q \leq q_0,$$ the optimal decision is to stay idle, and for all [TeX:] $$q \gt q_0,$$ the optimal decision is to transmit. Thereby, we prove that the optimal solution of (7) is of type threshold increasing policy.

IV. STATIONARY DISTRIBUTION

We have seen previously that the optimal solution of (7) is a threshold-based policy. Let us define [TeX:] $$n_k$$ as the threshold for users in class k, i.e., if the queue state of user i in class k is [TeX:] $$q_i^k$$ such that [TeX:] $$q_i^k \leq n_k$$ then the user will not be scheduled, and else, the user will be selected for transmission. The objective of this section is to derive the stationary distribution of the users’ states. This will be useful in the subsequent section in the derivation of a closed form expression of the Whittle’s index values. We assume here that at each queue i in class k, packets arrive according to a discrete uniform distribution, that is, [TeX:] $$\mathbb{P}\left(A_i^k(t)=x\right)=\rho_k$$ for all [TeX:] $$0 \leq x \leq R_k-1 \text { and } 0$$ otherwise, where [TeX:] $$\rho_k=1 / R_k.$$

For ease of notation, we again drop the indices k and i (e.g., we denote the threshold by n and the queue state by q). To that extent, we denote by [TeX:] $$p_n(i, j)$$ the transition probability from queue state i to queue state j, by u(.) the stationary distribution under the threshold policy n, and by R-1 the maximum arrival rate [TeX:] $$(\rho=1/R)$$. Finding the stationary distribution requires resolving the full balance equation:

(12)
[TeX:] $$u(i)=\sum_{j=0}^n p_n(j, i) u(j).$$

In most of works in literature, the authors consider that the evolution of the metric under the passive action is deterministic [21], [23], [27], [28]. In other words, for each state i, we know for sure the next state to which the bandit will transit under the passive action (usually i + 1). Explicitly, [TeX:] $$p_n(j, j+1)=1$$ if j ≤ n. That explains why in these papers above, the authors got a simple recurrence relation between the elements of the stationary distribution under a given threshold policy. For instance, in [27], leveraging only the evolution of AoI (Age of information metric), the authors got directly u(i + 1) in function of u(i) without requiring any further manipulations. Whereas in this paper, obtaining a recurrence relation is not straightforward, since the expression of u(i) is a linear combination of [TeX:] $$\{u(j)\}_{j \in A_i}$$ where the cardinal of the [TeX:] $$A_i$$ is at least L - n as we will see later.

u(.) verifies the following full balance equation:

(13)
[TeX:] $$u(i)=\sum_{j=0}^L p_n(j, i) u(j)=\sum_{j=0}^n p_n(j, i) u(j)+\sum_{j=n+1}^L p_n(j, i) u(j).$$

Definition 2. We define [TeX:] $$\pi_i$$ as:

(14)
[TeX:] $$\pi_i=\left\{\begin{array}{ll} \rho & \text { if } 0 \leq i \leq R-1 \\ 0 & \text { else. } \end{array} .\right.$$

Proposition 2. The expressions of [TeX:] $$p_n(j, i)$$ are given by: If [TeX:] $$0 \leq i\lt L \text { and } j \leq n$$

(15)
[TeX:] $$p_n(j, i)=\pi_{i-j}=\left\{\begin{array}{ll} \rho & \text { if } 0 \leq i-j \leq R-1 \\ 0 & \text { else } \end{array}\right. \text {, }$$

If [TeX:] $$0 \leq i\lt L \text { and } n \lt j \leq L$$

(16)
[TeX:] $$p_n(j, i)=\pi_i=\left\{\begin{array}{ll} \rho & \text { if } 0 \leq i \leq R-1 \\ 0 & \text { else } \end{array}\right. \text {, }$$

If [TeX:] $$i=L \text { and } j \leq n$$

(17)
[TeX:] $$p_n(j, L)=(R-L+j) \pi_{L-j}=(R-L+j) \rho \text {. }$$

If [TeX:] $$i=L \text { and } n \lt j \leq L$$

(18)
[TeX:] $$p_n(j, L)=(R-L) \pi_L=(R-L) \rho \text {. }$$

Proof. See appendix B.

Proposition 3. The expressions of the stationary distribution is:

1) [TeX:] $$-1 \leq n \leq L-1$$:

(19)
[TeX:] $$\begin{aligned} & u(i) \\ & =\left\{\begin{array}{ll} \rho(1-\rho)^{n-i} & \text { if } 0 \leq i \leq n \\ \rho & \text { if } n+1 \leq i \leq L-1 \\ (1-\rho)^{n+1}-(L-n-1) \rho & \text { if } i=L \end{array} .\right. \\ & \end{aligned}$$

2) n = L:

(20)
[TeX:] $$u(i)= \begin{cases}0 & \text { if } 0 \leq i \leq L-1 \\ 1 & \text { if } i=L\end{cases}.$$

Proof. See appendix C.

V. WHITTLE’S INDEX

In this section, we provide the derivation of the Whittle’s indices, which are values that depend on the queue state of the user. Although this derivation is made using the relaxed problem, it allows us to develop a heuristic for the original problem. It is worth mentioning that the Whittle’s index at a given state, say n, represents the Lagrange multiplier for which the optimal decision of the individual dual relaxed problem at this state is indifferent (passive and active decision are both optimal). However, the Whittle’s index is well defined only if the property of indexability is satisfied. This property requires to establish that as the Lagrange multiplier (or equivalently the subsidy for passivity W) increases, the collection of states in which the optimal action is passive increases. In this section, we work on a given class k, and we consider its maximum arrival rate is R - 1 (a = 2=(R - 1)) with ρ = 1/R. All the obtained results here can be applied for any class.

We start the derivation by first reformulating the dual of the relaxed problem using the stationary distribution derived in the previous section. Since the solution of the dual of the relaxed problem (7) (given a constant W) is a threshold-based policy, we can reformulate the problem as follows:

(21)
[TeX:] $$\begin{aligned} & \min _{n \in[0, L]} \mathbb{E}\left[a d\left(q^n\right)+W s^n\right] \\ & \quad=\min _{n \in[0, L]}\left\{\sum_{i=0}^L a u^n(i) d(i)-W \sum_{q=0}^n u^n(i)+W\right\} \end{aligned}$$

with [TeX:] $$n \text { and } u^n$$ being the threshold and the stationary distribution under the threshold policy n with respect to the queue length respectively.

The new formulation of the problem turns out to be useful to derive the Whittle’s indices since, for any W, we can find the minimizer of the expression in (21).

We first give the expression of the mean cost in (21) given threshold policy n (for all possible values of n and L).

If [TeX:] $$-1 \leq n \leq L-1$$:

(22)
[TeX:] $$\begin{aligned} \sum_{i=0}^L a u^n(i) d(i)= & a\left[\left(L+R+C_d\right)(1-\rho)^{n+1}+n-R+1\right. \\ & \left.-(L-n-1) \rho C_d+\frac{(L-1-n)(n-L)}{2 R}\right] . \end{aligned}$$

If n = L:

(23)
[TeX:] $$\sum_{i=0}^L a u^n(i) d(i)=a L+a C_d.$$

Second, we provide the expression of the passive decision’s average time in (21) given a threshold n:

If [TeX:] $$-1 \leq n \leq L-1$$:

(24)
[TeX:] $$\sum_{i=0}^n u^n(i)=1-(1-\rho)^{n+1}.$$

If n = L:

(25)
[TeX:] $$\sum_{i=0}^n u^n(i)=1.$$

A. Computation of the Whittle’s Index Values

We first formalize the indexability and the Whittle’s index in the following definitions.

Definition 3. Considering (21) for a given W, we define D(W) as the set of states in which the optimal action (with respect to the optimal solution of (21)) is the passive one. In other words, n ∈ D(W) if and only if the optimal action at state n is the passive one.

D(W) is well defined as the optimal solution of (21) is a stationary policy, more precisely, a threshold based policy.

Definition 4. A class is indexable if the set of states in which the passive action is the optimal action increases with W, that is, [TeX:] $$W^{\prime}\lt W \Rightarrow D\left(W^{\prime}\right) \subseteq D(W) .$$ When the class is indexable, the Whittle’s index in state n is defined as:

(26)
[TeX:] $$W(n)=\min \{W \mid n \in D(W)\}$$

In the literature, several works have been conducted to find the Whittle’s index values. For example, an interesting iterative algorithm has been provided in [13]. Even though the context of our work here is different from the one considered in [13], we will prove in the sequel that the proposed algorithm in [13] can be adapted to our case up to some modifications (in our case we have a maximum buffer state L). In addition, further analysis will be provided here to derive a closed form expression of the Whittle’s index values. We will first provide this modified algorithm and then prove that it allows the computation of the Whittle’s index values for our problem.

Proposition 4. Assuming that the optimal solution is a threshold policy, and that [TeX:] $$\sum_{q=0}^n u^n(q)$$ is increasing, then the class is indexable. Moreover, if [TeX:] $$\sum_{q=0}^L a u^n(q) d(q)$$ is Pncreasing with n and for all i and j such that i < j [TeX:] $$\sum_{q=0}^i u^i(q)=\sum_{q=0}^j u^j(q) \Longrightarrow \sum_{q=0}^L a u^i(q) d(q) \lt \sum_{q=0}^L a u^j(q) d(q),$$ then the Whittle’s index values are computed by applying Algorithm 1.

Whittle index computation

Proof. For the proof, see appendix D.

Remark 2. In order to simplify the notation in the sequel, we denote [TeX:] $$\sum_{q=0}^L a u^n(q) d(q) \text { by } a_n \text { and } \sum_{q=0}^n u^n(q) \text { by } b_n \text {. }$$

In order to apply Algorithm 1 that allows to obtain the Whittle’s index for each state in our case, we need to prove that the conditions given in Proposition 4 are satisfied.

We start first by establishing the indexability.

Theorem 1. For each k, the class-k is indexable.

Proof. According to Proposition 4, we just need to prove that [TeX:] $$\sum_{q=0}^n u^n(q)$$ is increasing with n. It is clear that from (24), [TeX:] $$\sum_{q=0}^n u^n(q)$$ is increasing with n. Hence, the class is indexable.

We prove the two others conditions of Proposition 4 which are the increase property of [TeX:] $$\sum_{q=0}^L a u^n(q) d(q)$$ with n, and that for all i and j such that i \lt j, [TeX:] $$\sum_{q=0}^i u^i(q)=\sum_{q=0}^j u^j(q) \Longrightarrow$$ [TeX:] $$\sum_{q=0}^L a u^i(q) d(q)\lt \sum_{q=0}^L a u^j(q) d(q).$$ The second property for this case is meaningless since [TeX:] $$\sum_{q=0}^i u^i(q)$$ is strictly increasing with i. While for the first one, one should demonstrate that [TeX:] $$a_n$$ is increasing with n.

Proposition 5. [TeX:] $$a_n=\sum_{q=0}^L a u^n(q) d(q)$$ is increasing with n.

Proof. See appendix G.

As the indexability is satisfied and the two conditions of Proposition 4 are verified, then we can apply Algorithm 1 to get the Whittle’s index for each state. However, the complexity of this algorithm is L2. In order to overcome this complexity issue, we will provide further analysis and derive simple expressions of the Whittle’s indices.

We first proceed by laying out the following definitions and lemmas.

Definition 5. For any given increasing threshold policy n, we Pdefine [TeX:] $$y^n$$ as a function of the subsidy W, such that [TeX:] $$y^n(W)=\sum_{q=0}^L a u^n(q) d(q)-W \sum_{q=0}^n u^n(q)=a_n-W b_n .$$

Lemma 1. The intersection point [TeX:] $$W=x_{i, j} \text { between } y^i(W) \text{ and } y^j(W)$$ is equal to:

(27)
[TeX:] $$x_{i, j}=\frac{\sum_{q=0}^L a u^i(q) d(q)-\sum_{q=0}^L a u^j(q) d(q)}{\sum_{q=0}^i u^i(q)-\sum_{q=0}^j u^j(q)} .$$

Proof. See Appendix H.

Theorem 2. The Whittle’s index of state n ∈ [0, L]:

[TeX:] $$\begin{aligned} W(n) & =x_{n, n-1} \\ & =\frac{a\left[\rho(L-n)-\rho\left(L+R+C_d\right)(1-\rho)^n+1+\rho C_d\right]}{\rho(1-\rho)^n}. \end{aligned}$$

Proof. See appendix I.

B. Whittle’s Index Policy for the Original Problem

We now consider the original optimization (3) and propose a simple Whittle’s index policy. This policy consists of simply allocating the channels to the M users with the highest Whittle’s indices at time t, denoted by WIP, and computed using the simple expressions in Theorem 2.

Fig. 2.
[TeX:] $$y^n(W)$$ in function of W for different values of n (L = 4, R-1 = 7, and [TeX:] $$C_d=3$$): W(i) indicates the Whittle’s index at state i.

VI. FURTHER ANALYSIS OF THE OPTIMAL SOLUTION OF THE RELAXED PROBLEM

In this section, we provide further analysis and give the structure of the optimal solution for the relaxed problem, which will be useful for the proof of optimality of the Whittle’s index policy. As we have seen in Section III, for any given W, the optimal solution for the dual relaxed (7) is a thresholdbased policy for each user. By using the Whittle’s index expressions derived in Theorem 2, we provide a derivation of the optimal threshold for each class as a function of the Lagrange parameter W. In this section, we denote by [TeX:] $$W_i^k$$ the Whittle’s index at state i in class k. We denote by [TeX:] $$l=\left(l_1, l_2, \cdots, l_K\right)$$ the vector which represents the set of thresholds for each class k. We denote by [TeX:] $$u_k^n,$$ the stationary distribution for class k under threshold policy n.

Proposition 6. For a given W, the optimal threshold vector [TeX:] $$l=\left(l_1(W), l_2(W), \cdots, l_K(W)\right)$$ for the dual problem satisfies:

For each k:

(28)
[TeX:] $$l_k(W)=\arg \max _i\left\{W_i^k \mid W_i^k \leq W\right\},$$

or

(29)
[TeX:] $$l_k(W)=\arg \max _i\left\{W_i^k \mid W_i^k\lt W\right\} .$$

We note that the solution can also be a linear combination between the threshold policies [TeX:] $$\arg \max _i\left\{W_i^k \mid W_i^k \leq W\right\}$$ and [TeX:] $$\arg \max _i\left\{W_i^k \mid W_i^k \lt W\right\}$$

Proof. See appendix J.

Now, we give the structure of the optimal solution of the constrained relaxed problem.

Proposition 7. The solution of the constrained relaxed problem is of type threshold policy [TeX:] $$l\left(W^*\right)$$, with l being the function vector defined in Proposition 6 and [TeX:] $$W^*$$ satisfies [TeX:] $$\alpha=\sum_{k=1}^K \gamma_k \sum_{i=l_k\left(W^*\right)+1}^L u_k^{l_k\left(W^*\right)}(i)$$

Proof. See appendix K.

However, [TeX:] $$W^*$$ that satisfies the above constraint may not exist since α is a real number that can take any value in [0, 1], and [TeX:] $$\sum_{k=1}^K \gamma_k \sum_{i=l_{k+1}(W)}^L u_k^{l_k(W)}(i)$$ is discrete, since the vector l(W) can only take discrete values in [TeX:] $$[0, L]^K .$$ To deal with this issue, we use the fact that for some values of W, the optimal solution of the dual problem can be a linear combination or more precisely a randomized policy between two threshold policies for a given class as has been mentioned in Proposition 6. To that extent, our task is to find among these values of W, the one for which there exists a randomized parameter θ such that the constraint is satisfied with equality. To that end, we introduce this following proposition.

Proposition 8. There exists a class m, state p, and a randomization parameter [TeX:] $$\theta^*$$ such that the optimal solution of the dual problem when the langrangian parameter [TeX:] $$W=W_p^m=W^*$$ is characterized by:

· For [TeX:] $$k \neq m \text {, }$$ the optimal threshold is [TeX:] $$l_k\left(W_p^m\right)=\arg \max _i\left\{W_i^k \mid W_i^k \leq W_p^m\right\}$$

· For k = m, the optimal solution is randomized policy between two threshold policies [TeX:] $$l_m\left(W_p^m\right)=\arg \max _i\left\{W_i^m \mid W_i^m \leq W_p^m\right\}$$ and [TeX:] $$l_m\left(W_p^m\right)-1=\arg \max _i\left\{W_i^m \mid W_i^m\lt W_p^m\right\},$$ where the factor of randomization [TeX:] $$\theta^*$$ is the probability of adopting the policy [TeX:] $$l_m\left(W_p^m\right) \text { and } 1-\theta^*,$$ the probability of adopting the policy [TeX:] $$l_m\left(W_p^m\right)-1.$$

· The constraint (4) is satisfied with equality, i.e.,

(30)
[TeX:] $$\begin{aligned} \alpha= & \sum_{k \neq m} \sum_{i=l_k\left(W_p^m\right)+1}^L \gamma_k u_k^{l_k\left(W_p^m\right)}(i) \\ & +\sum_{i=l_m\left(W_p^m\right)+1}^L \gamma_m u_m^*(i) \\ & +\left(1-\theta^*\right) \gamma_m u_m^{l_m\left(W_p^m\right)-1}\left(l_m\left(W_p^m\right)\right), \end{aligned}$$

where [TeX:] $$u_m^*=\theta^* u^{l_m\left(W_p^m\right)}+\left(1-\theta^*\right) u^{l_m\left(W_p^m\right)-1}.$$

Proof. See appendix L.

The solution of the dual problem described in Proposition 8 satisfies the constraint (4) with equality, then according to Proposition 7, this solution is indeed the optimal solution of the constrained problem. In that regard, the optimal cost of the relaxed problem [TeX:] $$C^{{RP},N}$$, is expressed as following:

(31)
[TeX:] $$\begin{aligned} C^{R P, N}= & \sum_{k \neq m} \sum_{i=0}^L N \gamma_k a_k u_k^{l_k\left(W_p^m\right)}(i) d(i) \\ & +\sum_{i=0}^L N \gamma_m a_m u_m^*(i) d(i). \end{aligned}$$

VII. LOCAL OPTIMALITY

In this section, we will show that the performance of the Whittle’s index policy is asymptotically locally optimal. The asymptotic optimality means that for a large number of users N and a large number of channels M (α = M=N is a constant value), the Whittle’s index policy is optimal. For that we will compare the average cost obtained by the Whittle’s index policy WIP with the one obtained for the relaxed problem RP. Explicitly, denoting by [TeX:] $$C_T^N(\mathbf{x})$$ the average cost obtained over the time duration [TeX:] $$0 \leq t \leq T$$ under Whittle’s index policy conditioned on the initial state [TeX:] $$\mathbf{x}$$, we show that [TeX:] $$C_T^N(\mathbf{x})$$ tends to [TeX:] $$C^{{RP},N}$$ when N and T scale. The reason behind comparing [TeX:] $$C^{{RP},N}$$ and [TeX:] $$C_T^N(\mathbf{x})$$ is that [TeX:] $$C^{{RP},N}$$ is a lower bound of all expected average cost obtained by any policy that resolves the original problem (3). This means that it is sufficient to prove that [TeX:] $$C_T^N(\mathbf{x})$$ converges to [TeX:] $$C^{{RP},N}$$ when T and N scale in order to establish the asymptotic optimality of Whittle’s index policy. For that, we will be in need of the optimal cost expression of the relaxed problem [TeX:] $$C^{{RP},N}$$ derived in Section VI.

First, we denote by [TeX:] $$Z_i^{k, N}$$ the proportion of queues at state i in class k over all the queues of the system. In other words, it denotes the number of queues at state i in class k over the number of all users which is N. We have that [TeX:] $$\mathbf{Z}^N=\left(\mathbf{Z}^{1, N}, \cdots, \mathbf{Z}^{K, N}\right)$$ with [TeX:] $$\mathbf{Z}^{k, N}=\left(Z_1^{k, N}, \cdots, Z_L^{k, N}\right)$$ and [TeX:] $$\sum_{i=0}^L Z_i^{k, N}=\gamma_k$$ for each class k.

The expression of [TeX:] $$C_T^N(\mathbf{x})$$ in function of [TeX:] $$\mathbf{Z}^N$$ is [TeX:] $$1 / T \mathbb{E}\left[\sum_{t=0}^{T-1} \sum_{k=1}^K \sum_{i=1}^L a_k Z_i^{k, N}(t) d(i) N \mid \mathbf{Z}^N(0)=\mathbf{x}\right],$$ where [TeX:] $$\mathbf{Z}^N(t)$$ evolves under Whittle’s index policy. Denoting by [TeX:] $$\mathbf{z}^*$$ the optimal proportion of the the relaxed problem, we say that the Whittle’s index policy is asymptotically locally optimal if there exists δ > 0 such that the initial proportion vector [TeX:] $$\mathbf{Z}^N(0)$$ is within [TeX:] $$\left.\Omega_\delta\left(\mathbf{z}^*\right) \text { (i.e., }\left\|\mathbf{Z}^N(0)-\mathbf{z}^*\right\|\lt \delta\right),$$ then [TeX:] $$C_T^N(\mathbf{x})$$ converges to [TeX:] $$C^{{RP},N}$$ when T and N scale.

In order to prove that, we use the fluid limit technique that consists of analyzing the evolution of the expectation of [TeX:] $$\mathbf{Z}^N(t)$$ under the Whittle’s index policy. For that, we define the vector [TeX:] $$\mathbf{z}(t)$$ as follows:

(32)
[TeX:] $$\begin{aligned} & \boldsymbol{z}(t+1)-\left.\boldsymbol{z}(t)\right|_{\boldsymbol{z}(t)=\boldsymbol{z}} \\ & \quad=\mathbb{E}\left[\boldsymbol{Z}^N(t+1)-\boldsymbol{Z}^N(t) \mid \boldsymbol{Z}^N(t)=\boldsymbol{z}\right] . \end{aligned}$$

If we denote by [TeX:] $$w_j^h$$ the Whittle’s index for class h at state j and by [TeX:] $$p_i^k(\boldsymbol{z})$$ the probability that a user is selected randomly among [TeX:] $$z_i^k$$ to transmit, one can easily show that [24]:

(33)
[TeX:] $$p_i^k(\boldsymbol{z})=\min \left\{z_i^k, \max \left(0, \alpha-\sum_{w_j^h\gt w_i^k} z_j^h\right)\right\} / z_i^k .$$

We denote by [TeX:] $$q_{i, j}^{k, 0} \text { and } q_{i, j}^{k, 1}$$ the probabilities of transition from state i to state j in a class-k if the queue is not scheduled and if the queue is scheduled for transmission respectively.

Then, the probability of transition from state i to state j in class k is:

(34)
[TeX:] $$q_{i, j}^k(\boldsymbol{z})=p_i^k(\boldsymbol{z}) q_{i, j}^{k, 1}+\left(1-p_i^k(\boldsymbol{z})\right) q_{i, j}^{k, 0} .$$

Accordingly, we have that for each i and k:

(35)
[TeX:] $$z_i^k(t+1)-z_i^k(t)=\sum_{j \neq i} q_{j, i}^k(\boldsymbol{z}(t)) z_j^k(t)-\sum_{i \neq j} q_{i, j}^k(\boldsymbol{z}(t)) z_i^k(t).$$

Let [TeX:] $$w^*$$ be the Lagrangian parameter that gives the optimal solution of the relaxed problem. Then, according to Proposition 8, there exists a given class m such that [TeX:] $$w_{l_m}^m=w^*$$ for which the corresponding optimal solution of the relaxed problem is of type threshold policy for class [TeX:] $$k \neq m$$ denoted by [TeX:] $$l_k,$$ and a randomized policy between two threshold policies [TeX:] $$l_m$$ and [TeX:] $$l_m-1$$ for class m.

We define [TeX:] $$\jmath_{w^*}$$ as the set of states such that at any system state [TeX:] $$\boldsymbol{z} \in \jmath_{w^*},$$ if we use the Whittle’s index policy, all users with the Whittle’s index value higher than [TeX:] $$w^*$$ are scheduled, while the users with Whittle’s index value smaller than [TeX:] $$w^*$$ stay idle and the users with Whittle’s index value [TeX:] $$w^*$$ are scheduled with a certain randomization. Specifically, [TeX:] $$\jmath_{w^*}=\left\{\boldsymbol{z}: \sum_{w_i^k\gt w^*} z_i^k\lt \alpha, \sum_{w_i^k \geq w^*} z_i^k \geq \alpha\right\}.$$

Providing that for all k and t:

(36)
[TeX:] $$\sum_{j=0}^L z_j^k(t)=\gamma_k.$$

Therefore, the following equation always holds for [TeX:] $$\boldsymbol{z}(t) \in \jmath w^* \text { : }$$

1) [TeX:] $$k \neq m$$:

(37)
[TeX:] $$\begin{aligned} & z_i^k(t+1) \\ & =\sum_{j=0}^{l_k-1}\left(q_{j, i}^{k, 0}-q_{l_k, i}^{k, 0}\right) z_j^k(t)+\sum_{j=l_k+1}^L\left(q_{j, i}^{k, 1}-q_{l_k, i}^{k, 0}\right) z_j^k(t) \\ & \quad+\gamma_k q_{l_k, i}^{k, 0} . \end{aligned}$$

2) k = m:

(38)
[TeX:] $$\begin{aligned} & z_i^m(t+1)=\sum_{j=0}^{l_m-1}\left(q_{j, i}^{m, 0}-q_{l_m, i}^{m, 0}\right) z_j^m(t) \\ & +\sum_{j=l_m+1}^L\left(q_{j, i}^{m, 1}-q_{l_m, i}^{m, 1}\right) z_j^m(t) \\ & +(1-\alpha) q_{l_m, i}^{m, 0}+\alpha q_{l_m, i}^{m, 1}-\left(\sum_{\substack{w_j^h\gt w_{l_m}^m \\ h \neq m, j \neq l_h}} z_j^h(t)\right) q_{l_m, i}^{m, 1} \\ & -\left(\sum_{\substack{w_j^h \leq w_{l_m^m}^m \\ h \neq m, j \neq l_h}} z_j^h(t)\right) q_{l_m, i}^{m, 0}+\left(\sum_{\substack{h=1 \\ h \neq m}}^K \sum_{\substack{j=0 \\ j \neq l_h}}^L \mathbb{1}_{\left\{w_{l_h}^h\gt w_{l_m}^m\right\}} z_j^h(t)\right) q_{l_m, i}^{m, 1} \\ & +\left(\sum_{\substack{h=1 \\ h \neq m}}^K \sum_{\substack{j=0 \\ j \neq l_h}}^L \mathbb{1}_{\left\{w_{l_h}^h \leq w_{l_m}^m\right\}} z_j^h(t)\right) q_{l_m, i}^{m, 0} \\ & -\sum_{\substack{h=1 \\ h \neq m}}^K \gamma_h\left(\mathbb{1}_{\left\{w_{l_h}^h\gt w_{l_m}^m\right\}} q_{l_m, i}^{m, 1}+\mathbb{1}_{\left\{w_{l_h}^h \leq w_{l_m}^m\right\}} q_{l_m, i}^{m, 0}\right) \\ & \end{aligned}$$

Let [TeX:] $$g_i^m=\sum_{\substack{h=1 \\ h \neq m}}^K\gamma_h\left(\mathbb{1}_{\left\{w_{l_h}^h\gt w_{l_m}^m\right\}} q_{l_m, i}^{m, 1}+\mathbb{1}_{\left\{w_{l_h}^h \leq w_{l_m}^m\right\}} q_{l_m, i}^{m, 0}\right)$$ [TeX:] $$\forall i \in[0, L] \text {, and } C=\left(c^1, \cdots, c^K\right)$$ such that [TeX:] $$c^k=\left(\gamma_k q_{l_k, 0}^{k, 0}, \cdots, \gamma_k q_{l_k, L}^{k, 0}\right)$$ and [TeX:] $$c^m=\left((1-\alpha) q_{l_m, 0}^{m, 0}+\alpha q_{l_m, 0}^{m, 1}-\right.\left.g_0^m, \cdots,(1-\alpha) q_{l_m, L}^{m, 0}+\alpha q_{l_m, L}^{m, 1}-g_L^m\right)$$ for each [TeX:] $$k \neq m.$$

Then:

1) [TeX:] $$k \neq m$$:

(39)
[TeX:] $$z_i^k(t+1)=\sum_{j=0}^{l_k-1}\left(q_{j, i}^{k, 0}-q_{l_k, i}^{k, 0}\right) z_j^k(t)+\sum_{j=l_k+1}^L\left(q_{j, i}^{k, 1}-q_{l_k, i}^{k, 0}\right) z_j^k(t)+c_i^k.$$

2) k = m:

(40)
[TeX:] $$\begin{aligned} & z_i^m(t+1) \\ & =\sum_{j=0}^{l_m-1}\left(q_{j, i}^{m, 0}-q_{l_m, i}^{m, 0}\right) z_j^m(t)+\sum_{j=l_m+1}^L\left(q_{j, i}^{m, 1}-q_{l_m, i}^{m, 1}\right) z_j^m(t) \\ & +c_i^m-\left(\sum_{\substack{w_j^h\gt w_{l_m}^m \\ h \neq m, j \neq l_h}} z_j^h(t)\right) q_{l_m, i}^{m, 1}-\left(\sum_{\substack{w_j^h \leq w_{l_m}^m \\ h \neq m, j \neq l_h}} z_j^h(t)\right) q_{l_m, i}^{m, 0} \\ & +\left(\sum_{\substack{h=1 \\ h \neq m}}^K \sum_{\substack{j=0 \\ j \neq l_h}}^L \mathbb{1}_{\left\{w_{l_h}^h\gt w_{l_m}^m\right\}} z_j^h(t)\right) q_{l_m, i}^{m, 1} \\ & +\left(\sum_{\substack{h=1 \\ h \neq m}}^K \sum_{\substack{j=0 \\ j \neq l_h}}^L \mathbb{1}_{\left\{w_{l_h}^h \leq w_{l_m}^m\right\}} z_j^h(t)\right) q_{l_m, i}^{m, 0} . \\ & \end{aligned}$$

Then, by replacing in the equation above for all k [TeX:] $$z_{l_k}^k(t)$$ with [TeX:] $$\gamma_k-\sum_{j=0, j \neq l_k}^L z_j^k(t),$$ we obtain the following linear relation in [TeX:] $$\jmath_{w^*}$$ between [TeX:] $$\tilde{\boldsymbol{z}}(t+1) \text { and } \tilde{\boldsymbol{z}}(t)$$ where [TeX:] $$\tilde{\boldsymbol{z}}$$ is the proportion vector in which the elements [TeX:] $$z_{l_k}^k$$ for different k are eliminated.

(41)
[TeX:] $$\tilde{\boldsymbol{z}}(t+1)=\boldsymbol{Q} \tilde{\boldsymbol{z}}(t)+\boldsymbol{C},$$

where [TeX:] $$\boldsymbol{C}$$ is a constant matrix and the expression of matrix [TeX:] $$\boldsymbol{Q}$$ is given in Table I. The vector solution of the relaxed problem, denoted by [TeX:] $$\tilde{z}^*$$, is the fixed point of the aforementioned linear equation. By definition of [TeX:] $$\tilde{z}^*$$, [TeX:] $$\tilde{z}^* \in \jmath_{w^*}.$$ Thus, if [TeX:] $$\tilde{\boldsymbol{z}}(0)=\tilde{\boldsymbol{z}}^*+\boldsymbol{e}$$ and [TeX:] $$\tilde{z}(t) \in \jmath_{w^*},$$ we obtain for t:

(42)
[TeX:] $$\tilde{\boldsymbol{z}}(t)-\tilde{\boldsymbol{z}}^*=\boldsymbol{Q}^t \boldsymbol{e} .$$

The analysis of the above linear system is therefore important to prove the local optimality. We first provide the following lemma.

Lemma 2. If for all eigenvalues [TeX:] $$\lambda \text { of } Q,|\lambda|\lt 1 \text {, }$$ then there exists a neighborhood [TeX:] $$\Omega_\sigma\left(\tilde{z}^*\right) \subseteq \jmath_{w^*}$$ such that if [TeX:] $$\tilde{\boldsymbol{z}}(0) \in \Omega_\sigma\left(\tilde{\boldsymbol{z}}^*\right),$$ we have the following:

1) For all [TeX:] $$t \geq 0,\left\|\tilde{\boldsymbol{z}}(t)-\tilde{\boldsymbol{z}}^*\right\|\lt \sigma\left(\tilde{\boldsymbol{z}}(t) \in \jmath_{w^*}\right).$$

2) [TeX:] $$\tilde{\boldsymbol{z}}(t)$$ converges to [TeX:] $$\tilde{z}^* .$$

Proof. The proof follows from the convergence of the linear system.

Proposition 9. For all eigenvalue [TeX:] $$\lambda \text { of } Q,|\lambda|\lt 1 \text {. }$$

Proof. See the proof in appendix M.

The aforementioned result, combined with Lemma 2, proves the convergence of the fluid limit system. Consequently, [TeX:] $$\boldsymbol{z}(t)$$ converges to the fixed point of Equation (32), [TeX:] $$z^* \text {. }$$ However, the above result is not enough to prove the local optimality, as we have to show that the stochastic vector [TeX:] $$\boldsymbol{Z}^N(t)$$ converges to [TeX:] $$z^*$$ in probability. For that, we introduce the discrete-time version of Kurtz Theorem applied to our problem (see [29]):

Proposition 10. There exists a neighborhood [TeX:] $$\Omega_\delta\left(\boldsymbol{z}^*\right) \text { of } \boldsymbol{z}^*$$ such that if [TeX:] $$\boldsymbol{Z}^N(0)=\boldsymbol{z}(0)=\boldsymbol{x} \in \Omega_\delta\left(\boldsymbol{z}^*\right)$$, then for any μ > 0 and finite time horizon T, there exist positive constants [TeX:] $$C_1 \text{ and } C_2$$ such that

(43)
[TeX:] $$P_{\boldsymbol{x}}\left(\sup _{0 \leq t\lt T}\left\|\boldsymbol{Z}^N(t)-\boldsymbol{z}(t)\right\| \geq \mu\right) \leq C_1 \exp \left(-N C_2\right),$$

where δ < σ, and [TeX:] $$P_x$$ denotes the probability conditioned on the initial state [TeX:] $$\boldsymbol{Z}^N(0)=\boldsymbol{z}(0)=\boldsymbol{x}.$$ Furthermore, [TeX:] $$C_1 \text{ and } C_2$$ are independent of x and N.

According to the above proposition, the system state [TeX:] $$Z^N(t)$$ behaves very closely to the fluid approximation model [TeX:] $$\boldsymbol{z}(t)$$ when the number of users N is large. Since we have shown the convergence of [TeX:] $$\boldsymbol{z}(t)$$ to within [TeX:] $$\Omega_\sigma\left(\boldsymbol{z}^*\right)$$, we are ready to establish the local convergence of the system state [TeX:] $$\boldsymbol{Z}^N(t) \text { to } z^*$$.

Lemma 3. If [TeX:] $$\boldsymbol{Z}^N(0)=\boldsymbol{x} \in \Omega_\delta\left(\boldsymbol{z}^*\right),$$ then for any μ > 0, there exists a time [TeX:] $$T_0$$ such that for any [TeX:] $$T \gt T_0,$$ there exists positive constants [TeX:] $$s_1 \text{ and } s_2$$ with,

(44)
[TeX:] $$P_{\boldsymbol{x}}\left(\sup _{T_0 \leq t\lt T}\left\|Z^N(t)-\boldsymbol{z}^*\right\| \geq \mu\right) \leq s_1 \exp \left(-N s_2\right) .$$

Proof. See appendix N.

Now we are ready to prove the asymptotic local optimality of the proposed scheduling policy.

Proposition 11. If the initial state is in the set [TeX:] $$\Omega_\delta\left(z^*\right),$$ then

(45)
[TeX:] $$\lim _{T \rightarrow \infty} \lim _{N \rightarrow \infty} \frac{C_T^N(\boldsymbol{x})}{N}=\frac{C^{R P, N}}{N} .$$

Proof. See appendix O.

VIII. GLOBAL ASYMPTOTIC OPTIMALITY

In this section, we prove that from any initial state x, the long-run expected average cost obtained with the Whittle’s index policy is optimal when N is very large. In contrast to the method used to prove the local optimality, we work here with the steady state distribution of the stochastic process [TeX:] $$Z^N(t).$$ To ensure that such a stationary distribution exists, we need to show that there is at least one recurrent state. Since the states evolve according to a finite state Markov chain, we just need to prove that there exists a state reachable from any other states.

Lemma 4. The state defined as for each class k, [TeX:] $$z^k=(1,0,\cdots,0)$$ and denoted by [TeX:] $$z_0$$1, is reachable from any initial state using the Whittle’s index policy.

1 [TeX:] $$z_0$$ can be seen as the system’s state where all queues are in the queue state 0

Proof. See appendix P.

This lemma is stronger than proving the existence of a recurrent state. Indeed, this allows us to deduce that [TeX:] $$Z^N(t)$$ evolves in one recurrent aperiodic class, and that there exists a stationary distribution for [TeX:] $$Z^N(t)$$ denoted by [TeX:] $$Z^N(\infty).$$ We still need to check if for a fixed N, there exists at least one recurrent state within [TeX:] $$\Omega_\epsilon\left(z^*\right),$$ as otherwise [TeX:] $$\Omega_\epsilon\left(z^*\right)$$ will be a transient class. If such a state exists, surely [TeX:] $$Z^N(t)$$ will evolve in one recurrent class that contains this recurrent state. To that end, we demonstrate here that [TeX:] $$z^*$$ is reachable from any state for a fixed N. Since [TeX:] $$z_0$$ is reachable from any state, we just need to find a path from [TeX:] $$z_0 \text{ to } z^*.$$

Lemma 5. By applying the Whittle’s index policy, the steady state [TeX:] $$z^*$$ is reachable from the state [TeX:] $$z_0.$$

Proof. Considering our system model, the probability of transitioning from queue state 0 to any other state whether the action is active or passive is strictly positive. Thereby, there exists a trajectory from [TeX:] $$z_0 \text{ to } z^*.$$ which lasts only one time slot. Consequently, [TeX:] $$z^*$$ is reachable from [TeX:] $$z_0.$$

From this lemma above, the state [TeX:] $$z^*$$ is reachable from any state, which means that [TeX:] $$z^*$$ is a recurrent state. However, the considered actions schedule a proportion of users (i.e., not an integer value). This is not feasible and unrealistic for some (small) values of N since the queues are not splittable. In fact, for some values of N, the state [TeX:] $$z^*$$ may not exist. On the other hand, we can say that for enough large N , and for any [TeX:] $$\epsilon \gt 0$$, there exists at least one recurrent state within the neighborhood [TeX:] $$\Omega_\epsilon\left(z^*\right) .$$ This will ensure that there is a path to enter a neighborhood [TeX:] $$\Omega_\epsilon\left(z^*\right)$$ from any initial state. However, it is important to ensure that the time to enter [TeX:] $$\Omega_\epsilon\left(z^*\right)$$ should not scale up with N. For that, we give the following assumption which will be later justified via numerical studies in Section IX.

Assumption 1. We assume that the expected time to enter a neighborhood of [TeX:] $$z^*$$ from any initial state x does not depend on the number of queues N. In other words, for all N the time to enter a neighborhood [TeX:] $$\Omega_\epsilon\left(z^*\right)$$ denoted by [TeX:] $$\Gamma_x^N(\epsilon)$$ is bounded by a constant [TeX:] $$T_{b_\epsilon} .$$

Now we provide a useful lemma that allows us to demonstrate the global asymptotic optimality.

Lemma 6. Under Assumption 1, and for any [TeX:] $$\epsilon$$ , we have that:

(46)
[TeX:] $$\lim _{N \rightarrow+\infty} P\left(\boldsymbol{Z}^N(\infty) \in \Omega_\epsilon\left(z^*\right)\right)=1.$$

Proof. See Lemma 6 in [21].

Since we have found a stationary distribution of [TeX:] $$Z^N(t)$$ under Whittle’s index policy, the expected average cost under Whittle’s index policy for a fixed N can be written as follows:

(47)
[TeX:] $$\lim _{T \rightarrow \infty} \frac{C_T^N(x)}{N}=\sum_{k=1}^K \sum_{i=0}^L a_k \mathbb{E}\left[Z_i^{k, N}(\infty)\right] d(i) N.$$

Theorem 3. Under assumption 1, and for any initial state, we have that:

(48)
[TeX:] $$\lim _{N \rightarrow+\infty} \lim _{T \rightarrow \infty} \frac{C_T^N(x)}{N}=\frac{C^{R P, N}}{N} .$$

Proof. See appendix Q.

IX. NUMERICAL RESULTS

In this section, we provide numerical results that confirm the asymptotic optimality of the developed Whittle’s index policy. To that extent, we consider the two following scenarios:

1) 2 classes with the following characteristics:

· [TeX:] $$R_1=11 \text { and } R_2=10 \times R_1=110$$

· α = 1/2

· L = 10

· [TeX:] $$\gamma_1=\gamma_2=1 / 2$$

· [TeX:] $$a_1=\frac{10 * 2}{\left(R_1-1\right)}$$

· [TeX:] $$a_2=\frac{10 * 2}{R_2-1}$$

· [TeX:] $$C_d=3$$

2) 3 classes with the following characteristics:

· [TeX:] $$R_1=11, R_2=50 \text { and } R_3=110$$

· α = 1/2

· L = 10

· [TeX:] $$\gamma_1=\gamma_2=\gamma_3=1 / 3$$

· [TeX:] $$a_1=\frac{10 * 2}{\left(R_1-1\right)}$$

· [TeX:] $$a_2=\frac{10 * 2}{R_2-1}$$

· [TeX:] $$a_2=\frac{10 * 2}{R_3-1}$$

· [TeX:] $$C_d=3$$

We also consider two initial states x and y such that all the queues are equal to 0 and L, respectively.

1) Verification of Assumption 1: We plot in Fig. 3, the evolution of the time needed to enter a neighborhood [TeX:] $$\Omega_\epsilon\left(z^*\right)$$ (i.e., hitting time of [TeX:] $$\Omega_\epsilon\left(z^*\right)$$) with respect to N, given that [TeX:] $$\epsilon$$ is small enough.

Fig. 3.
Hitting Time of [TeX:] $$\Omega_\epsilon\left(z^*\right)$$ in function of N: (a) [TeX:] $$Z^N(0)=x$$ and (b) [TeX:] $$Z^N(0)=y$$.

One can see that for large values of N, the hitting time can be considered as a constant and does not diverge for both initial states x and y. This implies that the hitting time is bounded for large values of N which consolidates Assumption 1.

2) Performance of the Whittle’s Index Policy: In this section, we compare the long-run expected average cost per user under the Whittle’s index policy, i.e., [TeX:] $$\lim _{T \rightarrow \infty} C_T^N(x) / N=C^{W I P, N} / N,$$ with the one obtained by applying the Max-Weight policy MWP, [TeX:] $$C^{M W P, N} / N.$$ The policy MWP schedules, at each time t, the M weighted longest queues (equivalently the M highest [TeX:] $$\left.a_k d\left(q_i^k(t)\right)\right).$$ We also compare for the first scenario, the performance of these two policies with the optimal cost per user obtained by using the optimal solution of the relaxed problem, i.e., [TeX:] $$C^{RP, N} / N.$$ The results are plotted in Figs. 4(a), 4(b), 5(a), and 5(b) where (a) corresponds to the initial state x and (b) corresponds to the initial state y (defined above).

One can see that for large N, regardless of the initial state, the cost incurred by adopting the Whittle’s index policy tends to the optimal cost of the relaxed problem, which proves that it asymptotically converges to the optimal solution of the original problem. One can also remark that the optimal cost of the relaxed problem per user is constant and does not depend on N (see Section VI). Lastly, we remark that the solution given by MWP is suboptimal and lacks behind our proposed scheduling scheme.

Fig. 4.
Performance evaluation of Whittle’s index policy for 2 classes.
Fig. 5.
Performance evaluation of Whittle’s index policy for 3 classes.

3) Fairness among users: In order to improve the fairness among the users in the network, one can use the developed Whittle’s index policy up to some modifications. To that extent, we introduce in this section a new policy Θ which works as follows: At each time slot t, we schedule the users with the highest [TeX:] $$W_k\left(q_i^k(t)\right) \overline{D_k}\left(q_i^k(t)\right),$$ where [TeX:] $$q_i^k(t)$$ is the queue state of user i in class k, [TeX:] $$W_k$$ is the Whittle’s index of state [TeX:] $$q_i^k(t)$$ and [TeX:] $$\overline{D_k}\left(q_i^k(t)\right)=\sum_{u=1}^t a_k d\left(q_i^k(u)\right) / t .$$ To evaluate numerically the performance of this policy, we consider the following two costs [TeX:] $$C_1^{\pi, N} \text { and } C_2^{\pi, N}$$ incurred respectively by users of class 1 and users of class 2 under policy π , specifically [TeX:] $$C_1^{\pi, N}=\lim _{T \rightarrow \infty} \frac{1}{T} \mathbb{E}\left[\sum_{t=0}^{T-1} \sum_{i=1}^{\gamma_1 N} a_1 d\left(q_i^2(t)\right) \mid x, \pi\right]$$ and [TeX:] $$C_2^{\pi, N}=\lim _{T \rightarrow \infty} \frac{1}{T} \mathbb{E}\left[\sum_{t=0}^{T-1} \sum_{i=1}^{\gamma_2 N} a_2 d\left(q_i^2(t)\right) \mid x, \pi\right] .$$ We plot these quantities over N when π = WIP and when π = Θ with respect to N in Fig. 6 considering the following network settings:

· [TeX:] $$R_1=11, \text { and } R_2=12$$

· α = 1/2

· L = 10

· [TeX:] $$\gamma_1=\gamma_2=1 / 2$$

· [TeX:] $$a_1=\frac{10 * 2}{\left(R_1-1\right)}$$

· [TeX:] $$a_2=\frac{10 * 2}{R_2-1}$$

· [TeX:] $$C_d=3$$

We conclude that the new policy gives a better performance in terms of fairness, since it reduces the gap between the costs of the two classes of users.

Fig. 6.
Evaluation of [TeX:] $$C_1^{\pi, N} \text { and } C_2^{\pi, N}$$ in function of N under policy Θ and Whittle’s index policy WIP.

X. CONCLUSION

In this paper, we have studied the problem of users and channels scheduling under dynamic traffic arrivals. At each time slot, only M channels can be allocated to the users knowing that a user can be allocated one channel at most. We have formulated a Lagrangian relaxation of the optimization problem and provided a characterization of the optimal solution of this relaxed problem. We have then developed a simple Whittle’s index policy to allocate the channels to the users and proved its asymptotic local and global optimality when the numbers of users and channels are large enough. This result is of interest as the developed Whittle’s index policy has a low complexity and is near optimal for large number of users. We have then provided numerical results that corroborate our claims.

APPENDIX A

PROOF OF PROPOSITION 1

We consider the Bellman equation (9). By summing the RHS and the LHS of (9), for all k and i we obtain:

(49)
[TeX:] $$\begin{aligned} &\begin{aligned} \sum_{k=1}^K \sum_{i=1}^{\gamma_k N}\left[V_i^k\left(q_i^k\right)+\theta_i^k\right]= & \sum_{k=1}^K \sum_{i=1}^{\gamma_k N} \min _{s_i^k}\left\{C_k\left(q_i^k, s_i^k\right)\right. \\ & \left.+\sum_{q_i^{\prime k}} \operatorname{Pr}\left(q_i^{\prime k} \mid q_i^k, s_i^k\right) V_i^k\left(q_i^{\prime k}\right)\right\} \end{aligned}\\ &=\min _{\mathrm{s}}\left\{\sum_{k=1}^K \sum_{i=1}^{\gamma_k N}\left[C_k\left(q_i^k, s_i^k\right)+\sum_{q_i^{\prime \prime}} \operatorname{Pr}\left(q_i^{\prime k} \mid q_i^k, s_i^k\right) V_i^k\left(q_i^{\prime k}\right)\right]\right\}, \end{aligned}$$

where [TeX:] $$\mathbf{s}=\left(s_1^1, \cdots, s_{\gamma_1 N}^1, \cdots, s_1^K, \cdots, s_{\gamma_k N}^K\right).$$ We also have that:

(50)
[TeX:] $$\begin{aligned} \operatorname{Pr}\left(\mathbf{q}^{\prime} \mid \mathbf{q}, \mathbf{s}\right) & =\sum_{q_i^{\prime \prime k}} \operatorname{Pr}\left(\mathbf{q}^{\prime} \mid \mathbf{q}, \mathbf{s}, q_i^{\prime k}\right) \operatorname{Pr}\left(q_i^{\prime k} \mid \mathbf{q}, \mathbf{s}\right) \\ & =\sum_{q_i^{\prime k}} \operatorname{Pr}\left(\mathbf{q}^{\prime} \mid \mathbf{q}, \mathbf{s}, q_i^{\prime k}\right) \operatorname{Pr}\left(q_i^{\prime k} \mid q_k^i, s_i^k\right), \end{aligned}$$

for all [TeX:] $$\mathbf{q}=\left(q_1^1, \cdots, q_{\gamma_1 N}^1, \cdots, q_1^K, \cdots, q_{\gamma_K N}^K\right)$$ and [TeX:] $$\mathbf{q}^{\prime}=\left(q_1^{\prime 1}, \cdots, q_{\gamma_1 N}^{\prime 1}, \cdots, q_1^{\prime K}, \cdots, q_{\gamma_K N}^{\prime K}\right).$$ Since [TeX:] $$\operatorname{Pr}\left(q_i^k \mid \mathbf{q}, \mathbf{s}\right)$$ only depends on the decision taken with respect to user i in class k, we obtain:

(51)
[TeX:] $$\begin{aligned} & \sum_{k=1}^K \sum_{i=1}^{\gamma_k N} \sum_{q_i^{\prime k}} \operatorname{Pr}\left(q_i^{\prime k} \mid q_i^k, s_i^k\right) V_i^k\left(q_i^{\prime k}\right) \\ & \quad=\sum_{k=1}^K \sum_{i=1}^{\gamma_k N} \sum_{\mathbf{q}^{\prime}} \sum_{q_i^{\prime k}} \operatorname{Pr}\left(\mathbf{q}^{\prime}|\mathbf{q},| \mathbf{s}, q_i^{\prime k}\right) \operatorname{Pr}\left(q_i^{\prime k} \mid q_i^k, s_i^k\right) V_i^k\left(q_i^{\prime k}\right) \\ & \quad=\sum_{\mathbf{q}^{\prime}} \operatorname{Pr}\left(\mathbf{q}^{\prime} \mid \mathbf{q}, \mathbf{s}\right) \sum_{k=1}^K \sum_{i=1}^{\gamma_k N} V_i^k\left(q_i^{{\prime k}}\right) . \end{aligned}$$

From the previous equations we obtain:

(52)
[TeX:] $$\begin{aligned} & \sum_{k=1}^K \sum_{i=1}^{\gamma_k N} V_i^k\left(q_i^k\right)+\sum_{k=1}^K \sum_{i=1}^{\gamma_k N} \theta_i^k \\ & =\min _{\mathbf{s}}\left[\sum_{k=1}^K \sum_{i=1}^{\gamma_k N} C_k\left(q_i^k, s_i^k\right)\right. \\ & \left.\quad+\sum_{k=1}^K \sum_{i=1}^{\gamma_k N} \sum_{q_i^{\prime k}} \operatorname{Pr}\left(q_i^{\prime k} \mid q_i^k, s_i^k\right) V_i^k\left(q_i^{\prime k}\right)\right] \\ & =\min _{\mathbf{s}}\left[\sum_{k=1}^K \sum_{i=1}^{\gamma_k N} C\left(q_i^k, s_i^k\right)+\sum_{\mathbf{q}^{\prime}} \operatorname{Pr}\left(\mathbf{q}^{\prime} \mid \mathbf{q}, \mathbf{s}\right) \sum_{k=1}^K \sum_{i=1}^{\gamma_k N} V_i^k\left(q_i^{\prime k}\right)\right] . \end{aligned}$$

According to [25, Chapter 2, Theorem 2.1], it exists a unique function [TeX:] $$\bar{\boldsymbol{V}}$$ and a constant θ that resolve the (8). Subsequently, since we have found a bounded function [TeX:] $$\sum_{k=1}^K \sum_{i=1}^{\gamma_k N} V_i^k\left(q_i^k\right),$$ and a constant [TeX:] $$\sum_{k=1}^K \sum_{i=1}^{\gamma_k N} \theta_i^k$$ that satisfy also the (8), then [TeX:] $$\bar{\boldsymbol{V}}(\mathbf{q})=\sum_{k=1}^K \sum_{i=1}^{\gamma_k N} V_i^k\left(q_i^k\right)$$ and [TeX:] $$\theta=\sum_{k=1}^K \sum_{i=1}^{\gamma_k N} \theta_i^k.$$ This is equivalent to find for each user the decision that minimizes the right hand side of each individual Bellman equation. This concludes the proof.

APPENDIX B

PROOF OF PROPOSITION 2

When i < L:

1) j ≤ n:

Since j ≤ n, the optimal decision is to stay idle, that means if A denotes the number of arrival packets, in the next time slot the number of packets will be i = j + A with A ≤ R-1, then A = i-j. Therefore, the probability to transit from state j to i is the probability that A = i-j, which is exactly [TeX:] $$\pi_{i-j}.$$

2) j > n:

The optimal decision in this case is to transmit, then all j packets will be transmitted. Taking into account the A arrival packets, then the new state for the next time slot will be i = A. This explains that the probability to transit from state j to i is the probability that A is equal to i which is equal to [TeX:] $$\pi_i.$$

When i = L:

1) j ≤ n:

The optimal decision is the passive action. Then A arriving packets are added to the j packets present in the queue. At the next time slot, the number of packets is j + A. According to (1), since we cannot exceed the buffer length L, we reach the state L if j+A ≥ L. Since A ≤ R-1, then the probability of this event or equivalently the probability to transit from state j to state L is [TeX:] $$\operatorname{Pr}(L-j \leq A \leq R-1)=\sum_{k=L-j}^{R-1} \operatorname{Pr}(A=k)=(R-L+j) \pi_{L-j}=(R-L+j) \rho.$$

2) j > n:

The optimal decision is the active action. Subsequently, the next state is 0. Thus to reach the state L, the arrival packet number A must be in the set [L, R - 1]. Therefore, the Pprobability to transit from 0 to L is [TeX:] $$\operatorname{Pr}(L \leq A \leq R-1)=\sum_{k=L}^{R-1} \operatorname{Pr}(A=k)=(R-L) \pi_L=(R-L) \rho \text {. }$$

APPENDIX C

PROOF OF PROPOSITION 3

We have that:

(53)
[TeX:] $$u(i)=\sum_{j=0}^n p_n(j, i) u(j)+\sum_{j=n+1}^L p_n(j, i) u(j).$$

We distinguish between two cases: n < L and n = L. We analyze each case separately.

1) n < L:

We first give the expression of u(i) when i < L based on Proposition 2:

(54)
[TeX:] $$u(i)=\sum_{j=0}^n \pi_{i-j} u(j)+\sum_{j=n+1}^L \pi_i u(j).$$

By definition of π given in Definition 2, we have that:

(55)
[TeX:] $$u(i)=\sum_{j=0}^{\min (i, n)} \rho u(j)+\sum_{j=n+1}^L \rho u(j).$$

Now, in order to prove Proposition 3 for this case, we will distinguish between three sub-cases:

a) n + 1 ≤ i ≤ L - 1

b) 0 ≤ i ≤ n

c) i = L

a) Proof of u(i) = ρ for n + 1 ≤ i ≤ L - 1:

We have min(i; n) = n, then:

(56)
[TeX:] $$u(i)=\sum_0^n \rho u(j)+\sum_{n+1}^L \rho u(j) .$$

Knowing that [TeX:] $$\sum_0^L u(j)=1,$$ thus [TeX:] $$\sum_0^n \rho u(j)+\sum_{n+1}^L \rho u(j)=\rho.$$ Hence, u(i) = ρ.

b) Proof of u(i) = [TeX:] $$\rho(1-\rho)^{n-i} \text { for } 0 \leq i \leq n \text { : }$$

We prove this result by induction, i.e., we start by proving that the statement [TeX:] $$P(i)=\left\{u(i)=\rho(1-\rho)^{n-i}\right\}$$ holds for i = n, then we show that it holds for i-1, if P(i - 1) is true.

· i = n:

We have that: [TeX:] $$u(n)=\sum_0^n \rho u(j)+\sum_{n+1}^L \rho u(j)=\rho=\rho(1-\rho)^{n-n}.$$ Thereby, P(n) is true.

· [TeX:] $$P(i) \Rightarrow P(i-1)$$:

We have that [TeX:] $$\min (i-1, n)=i-1,$$ then:

(57)
[TeX:] $$\begin{aligned} & u(i-1)=\sum_0^{i-1} \rho u(j)+\sum_{n+1}^L \rho u(j) \\ & u(i-1)=\sum_0^i \rho u(j)+\sum_{n+1}^L \rho u(j)-\rho u(i) \\ & u(i-1)=u(i)-\rho u(i) . \end{aligned}$$

By induction assumption, we have that [TeX:] $$u(i)=\rho(1-\rho)^{n-i}.$$ To that extent, replacing the expression of u(i) in (57), we obtain:

[TeX:] $$u(i-1)=(1-\rho) u(i)=\rho(1-\rho)^{n-(i-1)} .$$

That concludes the proof.

APs for i = L, u(L) is nothing but the subtraction of the [TeX:] $$\sum_{j=0}^{L-1} u(j)$$ from 1. By doing so, we get:

[TeX:] $$u(L)=(1-\rho)^{n+1}-(L-n-1) \rho$$

2) n = L:

(58)
[TeX:] $$u(i)=\sum_{j=0}^L p_L(j, i) u(j).$$

For i ≤ L - 1:

According to Proposition 2, we have:

(59)
[TeX:] $$u(i)=\sum_{j=0}^L \pi_{i-j} u(j).$$

By definition of π, we get:

(60)
[TeX:] $$u(i)=\sum_0^i \rho u(j).$$

We prove by induction that for 0 ≤ i < L, u(i) = 0

We have u(0) = [TeX:] $$\rho u(0)$$ = 0.

We suppose that u(j) = 0 for all 0 ≤ j ≤ i, then:

(61)
[TeX:] $$\begin{aligned} u(i+1) & =\sum_0^{i+1} \rho u(j) \\ & =\sum_0^i \rho u(j)+\rho u(i+1) \\ & =0+\rho u(i+1) \\ u(i+1) & =0 . \end{aligned}$$

Then, for all [TeX:] $$i \in[0, L-1], u(i)=0$$

Since [TeX:] $$\sum_{j=0}^L u(j)=1,$$ we have [TeX:] $$u(L)=1-\sum_{j=0}^{L-1} u(j)=1-0=1.$$

This ends the proof.

APPENDIX D

PROOF OF PROPOSITION 4

As mentioned previously in Remark 2, we denote [TeX:] $$\sum_{q=0}^L a u^n(q) d(q) \text { by } a_n \text { and } \sum_{q=0}^n u^n(q) \text { by } b_n \text {. }$$ Before proving the proposition, we give two useful lemmas.

Lemma 7. Considering [TeX:] $$a_{j-1}, a_j, a_{j+1} \text { and } b_{j-1}, b_j, b_{j+1} \text {, }$$ such that [TeX:] $$b_{j-1}\lt b_j\lt b_{j+1}.$$

If [TeX:] $$\frac{a_j-a_{j-1}}{b_j-b_{j-1}} \leq \frac{a_{j+1}-a_j}{b_{j+1}-b_j}$$ Then:

(62)
[TeX:] $$\frac{a_j-a_{j-1}}{b_j-b_{j-1}} \leq \frac{a_{j+1}-a_{j-1}}{b_{j+1}-b_{j-1}} \leq \frac{a_{j+1}-a_j}{b_{j+1}-b_j} .$$

If [TeX:] $$\frac{a_j-a_{j-1}}{b_j-b_{j-1}} \geq \frac{a_{j+1}-a_j}{b_{j+1}-b_j}$$ Then:

(63)
[TeX:] $$\frac{a_j-a_{j-1}}{b_j-b_{j-1}} \geq \frac{a_{j+1}-a_{j-1}}{b_{j+1}-b_{j-1}} \geq \frac{a_{j+1}-a_j}{b_{j+1}-b_j} .$$

If [TeX:] $$\frac{a_j-a_{j-1}}{b_j-b_{j-1}} \leq \frac{a_{j+1}-a_{j-1}}{b_{j+1}-b_{j-1}}$$ Then:

(64)
[TeX:] $$\frac{a_j-a_{j-1}}{b_j-b_{j-1}} \leq \frac{a_{j+1}-a_{j-1}}{b_{j+1}-b_{j-1}} \leq \frac{a_{j+1}-a_j}{b_{j+1}-b_j} .$$

If [TeX:] $$\frac{a_j-a_{j-1}}{b_j-b_{j-1}} \geq \frac{a_{j+1}-a_{j-1}}{b_{j+1}-b_{j-1}}$$ Then:

(65)
[TeX:] $$\frac{a_j-a_{j-1}}{b_j-b_{j-1}} \geq \frac{a_{j+1}-a_{j-1}}{b_{j+1}-b_{j-1}} \geq \frac{a_{j+1}-a_j}{b_{j+1}-b_j} .$$

If [TeX:] $$\frac{a_{j+1}-a_{j-1}}{b_{j+1}-b_{j-1}} \leq \frac{a_{j+1}-a_j}{b_{j+1}-b_j}$$ Then:

(66)
[TeX:] $$\frac{a_j-a_{j-1}}{b_j-b_{j-1}} \leq \frac{a_{j+1}-a_{j-1}}{b_{j+1}-b_{j-1}} \leq \frac{a_{j+1}-a_j}{b_{j+1}-b_j} .$$

If [TeX:] $$\frac{a_{j+1}-a_{j-1}}{b_{j+1}-b_{j-1}} \geq \frac{a_{j+1}-a_j}{b_{j+1}-b_j}$$ Then:

(67)
[TeX:] $$\frac{a_j-a_{j-1}}{b_j-b_{j-1}} \geq \frac{a_{j+1}-a_{j-1}}{b_{j+1}-b_{j-1}} \geq \frac{a_{j+1}-a_j}{b_{j+1}-b_j} .$$

Proof. See appendix E.

Lemma 8. The largest minimizer at step j in Algorithm 1 satisfies [TeX:] $$n_j=\min \left\{k: b_k=b_{n_j}\right\}$$

Proof. See appendix F.

· Indexability:

We consider [TeX:] $$n_1 \text{ and } n_2$$ the optimal thresholds of the problem (9) when the Lagrangian parameter W is equal to [TeX:] $$W_1 \text{ and } W_2$$ respectively, such that [TeX:] $$W_1 \lt W_2.$$ To that extent, we show that [TeX:] $$n_1$$ is less that [TeX:] $$n_2.$$ In fact, establishing the aforementioned result is sufficient to show the indexability since by proving it, we can say that the set of states for which the optimal decision is passive action when [TeX:] $$W=W_1,$$ is included in the set of states for which the optimal decision is passive action when [TeX:] $$W=W_2,$$ specifically [TeX:] $$\left[0, n_1\right] \subseteq\left[0, n_2\right].$$ Subsequently: [TeX:] $$D\left(W_1\right) \subseteq D\left(W_2\right)$$

In order to prove that, we just need to demonstrate that [TeX:] $$b_{n_1} \leq b_{n_2} \text { since } n_1 \leq n_2$$ is equivalent to [TeX:] $$b_{n_1} \leq b_{n_2},$$ due to to the increase of [TeX:] $$b_n$$ with n.

As [TeX:] $$n_1 \text{ and } n_2$$ are the minimizers of (7) when [TeX:] $$W=W_1 \text{ and } W=W_2,$$ respectively, then:

(68)
[TeX:] $$a_{n_1}-W_1 b_{n_1} \leq a_{n_2}-W_1 b_{n_2},$$

(69)
[TeX:] $$a_{n_1}-W_2 b_{n_1} \geq a_{n_2}-W_2 b_{n_2} .$$

This implies:

(70)
[TeX:] $$W_2\left(b_{n_1}-b_{n_2}\right) \leq a_{n_1}-a_{n_2} \leq W_1\left(b_{n_1}-b_{n_2}\right) .$$

Therefore: [TeX:] $$\left(W_2-W_1\right)\left(b_{n_1}-b_{n_2}\right) \leq 0.$$ Since [TeX:] $$W_2-W_1 \gt 0,$$ Thus [TeX:] $$b_{n_1} \leq b_{n_2}.$$ Consequently, [TeX:] $$n_1 \leq n_2.$$

Thereby, we conclude the indexability.

· Whittle’s index expressions:

For the Whittle’s index expressions, we should demonstrate that, for [TeX:] $$k \quad \in \quad\left[n_{j-1}, n_j\right] \text {, } W_j=\min \{W, k \in D(W)\}.$$

For that, we prove first that for [TeX:] $$W\lt W_j, k \notin D(W).$$ If [TeX:] $$k\gt n_{j-1}, W\lt W_j, \text{ and } b_k \neq b_{n_{j-1}},$$ then [TeX:] $$W\lt W_j \leq \frac{a_k-a_{n_{j-1}}}{b_k-b_{n_{j-1}}}.$$ Therefore, [TeX:] $$a_k-b_k W\gt a_{n_{j-1}}-b_{n_{j-1}} W \text {. }$$

If [TeX:] $$k\gt n_{j-1}, W\lt W_j \text { and } b_k=b_{n_{j-1}},$$ given that [TeX:] $$a_k\gt a_{n_{j-1}} \text {, then } a_k-b_k W\gt a_{n_{j-1}}-b_{n_{j-1}} W \text {. }$$

Hence we have proved that, for [TeX:] $$W\lt W_j \text { and } k>n_{j-1}, a_k-b_kW \gt a_{n_{j-1}}-b_{n_{j-1}}W.$$ That means for [TeX:] $$W\lt W_j,$$ the optimal threshold is [TeX:] $$n_{j-1}$$ or even less. Therefore, for [TeX:] $$\left.k \in] n_{j-1}, n_j\right],$$ the optimal action is the active one, i.e., [TeX:] $$k \notin D(W).$$

We still have to prove that [TeX:] $$k \in D(W_j).$$

For that, we prove that the optimal threshold is at least [TeX:] $$n_j$$ when [TeX:] $$W=W_j.$$ In other words, for all [TeX:] $$a_k-b_k W_j \geq a_{n_j}-b_{n_j} W_j.$$ We demonstrate this result by induction in j:

- j = 0

By definition, [TeX:] $$W_0 \leq \frac{a_k-a_{-1}}{b_k} \quad \forall k \geq 0.$$ Furthermore, as [TeX:] $$b_n$$ is increasing with n, then [TeX:] $$b_k \leq b_{n_0}$$ for [TeX:] $$0 \leq k\lt n_0 .$$ However, according to Lemma 8, [TeX:] $$b_k$$ is necessarily strictly less than [TeX:] $$b_{n_0}.$$ Thus, by using Lemma 7 (fourth case), we can deduce that [TeX:] $$\frac{a_{n_0}-a_k}{b_{n_0}-b_k} \leq W_0.$$ That means, as [TeX:] $$b_{-1}=0 \text {, }$$ we have for [TeX:] $$k \in\left[-1, n_0\right], \frac{a_{n_0}-a_k}{b_{n_0}-b_k} \leq W_0 \text {, }$$ which implies that [TeX:] $$a_k-b_k W_0 \geq a_{n_0}-b_{n_0} W_0 .$$

- We suppose at step j, [TeX:] $$a_k-b_k W_j \geq a_{n_j}-b_{n_j} W_j$$ i.e., [TeX:] $$\frac{a_{n_j}-a_k}{b_{n_j}-b_k} \leq W_j$$ for [TeX:] $$k \lt n_j$$ (this remains true since [TeX:] $$b_k \lt b_{n_j}$$ according to Lemma 8).

We show that [TeX:] $$a_k-b_k W_{j+1} \geq a_{n_{j+1}}-b_{n_{j+1}}W_{j+1}$$ for [TeX:] $$k \lt n_{j+1},$$ i.e.:

When [TeX:] $$n_j \leq k \lt n_{j+1},$$ then if [TeX:] $$b_k \neq b_{n_j}, \frac{a_k-a_{n_j}}{b_k-b_{n_j}} \geq W_{j+1}.$$ Thus, by using Lemma 7 (fourth case), we get [TeX:] $$\frac{a_{n_{j+1}-a_k}}{b_{n_{j+1}}-b_k} \leq W_{j+1}$$ [TeX:] $$\left(b_{n_j}\lt b_k\lt b_{n_{j+1}}\right) \text {. }$$ If [TeX:] $$\frac{a_{n_{j+1}}-a_k}{b_{n_{j+1}}-b_k}=\frac{a_{n_{j+1}}-a_k}{b_{n_{j+1}}-b_{n_j}} \leq \frac{a_{n_{j+1}}-a_{n_j}}{b_{n_{j+1}}-b_{n_j}}=W_{j+1}$$ since [TeX:] $$a_k \geq a_{n_j}.$$

When [TeX:] $$k \lt n_j,$$ we have that [TeX:] $$\frac{a_{n_j}-a_k}{b_{n_j}-b_k} \leq W_j$$ (induction assumption). By definition of [TeX:] $$n_j$$ in Algorithm 1, we have [TeX:] $$W_j\lt \frac{a_{n_{j+1}}-a_{n_{j-1}}}{b_{n_{j+1}}-b_{n_{j-1}}}.$$ Then according to Lemma 7 (third case), [TeX:] $$W_j \leq W_{j+1} (b_{n_{j-1}}\lt b_{n_j}\lt b_{n_{j+1}}).$$ Therefore [TeX:] $$\frac{a_{n_j}-a_k}{b_{n_j}-b_k} \leq W_{j+1}$$ and by using again Lemma 7 (first case), [TeX:] $$\frac{a_{n_{j+1}}-a_k}{b_{n_{j+1}}-b_k} \leq W_{j+1}.$$ Therefore, for all [TeX:] $$k \leq n_{j+1}, a_k-b_k W_{j+1} \geq a_{n_{j+1}}-b_{n_{j+1}} W_j.$$

As a consequence, we have proved by induction that at any step j, for [TeX:] $$k\lt n_j, a_k-b_k W_j \geq a_{n_j}-b_{n_j} W_j.$$

Then when [TeX:] $$W=W_j,$$ the optimal threshold is at least [TeX:] $$n_j.$$ This means that if [TeX:] $$k \in\left[n_{j-1}, n_j\right],$$ then k is surely less or equal than the optimal threshold when [TeX:] $$W=W_j,$$ which implies that the optimal decision at state k is passive action, i.e., [TeX:] $$k \in D\left(W_j\right).$$

Combining the two results for [TeX:] $$k \in\left[n_{j-1}, n_j\right]:$$

- For [TeX:] $$W \lt W_j, k \notin D(W).$$

- [TeX:] $$k \in D\left(W_j\right).$$

Then [TeX:] $$W_j=\min \{W, k \in D(W)\}.$$ This concludes the proof.

APPENDIX E

PROOF OF PROPOSITION 7

We will just prove the first case. For the other cases, the proof is similar.

First case: [TeX:] $$\frac{a_j-a_{j-1}}{b_j-b_{j-1}} \leq \frac{a_{j+1}-a_j}{b_{j+1}-b_j} \Longrightarrow \frac{a_j-a_{j-1}}{b_j-b_{j-1}} \leq \frac{a_{j+1}-a_{j-1}}{b_{j+1}-b_{j-1}} \leq \frac{a_{j+1}-a_j}{b_{j+1}-b_j}:$$

For the LHS inequality:

(71)
[TeX:] $$\begin{aligned} \frac{a_{j+1}-a_{j-1}}{b_{j+1}-b_{j-1}} & =\frac{a_{j+1}-a_j}{b_{j+1}-b_{j-1}}+\frac{a_j-a_{j-1}}{b_{j+1}-b_{j-1}} \\ & \geq \frac{\left(a_j-a_{j-1}\right)\left(b_{j+1}-b_j\right)}{\left(b_j-b_{j-1}\right)\left(b_{j+1}-b_{j-1}\right)}+\frac{a_j-a_{j-1}}{b_{j+1}-b_{j-1}} . \end{aligned}$$

The inequality above comes from the fact that [TeX:] $$b_{j-1}\lt b_j\lt b_{j+1}$$ and [TeX:] $$\frac{a_j-a_{j-1}}{b_j-b_{j-1}} \leq \frac{a_{j+1}-a_j}{b_{j+1}-b_j}.$$

Then

(72)
[TeX:] $$\begin{aligned} \frac{a_{j+1}-a_{j-1}}{b_{j+1}-b_{j-1}} & \geq \frac{a_j-a_{j-1}}{b_j-b_{j-1}}\left[\frac{b_{j+1}-b_j+b_j-b_{j-1}}{b_{j+1}-b_{j-1}}\right] \\ & =\frac{a_j-a_{j-1}}{b_j-b_{j-1}}. \end{aligned}$$

For the RHS inequality:

(73)
[TeX:] $$\begin{aligned} \frac{a_{j+1}-a_{j-1}}{b_{j+1}-b_{j-1}} & =\frac{a_{j+1}-a_j}{b_{j+1}-b_{j-1}}+\frac{a_j-a_{j-1}}{b_{j+1}-b_{j-1}} \\ & \leq \frac{a_{j+1}-a_j}{b_{j+1}-b_{j-1}}+\frac{\left(a_{j+1}-a_j\right)\left(b_j-b_{j-1}\right)}{\left(b_{j+1}-b_j\right)\left(b_{j+1}-b_{j-1}\right)}. \end{aligned}$$

where the above inequality comes from the fact that [TeX:] $$b_{j-1}\lt b_j \lt b_{j+1}$$ and [TeX:] $$\frac{a_j-a_{j-1}}{b_j-b_{j-1}} \leq \frac{a_{j+1}-a_j}{b_{j+1}-b_j} .$$

Then

(74)
[TeX:] $$\begin{aligned} \frac{a_{j+1}-a_{j-1}}{b_{j+1}-b_{j-1}} & \leq \frac{a_{j+1}-a_j}{b_{j+1}-b_j}\left[\frac{b_{j+1}-b_j+b_j-b_{j-1}}{b_{j+1}-b_{j-1}}\right] \\ & =\frac{a_{j+1}-a_j}{b_{j+1}-b_j} \end{aligned}$$

APPENDIX F

PROOF OF PROPOSITION 8

We consider i such that [TeX:] $$b_i = b_{n_j}$$ and we prove that [TeX:] $$n_j \leq i$$:

By construction of [TeX:] $$n_j, b_{n_{j-1}} \neq b_{n_j}$$ and [TeX:] $$n_{j-1} \lt n_j.$$ Hence, by increase of [TeX:] $$b_k, b_{n_j} \geq b_{n_{j-1}}.$$

Therefore [TeX:] $$b_i=b_{n_j} \gt b_{n_{j-1}},$$ and [TeX:] $$i \gt n_{j-1}.$$ Consequently, according to definition of [TeX:] $$n_j$$:

(75)
[TeX:] $$\frac{a_{n_j}-a_{n_{j-1}}}{b_{n_j}-b_{n_{j-1}}} \leq \frac{a_i-a_{n_{j-1}}}{b_i-b_{n_{j-1}}}$$

(76)
[TeX:] $$\frac{a_{n_j}-a_{n_{j-1}}}{b_{n_j}-b_{n_{j-1}}} \leq \frac{a_i-a_{n_{j-1}}}{b_{n_j}-b_{n_{j-1}}}$$

This implies that [TeX:] $$a_{n_j} \leq a_i.$$

If [TeX:] $$i \lt n_j,$$ as [TeX:] $$b_i = b_{n_j},$$ then [TeX:] $$a_i \lt a_{n_j}$$ which contradicts with [TeX:] $$a_{n_j} \leq a_i .$$

Therefore [TeX:] $$n_j \leq i.$$ This concludes the proof.

APPENDIX G

PROOF OF PROPOSITION 5

When [TeX:] $$n \in[0, L-1] \text {, }$$ we have [TeX:] $$a_n-a_{n-1}=a \rho C_d+a\left[\rho\left[(L-n)-\left(C_d+L+R\right)(1-\rho)^n\right]+1\right] .$$ To that extent, we denote by g(n) the function: [TeX:] $$a_n-a_{n-1}=a \rho C_d+a\left[\rho\left[(L-n)-\left(C_d+L+R\right)(1-\rho)^n\right]+1\right]$$ and we show that g(.) is positive for [TeX:] $$n \in[0, L] \text {. }$$ To that end, we give the second derivative of g(.):

(77)
[TeX:] $$g^{\prime \prime}(n)=a\left[-\rho(\ln (1-\rho))^2\left(C_d+L+R\right)(1-\rho)^n\right] .$$

It is clear from the above equation that [TeX:] $$g^{\prime \prime}(.)$$ is non positive. Hence, g(.) is concave function with n. That is, for all [TeX:] $$n \in[0, L], g(n) \geq \min \{g(0), g(L)\}.$$ Thereby, our task will be to demonstrate that g(0) and g(L) are both positive. In fact, [TeX:] $$g(0)=0 \geq 0 .$$ While for n = L, it requires more technical analysis to establish the desired result. To that end, we decompose the function g(.) into two functions [TeX:] $$t(\cdot)$$ and f(.) such that:

[TeX:] $$g(n)=t(n)+f(n)$$

where

[TeX:] $$t(n)=a \rho C_d-a \rho(1-\rho)^n C_d,$$

and

We show that t(L) and f(L) are both positive.

We have [TeX:] $$t(L)=a \rho C_d\left(1-(1-\rho)^L\right) \geq 0.$$

Computing f(L), we get:

[TeX:] $$f(L)=a\left[1-\rho(L+R)(1-\rho)^L\right].$$

We have [TeX:] $$f(L)-f(0)=f(L)=\sum_{n=0}^{L-1} f(n+1)-f(n) \text {. }$$ To that extent, we give the expression of [TeX:] $$v(n)=f(n+1)-f(n) \text {, }$$ i.e,:

(78)
[TeX:] $$v(n)=a\left[\rho\left[-1+\rho(L+R)(1-\rho)^n\right]\right],$$

knowing that v(n) is lower bounded by [TeX:] $$a[\rho[-1+\rho(L+R)(1-\rho)^L]]$$ for [TeX:] $$n \in[0, L-1],$$ then:

(79)
[TeX:] $$\begin{aligned} f(L) & \geq \sum_{n=0}^{L-1} a\left[\rho\left[-1+\rho(L+R)(1-\rho)^L\right]\right] \\ & =a\left[L \rho\left[-1+\rho(L+R)(1-\rho)^L\right]\right]. \end{aligned}$$

Therefore:

(80)
[TeX:] $$\begin{aligned} f(L) & =a\left[1-\rho(L+R)(1-\rho)^L\right] \\ & \geq a\left[-L \rho\left[1-\rho(L+R)(1-\rho)^L\right]\right]. \end{aligned}$$

From the above inequality, [TeX:] $$1-\rho(L+R)(1-\rho)^L$$ should be positive otherwise, we will have a non positive term higher than a positive term. Consequently, f(L) is positive. As a consequence, since g(L) is the sum of two positive terms, then g(L) is also positive. Providing that [TeX:] $$g(n) \geq \min \{g(0), g(L)\}$$ for [TeX:] $$n \in[0, L],$$ then [TeX:] $$g(n) \geq 0.$$ Hence for [TeX:] $$n \in[0, L-1]$$

(81)
[TeX:] $$a_n \geq a_{n-1}.$$

We still have to show that [TeX:] $$a_L-a_{L-1} \geq 0.$$ In fact:

(82)
[TeX:] $$\begin{aligned} a_L-a_{L-1}= & a C_d\left[1-(1-\rho)^L\right]+a\left[R-(L+R)(1-\rho)^L\right] \\ = & a C_d\left[1-(1-\rho)^L\right] \\ & +a\left[R\left(1-\rho(L+R)(1-\rho)^L\right)\right] \\ \geq & 0 \end{aligned}$$

Thus, combining the two results (81) and (82), we end up with the desired result.

APPENDIX H

PROOF OF LEMMA 1

At [TeX:] $$W=x_{i, j}, y^i(W)=y^j(W),$$ i.e.:

(83)
[TeX:] $$\begin{aligned} & \sum_{q=0}^L a u^i(q) d(q)-W \sum_{q=0}^i u^i(q) \\ & =\sum_{q=0}^L a u^j(q) d(q)-W \sum_{q=0}^j u^j(q) \\ & \sum_{q=0}^L a u^i(q) d(q)-\sum_{q=0}^L a u^i(q) d(q) \\ & =W \sum_{q=0}^i u^i(q)-W \sum_{q=0}^j u^j(q) \end{aligned}$$

(84)
[TeX:] $$\begin{aligned} & \sum_{q=0}^L a u^i(q) d(q)-\sum_{q=0}^L a u^i(q) d(q) \\ &=W\left[\sum_{q=0}^i u^i(q)-\sum_{q=0}^j u^j(q)\right] \end{aligned}$$

Hence

(85)
[TeX:] $$W=\frac{\sum_{q=0}^L a u^i(q) d(q)-\sum_{q=0}^L a u^j(q) d(q)}{\sum_{q=0}^i u^i(q)-\sum_{q=0}^j u^j(q)} .$$

APPENDIX I

PROOF OF THEOREM 2

In order to prove this theorem, we introduce the following useful lemmas.

Lemma 9. [TeX:] $$x_{n, n-1}$$ is strictly increasing with n

Proof. We have for all [TeX:] $$n \in[0, L-1]:$$

[TeX:] $$x_{n+1, n}-x_{n, n-1}=W(n+1)-W(n)=\frac{a \rho\left(L+C_d-n\right)}{(1-\rho)^{n+1}}\gt 0 .$$

That concludes the proof.

Lemma 10. If for any [TeX:] $$k \in[0, L-1],$$ we have that: [TeX:] $$b_{k-1}\lt b_k \lt b_{k+1}$$ and [TeX:] $$\frac{a_k-a_{k-1}}{b_k-b_{k-1}}\lt \frac{a_{k+1}-a_k}{b_{k+1}-b_k} .$$

Then for any [TeX:] $$k \in[0, L-1]$$, we have for each [TeX:] $$k \lt s \leq L$$:

(86)
[TeX:] $$\frac{a_s-a_{k-1}}{b_s-b_{k-1}}\gt \frac{a_k-a_{k-1}}{b_k-b_{k-1}} .$$

Proof. We fix certain [TeX:] $$k \in[0, L-1]$$, we prove the result by induction:

for s = K+1

(87)
[TeX:] $$\begin{aligned} \frac{a_{k+1}-a_{k-1}}{b_{k+1}-b_{k-1}}= & \frac{a_{k+1}-a_{k-1}-a_k+a_k}{b_{k+1}-b_{k-1}} \\ = & \frac{a_{k+1}-a_k}{b_{k+1}-b_{k-1}}+\frac{a_k-a_{k-1}}{b_{k+1}-b_{k-1}} \\ \gt & \frac{\left(a_k-a_{k-1}\right)\left(b_{k+1}-b_k\right)}{\left(b_k-b_{k-1}\right)\left(b_{k+1}-b_{k-1}\right)} \\ & +\frac{\left(a_k-a_{k-1}\right)\left(b_k-b_{k-1}\right)}{\left(b_k-b_{k-1}\right)\left(b_{k+1}-b_{k-1}\right)}, \end{aligned}$$

where the strict inequality comes from the lemma’s assumptions. Therefore, we have that:

(88)
[TeX:] $$\begin{aligned} \frac{a_{k+1}-a_{k-1}}{b_{k+1}-b_{k-1}} & \gt\frac{a_k-a_{k-1}}{b_k-b_{k-1}}\left[\frac{b_{k+1}-b_k}{b_{k+1}-b_{k-1}}+\frac{b_k-b_{k-1}}{b_{k+1}-b_{k-1}}\right] \\ & =\frac{a_k-a_{k-1}}{b_k-b_{k-1}} . \end{aligned}$$

By induction, we consider that the inequality (86) is true for certain s strictly higher than k. The inequality below is then verified for s + 1:

(89)
[TeX:] $$\begin{aligned} \frac{a_{s+1}-a_{k-1}}{b_{s+1}-b_{k-1}}= & \frac{a_{s+1}-a_{k-1}-a_s+a_s}{b_{s+1}-b_{k-1}} \\ = & \frac{a_{s+1}-a_s}{b_{s+1}-b_{k-1}}+\frac{a_s-a_{k-1}}{b_{s+1}-b_{k-1}} \\ \gt & \frac{\left(a_k-a_{k-1}\right)\left(b_{s+1}-b_s\right)}{\left(b_k-b_{k-1}\right)\left(b_{s+1}-b_{k-1}\right)} \\ & +\frac{\left(a_k-a_{k-1}\right)\left(b_s-b_{k-1}\right)}{\left(b_k-b_{k-1}\right)\left(b_{s+1}-b_{k-1}\right)} \\ = & \frac{a_k-a_{k-1}}{b_k-b_{k-1}}\left[\frac{b_{s+1}-b_s}{b_{s+1}-b_{k-1}}+\frac{b_s-b_{k-1}}{b_{s+1}-b_{k-1}}\right] \\ = & \frac{a_k-a_{k-1}}{b_k-b_{k-1}} . \end{aligned}$$

So the inequality is also true for s + 1. This concludes the proof of the lemma.

Referring to Algorithm 1 that allows us to obtain the Whittle’s indices, we denote by j the step j described in the algorithm.

According to the same algorithm, to show that [TeX:] $$x_{j,j-1}$$ is the Whittle’s index at state j, we need to prove that for all [TeX:] $$n \in[j+1, L], \frac{a_n-a_{j-1}}{b_n-b_{j-1}}\gt \frac{a_j-a_{j-1}}{b_j-b_{j-1}}.$$ Indeed, using Lemma 9, [TeX:] $$W(j)\lt W(j+1)\lt \cdots\lt W(L)$$ Therefore, for all [TeX:] $$k \in[j, L-1], \frac{a_k-a_{k-1}}{b_k-b_{k-1}}\lt \frac{a_{k+1}-a_k}{b_{k+1}-b_k} .$$ Hence, according to Lemma 10, for all [TeX:] $$n \in[j+1, L], \frac{a_n-a_{j-1}}{b_n-b_{j-1}}\gt \frac{a_j-a_{j-1}}{b_j-b_{j-1}}.$$

Thus, the minimizer of [TeX:] $$\frac{a_n-a_{j-1}}{b_n-b_{j-1}}$$ at step j is j. As a consequence, the Whittle’s index of state j according to Algorithm 1 is effectively [TeX:] $$W(j)=\frac{a_j-a_{j-1}}{b_j-b_{j-1}}=x_{j, j-1} .$$

APPENDIX J

PROOF OF THEOREM 6

In order to prove this proposition, we distinguish between two types of classes:

1) Class k in which W is different from all [TeX:] $$W_i^k .$$

2) Class k such that there exists a given state j that satisfies [TeX:] $$W_i^k=W.$$

First type of classes: For the class k in which W is different from all [TeX:] $$W_i^k,$$ we prove that the optimal threshold verifies [TeX:] $$l_k(W)=l_k=\arg \max _i\left\{W_i^k \mid W_i^k \leq W\right.\left.W\}=\arg \max _i\left\{W_i^k\right) \mid W_i^k\lt W\right\}.$$ First we have [TeX:] $$\arg \max _i\left\{W_i^k \mid W_i^k \leq W\right\}=\arg \max _i\left\{W_i^k \mid W_i^k\lt W\right\}$$ since [TeX:] $$W_i^k$$ is different from W for all state i. For state i less than [TeX:] $$l_k,$$ given that [TeX:] $$W_i^k$$ is increasing with i, then [TeX:] $$W_i^k \leq W_{l_k}^k\lt W.$$ Hence, due to the indexability of the class, [TeX:] $$D\left(W_i^k\right) \subseteq D(W),$$ which implies that the optimal decision at state i is the passive action. For the state i strictly greater than [TeX:] $$l_k,$$ by definition of [TeX:] $$l_k, W_i^k$$ must be strictly greater than W since [TeX:] $$l_k$$ is the integer that gives the highest Whittle’s index less than W. Then, according to the definition of Whittle’s index, [TeX:] $$W\lt \min \{W, i \in D(W)\}$$ that means [TeX:] $$W \notin\{W, i \in D(W)\} .$$ Therefore [TeX:] $$i \notin D(W).$$ Thus, the optimal decision at state [TeX:] $$i \gt l_k$$ is the active decision. Hence [TeX:] $$\left.l_k=\arg \max _i\left\{W_i^k \mid W_i^k \leq W\right\}=\arg \max _i\left\{W_i^k\right) \mid W_i^k\lt W\right\}$$ is effectively the optimal threshold [TeX:] $$l_k(W).$$

Now, we tackle the case when there exists j, [TeX:] $$W_j^k=W:$$ We know that according to Theorem 2, [TeX:] $$W_j^k=x_{j, j-1}^k$$ which is the point for which if [TeX:] $$W=x_{j, j-1}^k,$$ we have [TeX:] $$\sum_{q=0}^L a_k u_k^j(q) d(q)-W \sum_{q=0}^j u_k^j(q)=\sum_{q=0}^L a_k u_k^{j-1}(q) d(q)-W \sum_{q=0}^{j-1} u_k^{j-1}(q) .$$ That means, according to (21), for [TeX:] $$W=x_{j, j-1}^k,$$ if j is a minimizer of this equation (j is the optimal threshold), then j - 1 is also a minimizer of this equation. Due to the indexability of the class k, for all states less or equal than j, the optimal decision is to stay passive. Besides, according to the definition of Whittle’s index, for all states strictly higher than j, the optimal decision is to be active. Then, j is indeed an optimal threshold, so as for j - 1.

Hence, the optimal threshold can be either j or j - 1.

In fact, since [TeX:] $$W_0^k\lt \cdots\lt W_{j-1}^k\lt W_j^k=W,$$ then [TeX:] $$j=\arg \max _i\left\{W_i^k \mid W_i^k \leq W\right\} \text {, and } j-1=\arg \max _i\left\{W_i^k \mid W_i^k\lt W\right\}.$$

This proves the proposition.

APPENDIX K

PROOF OF PROPOSITION 7

From optimization theory, it is known that the optimal solution of the dual problem is less or equal than the primal problem’s solution when the constraint is satisfied, i.e.:

(90)
[TeX:] $$\begin{aligned} & \max _W \min _{\phi \in \Phi} f(W, \phi) \\ & \quad \leq \min _{\phi \in \Phi} \limsup _{T \rightarrow \infty} \frac{1}{T} \mathbb{E}\left[\sum_{t=0}^{T-1} \sum_{k=1}^K \sum_{i=1}^{\gamma_k N} a_k d\left(q_i^k(t)\right) \mid \boldsymbol{q}(0), \phi\right] . \end{aligned}$$

As the optimal solution for a fixed W is a threshold-based policy, we use the steady state form and the expression of the LHS of the above inequality becomes:

(91)
[TeX:] $$\begin{aligned} & \max _W \min _\phi f(W, \phi) \\ & =\max _W\left\{\sum_{k=1}^K \sum_{i=1}^{\gamma_k N}\left[\min _{l_k \in[0, L]}\left\{\sum_{q=0}^L a_k u_k^{l_k}(q) d(q)-W \sum_{q=0}^{l_k} u_k^{l_k}(q)\right\}\right]\right. \\ & \quad+W(1-\alpha) N\} . \end{aligned}$$

with [TeX:] $$\phi$$ being the threshold policy that corresponds to l(W) computed using Proposition 6 for a fixed W. For [TeX:] $$W^*$$ that satisfies the constraint with equality (i.e., [TeX:] $$\alpha N=\sum_{k=1}^K \gamma_k N \sum_{i=l_{k+1}\left(W^*\right)}^L u_k^{l_k\left(W^*\right)}(i),$$ which is in fact true for all N, and then we can get rid of N), we have: [TeX:] $$\sum_{k=1}^K \sum_{i=1}^{\gamma_k N}\left[-W \sum_{q=0}^{l_k\left(W^*\right)} u_k^{l_k\left(W^*\right)}(q)\right]+W(1-\alpha) N=$$ [TeX:] $$\sum_{k=1}^K \gamma_k N\left[-W\left(1-\sum_{i=l_{k+1}\left(W^*\right)}^L u_k^{l_k\left(W^*\right)}(i)\right)\right]+W(1-/alpha)N=$$ [TeX:] $$-N W+\sum_{k=1}^K \gamma_k N W \sum_{i=l_{k+1}\left(W^*\right)}^L u_k^{l_k\left(W^*\right)}(i)+W(1-\alpha) N=-{N W}+\alpha N+W N-\alpha N=0.$$ Hence, we get:

(92)
[TeX:] $$\begin{aligned} \min _\phi & f\left(W^*, \phi\right) \\ & =f\left(W^*, l\left(W^*\right)\right)=\sum_{k=1}^K \sum_{i=1}^{\gamma_k N}\left[\sum_{q=0}^L a_k u_k^{l_k\left(W^*\right)}(q) d(q)\right] \\ & =\limsup _{T \rightarrow \infty} \frac{1}{T} \mathbb{E}\left[\sum_{t=0}^{T-1} \sum_{k=1}^K \sum_{i=1}^{\gamma_k N} a_k d\left(q_i^k(t)\right) \mid \boldsymbol{q}(0), l\left(W^*\right)\right] . \end{aligned}$$

Therefore, we obtain a threshold vector [TeX:] $$l\left(W^*\right)$$ that gives us a solution for the constrained relaxed problem (primal problem) that satisfies the constraint (4). Moreover, according to the inequality (90), we have that for all policy [TeX:] $$\phi$$ that satisfies the constraint and belong to [TeX:] $$\Phi$$:

(93)
[TeX:] $$\begin{aligned} f\left(W^*\right. & \left., l\left(W^*\right)\right) \\ & =\sum_{k=1}^K \sum_{i=1}^{\gamma_k N}\left[\sum_{q=0}^L a_k u_k^{l_k\left(W^*\right)}(q) d(q)\right] \\ & =\limsup _{T \rightarrow \infty} \frac{1}{T} \mathbb{E}\left[\sum_{t=0}^{T-1} \sum_{k=1}^K \sum_{i=1}^{\gamma_k N} a_k d\left(q_i^k(t)\right) \mid \boldsymbol{q}(0), l\left(W^*\right)\right] \\ & =\min _\phi f\left(W^*, \phi\right) \\ & \leq \max _W \min _\phi f(W, \phi) \\ & \leq \min _\phi \limsup _{T \rightarrow \infty} \frac{1}{T} \mathbb{E}\left[\sum_{t=0}^{T-1} \sum_{k=1}^K \sum_{i=1}^{\gamma_k N} a_k d\left(q_i^k(t)\right) \mid \boldsymbol{q}(0), \phi\right] . \end{aligned}$$

We deduce that the solution of the relaxed problem is of type threshold-based policy [TeX:] $$l\left(W^*\right)$$ with [TeX:] $$W^*$$ satisfies [TeX:] $$\alpha = \sum_{k=1}^K \gamma_k \sum_{i=l_{k+1}\left(W^*\right)}^L u_k^{l_k\left(W^*\right)}(i).$$

APPENDIX L

PROOF OF PROPOSITION 8

We define the following order relation in [TeX:] $$\mathbb{R}^K$$ such that for any two vectors [TeX:] $$l^1 \text { and } l^2, l^1 \leq l^2 \Longleftrightarrow$$ for each element of vector of index k, we have [TeX:] $$l_k^1 \leq l_k^2.$$ Recall that according to Proposition 6, we can directly deduce that for [TeX:] $$W_1 \leq W_2$$ [TeX:] $$l\left(W_1\right) \leq l\left(W_2\right).$$

Without loss of generality, when [TeX:] $$W \in \mathbb{R}^{+},$$ the corresponding set of threshold vectors l(W) is perfectly ordered. Then, [TeX:] $$\sum_{k=1}^K \gamma_k \sum_{i=l_k(W)+1}^L u_k^{l_k(W)}(i)$$ is strictly decreasing with l(W), and take discrete values from 1 to 0. According to Proposition 6, we have for each class k and state i, if [TeX:] $$W=W_i^k$$ then there is two possible optimal thresholds vectors [TeX:] $$l^1(W) \text { and } l^2(W) \text { with } l^1(W)\lt l^2(W).$$ Hence we can deduce that there exists a class m and state p such that [TeX:] $$\sum_{k=1}^K \gamma_k \sum_{i=l_k^1\left(W_p^m\right)+1}^L u_k^{l_k^1\left(W_p^m\right)}(i) \geq \alpha$$ and [TeX:] $$\sum_{k=1}^K \gamma_k \sum_{i=l_k^2\left(W_p^m\right)+1}^L u_k^{l_k^2\left(W_p^m\right)}(i) \leq \alpha.$$

According to Proposition 6, when [TeX:] $$W=W_p^m, l_m\left(W_p^m\right)=l_m^2\left(W_p^m\right) \text { and } l_m^1\left(W_p^m\right)=l_m^2\left(W_p^m\right)-1=l_m\left(W_p^m\right)-1$$ can be both the optimal thresholds for class m. As for the other classes, [TeX:] $$l_k^1\left(W_p^m\right)=l_k^2\left(W_p^m\right)=l_k\left(W_p^m\right).$$

If we force [TeX:] $$W^*$$ to be equal to [TeX:] $$W_p^m,$$ the optimal threshold vector can be either [TeX:] $$l^1\left(W_p^m\right) \text { or } l^2\left(W_p^m\right) \text {, }$$ then we can introduce some randomization between the two policies. In other words, we use the threshold policy [TeX:] $$l^1\left(W_p^m\right)$$ with probability θ and [TeX:] $$l^2\left(W_p^m\right)$$ with probability 1 - θ. The new stationary distribution for the class m is then a linear combination of these two threshold policies [TeX:] $$l_m\left(W_p^m\right) \text { and } l_m\left(W_p^m\right)-1 \text { : }$$ [TeX:] $$u_m^*=\theta u_m^{l_m\left(W_p^m\right)}+(1-\theta) u_m^{l_m\left(W_p^m\right)-1}.$$

Hence, in the class m, at state strictly less than [TeX:] $$l_m\left(W_p^m\right),$$ the queues will not transmit, whereas in a state strictly greater than [TeX:] $$l_m\left(W_p^m\right),$$ they will transmit with probability one. If the queues are in state [TeX:] $$l_m\left(W_p^m\right),$$ they will transmit with probability [TeX:] $$\frac{(1-\theta) u_m^{l_m\left(W_p^m\right)-1}\left(l_m\left(W_p^m\right)\right)}{\theta u_m^{l_m\left(W_p^m\right)}\left(l_m\left(W_p^m\right)\right)+(1-\theta) u_m^{l_m\left(W_p^m\right)-1}\left(l_m\left(W_p^m\right)\right)}=\frac{(1-\theta) u_m^{l_m\left(W_p^m\right)-1}\left(l_m\left(W_p^m\right)\right)}{u_m^*\left(l_m\left(W_p^m\right)\right)} .$$ Since the probability to be in this state [TeX:] $$l_m\left(W_p^m\right),$$ is [TeX:] $$u_m^*\left(l_m\left(W_p^m\right)\right),$$ the proportion of time that the queues will be in active mode is:

[TeX:] $$\begin{aligned} & \sum_{k \neq m} \sum_{i=l_k\left(W_p^m\right)+1}^L \gamma_k u_k^{l_k\left(W_p^m\right)}(i) \\ & \quad+\sum_{i=l_m\left(W_p^m\right)+1}^L \gamma_m u_m^*(i)+(1-\theta) \gamma_m u_m^{l_m\left(W_p^m\right)-1}\left(l_m\left(W_p^m\right)\right) . \end{aligned}$$

When θ = 0, the threshold policy is [TeX:] $$l_m\left(W_p^m\right)-1$$ and the total average time in active mode is higher than α. When θ = 1, the threshold policy is [TeX:] $$l_m\left(W_p^m\right)$$ and the total average time in active mode is less than α.

Given that [TeX:] $$\sum_{k \neq m} \sum_{i=l_k\left(W_p^m\right)+1}^L \gamma_k u_k^{l_k\left(W_p^m\right)}(i) \quad+\sum_{i=l_m\left(W_p^m\right)+1}^L \gamma_m u_m^*(i)+(1-\theta) \gamma_m u_m^{l_m\left(W_p^m\right)-1}\left(l_m\left(W_p^m\right)\right)$$ is continuous with θ, then there exists [TeX:] $$\theta^*$$ which verifies the equality. Hence, for [TeX:] $$W^*=W_p^m,$$ we get a threshold policy for all classes except for class m where the optimal solution is a linear combination of two threshold policies. Moreover for a given randomized parameter [TeX:] $$\theta^*$$, the constraint (4) is satisfied with equality:

[TeX:] $$\begin{aligned} \alpha= & \sum_{k \neq m} \sum_{i=l_k\left(W_p^m\right)+1}^L \gamma_k u_k^{l_k\left(W_p^m\right)}(i) \\ & +\sum_{i=l_m\left(W_p^m\right)+1}^L \gamma_m u_m^*(i)+\left(1-\theta^*\right) \gamma_m u_m^{l_m\left(W_p^m\right)-1}\left(l_m\left(W_p^m\right)\right) . \end{aligned}$$

TABLE I
THE EXPRESSIONS OF THE MATRICES [TeX:] $$Q_k$$ FOR k ≠ m AND [TeX:] $$Q_m.$$

APPENDIX M

PROOF OF PROPOSITION 9

We derive the eigenvalues of [TeX:] $$\boldsymbol{Q}$$.

The matrix [TeX:] $$\boldsymbol{Q}$$ is of the form:

(94)
[TeX:] $$\left[\begin{array}{ccccccc} Q_1 & 0 & \cdots & \cdots & \cdots & \cdots & 0 \\ 0 & Q_2 & \cdots & \cdots & \cdots & \cdots & 0 \\ \vdots & & \ddots & & & & \\ A_1 & A_2 & \cdots & Q_m & \cdots & A_{K-1} & A_K \\ \vdots & & & \ddots & & \vdots & \\ 0 & 0 & \cdots & \cdots & \cdots & Q_{K-1} & 0 \\ 0 & 0 & \cdots & \cdots & \cdots & 0 & Q_K \end{array}\right].$$

The characteristic polynomial of [TeX:] $$\boldsymbol{Q}$$ is the product of the characteristic polynomial of each matrix [TeX:] $$Q_k:$$

(95)
[TeX:] $$\chi_{\boldsymbol{Q}}(\lambda)=\prod_{k=1}^K \chi_{Q_k}(\lambda).$$

Therefore, the set of [TeX:] $$\boldsymbol{Q}\text{'s}$$ eigenvalues denoted by [TeX:] $$\operatorname{Sp}(\boldsymbol{Q})$$ is composed by the eigenvalues of the matrices [TeX:] $$Q_k$$. Specifically: [TeX:] $$\operatorname{Sp}(\boldsymbol{Q})=\cup_k\operatorname{Sp}(\boldsymbol{Q}) .$$ To that extent, it is sufficient to find the eigenvalues of each matrix [TeX:] $$Q_k$$ to deduce those of [TeX:] $$\boldsymbol{Q}$$. To that end, we distinguish between two cases:

1) k ≠ m: The characteristic polynomial of the matrix [TeX:] $$Q_k$$ is defined as follows:

(96)
[TeX:] $$\chi_{Q_k}=\operatorname{det}\left(Q_k-\lambda I\right).$$

where [TeX:] $$I \in \mathbb{R}^{(L+1) \times(L+1)}$$ is the identity matrix. In order to get a closed-form of this determinant, we apply elementary row and column operations. More specifically, let us denote by [TeX:] $$r_i \text{ and } c_i$$ the row i and column i respectively of the determinant. We also denote by [TeX:] $$a_{i, j}$$ the element in row i and column j of the matrix [TeX:] $$Q_k.$$ For i = 0 till [TeX:] $$l_k-1,$$ we add to the row [TeX:] $$r_L$$ the sum of the rows [TeX:] $$r_i$$ for [TeX:] $$0 \leq i \leq l_k-1.$$ In other words:

(97)
[TeX:] $$r_L=r_L+\sum_{i=0}^{l_k-1} r_i.$$

After doing so, we execute the following operation in order to have zeros for the elements [TeX:] $$a_{L, 0} \text { to } a_{L, l_k-1} \text { : }$$

(98)
[TeX:] $$c_i=c_i-c_L \quad i=0, \cdots, l_k-1.$$

As a result, [TeX:] $$\chi_{Q_k}(\lambda)$$ will be the determinant of the matrix [TeX:] $$G_k$$ reported in Table II. Since [TeX:] $$G_k$$ is an upper triangular matrix, the determinant will be simply the product of diagonal elements of matrix [TeX:] $$G_k$$. Hence, the determinant of [TeX:] $$G_k$$ will be equal to [TeX:] $$(-\lambda)^{l_k}$$ times the [TeX:] $$(-\lambda)^{L-l_k+1}.$$

As a consequence, the determinant is equal to:

(99)
[TeX:] $$\chi_{Q_k}(\lambda)=(-\lambda)^L.$$

2) k = m: The characteristic polynomial of the matrix [TeX:] $$G_m$$ is defined as follows:

(100)
[TeX:] $$\chi_{Q_k}=\operatorname{det}\left(Q_m-\lambda I\right).$$

The matrix [TeX:] $$G_m-\lambda I$$ is a lower triangular matrix. Therefore, its determinant will be simply equal to:

(101)
[TeX:] $$\chi_{Q_m}(\lambda)=(-\lambda)^{L-l_m}\left(\rho_m-\lambda\right)^{l_m} .$$

For k ≠ m, [TeX:] $$G_k$$ has only 0 as eigenvalue.

For k = m, [TeX:] $$\chi_{Q_m}(\lambda)=0 \Leftrightarrow \lambda=0 \text { or } \lambda=\rho_m.$$ Hence, [TeX:] $$Q_m$$ has two eigenvalues: 0 and [TeX:] $$\rho_m$$ which are strictly less than 1. Consequently, in both cases, whether k ≠ m or k = m, the norms of all eigenvalues of [TeX:] $$Q_k$$ are strictly less than 1. Hence, for [TeX:] $$\lambda \in \operatorname{Sp}(\boldsymbol{Q}) \Rightarrow|\lambda|\lt 1.$$

APPENDIX N

PROOF OF LEMMA 3

We take [TeX:] $$0\lt \epsilon\lt \mu, \boldsymbol{z}(t)$$ converges to [TeX:] $$\boldsymbol{z}^*,$$ i.e., there exists [TeX:] $$T_0$$ such that for all [TeX:] $$t \geq T_0,\left\|\boldsymbol{z}(t)-\boldsymbol{z}^*\right\| \leq \epsilon.$$ Hence:

(102)
[TeX:] $$\begin{aligned} & P_x\left(\sup _{T_0 \leq t\lt T}\left\|\boldsymbol{Z}^N(t)-\boldsymbol{z}^*\right\| \geq \mu\right) \\ & \quad \leq P_x\left(\sup _{T_0 \leq t\lt T}\left\|\boldsymbol{Z}^N(t)-\boldsymbol{z}(t)\right\|+\left\|\boldsymbol{z}(t)-\boldsymbol{z}^*\right\| \geq \mu\right) \\ & \quad \leq P_x\left(\sup _{T_0 \leq t\lt T}\left\|\boldsymbol{Z}^N(t)-\boldsymbol{z}(t)\right\| \geq \mu-\epsilon\right) \\ & \quad \leq P_x\left(\sup _{0 \leq t\lt T}\left\|\boldsymbol{Z}^N(t)-\boldsymbol{z}(t)\right\| \geq \mu-\epsilon\right) . \end{aligned}$$

Using Proposition 10, there exists [TeX:] $$s_1 \text{ and } s_2$$ such that:

(103)
[TeX:] $$P_x\left(\sup _{0 \leq t\lt T}\left\|\boldsymbol{Z}^N(t)-\boldsymbol{z}(t)\right\| \geq \mu-\epsilon\right) \leq s_1 \exp \left(-N s_2\right) .$$

Therefore:

(104)
[TeX:] $$P_x\left(\sup _{T_0 \leq t\lt T}\left\|\boldsymbol{Z}^N(t)-\boldsymbol{z}^*\right\| \geq \mu\right) \leq s_1 \exp \left(-N s_2\right) .$$

TABLE II
THE EXPRESSIONS OF THE MATRIX [TeX:] $$G_k$$ FOR k ≠ m.

APPENDIX O

PROOF OF PROPOSITION 11

We recall that [TeX:] $$\boldsymbol{Z}^N(t)$$ represents the proportion vector at time t under Whittle’s index policy.

Replacing [TeX:] $$C^{R P, N}$$ by its expression given in Section VI and knowing that [TeX:] $$z_i^{k, *}=\gamma_k u_k^{l_k}(i)$$ for k ≠ m and [TeX:] $$z_i^{m, *}=\gamma_m u_m^*(i)=\theta^* \gamma_m u_m^{l_m}(i)+\left(1-\theta^*\right) \gamma_m u_m^{l_m-1}(i)$$ (by definition of [TeX:] $$\boldsymbol{z}^*),$$ then the difference between [TeX:] $$C_T^N(\boldsymbol{x}) \text { and } C^{R P, N}$$ can be expressed as:

(105)
[TeX:] $$\begin{aligned} C_T^N(\boldsymbol{x})-C^{R P, N}= & \frac{1}{T} \mathbb{E}\left[\sum_{t=0}^{T-1} \sum_{k=1}^K \sum_{i=0}^L a_k Z_i^{k, N}(t) i N \mid \boldsymbol{x}\right] \\ & \left.-\frac{1}{T} \mathbb{E}\left[\sum_{t=0}^{T-1} \sum_{k=1}^K \sum_{i=0}^L a_k z_i^{k, *} i N\right] \right\rvert\,. \end{aligned}$$

We divide all by N

(106)
[TeX:] $$\begin{aligned} \frac{C_T^N(\boldsymbol{x})}{N} & -\frac{C^{R P, N}}{N} \\ = & \left|\frac{1}{T} \sum_{t=0}^{T-1} \sum_{k=1}^K \sum_{i=0}^L \mathbb{E}\left(a_k Z_i^{k, N}(t) d(i)\right)-a_k z_i^{k, *} d(i)\right| \\ \leq & \left|\frac{1}{T} \sum_{t=0}^{T_0-1} \sum_{k=1}^K \sum_{i=0}^L \mathbb{E}\left(a_k Z_i^{k, N}(t) d(i)\right)-a_k z_i^{k, *} d(i)\right| \\ & +\left|\frac{1}{T} \sum_{t=T_0}^{T-1} \sum_{k=1}^K \sum_{i=0}^L \mathbb{E}\left(a_k Z_i^{k, N}(t) d(i)\right)-a_k z_i^{k, *} d(i)\right| \\ \leq & \frac{T_0\left(L+C_d\right)(L+1)}{T} \sum_{k=1}^K a_k \gamma_k \\ & +\left|\frac{1}{T} \sum_{t=T_0}^{T-1} \sum_{k=1}^K \sum_{i=0}^L \mathbb{E}\left(a_k Z_i^{k, N}(t) d(i)\right)-a_k z_i^{k, *} d(i)\right| . \end{aligned}$$

We have that the function [TeX:] $$f: \boldsymbol{z} \rightarrow \sum_{k=1}^K \sum_{i=0}^L a_k z_i^k d(i)$$ is lipchitz and continuous, then for an arbitrary small [TeX:] $$\epsilon$$, there exists μ such that if [TeX:] $$|| \boldsymbol{z}-\boldsymbol{z}^* \|\lt \mu, \text { then }\left|f(\boldsymbol{z})-f\left(\boldsymbol{z}^*\right)\right|\lt \epsilon.$$,

We denote [TeX:] $$Y_N$$ the event [TeX:] $$\sup _{T_0 \leq t\lt T}\left\|Z^N(t)-\boldsymbol{z}^*\right\| \geq \mu,$$ we proceed to bound the second term:

(107)
[TeX:] $$\begin{aligned} & \left|\frac{1}{T} \sum_{t=T_0}^{T-1} \sum_{k=1}^K \sum_{i=0}^L E\left(a_k Z_i^{k, N}(t) d(i)\right)-a_k z_i^{k, *} d(i)\right| \\ & \leq P_x\left(Y_N\right) \frac{1}{T} \sum_{t=T_0}^{T-1} \mathbb{E}\left[\mid \sum_{k=1}^K \sum_{i=0}^L\left(a_k Z_i^{k, N}(t) d(i)\right)\right. \\ & \left.\quad-a_k z_i^{k, *} d(i)|| Y_N \mid\right] \\ & +\left(1-P_x\left(Y_N\right)\right) \frac{1}{T} \mathbb{E}\left[\mid \sum_{t=T_0}^{T-1} \sum_{k=1}^K \sum_{i=0}^L\left(a_k Z_i^{k, N}(t) d(i)\right)\right. \\ & \left.\quad-a_k z_i^{k, *} d(i)|| \overline{Y_N}\right] \\ & \leq \frac{\left(T-T_0\right)\left(L+C_d\right)(L+1)}{T} \sum_{k=1}^K a_k \gamma_k P_x\left(Y_N\right) \\ & \quad+\left(1-P_x\left(Y_N\right)\right) \epsilon. \end{aligned}$$

where the above inequality comes from the fact that [TeX:] $$\left|a_k Z_i^{k, N}(t) d(i)-a_k z_i^{k, *} d(i)\right| \leq 2 \gamma_k a_k d(i).$$ According to Lemma 3, we have [TeX:] $$\lim _{N \rightarrow \infty} P_x\left(Y_N\right)=0,$$ then:

(108)
[TeX:] $$\begin{aligned} & \lim _{N \rightarrow \infty} \left\lvert\, \frac{1}{T} \mathbb{E}\left[\sum_{t=0}^{T-1} \sum_{k=1}^K \sum_{i=0}^L a_k Z_i^{k, N}(t) d(i) N \mid \boldsymbol{x}\right]\right. \\ & \left.-\frac{1}{T} \mathbb{E}\left[\sum_{t=0}^{T-1} \sum_{k=1}^K \sum_{i=0}^L a_k z_i^{k, *} d(i) N\right] \right\rvert\, \\ & \leq \frac{T_0\left(L+C_d\right)(L+1)}{T} \sum_{k=1}^K a_k \gamma_k+\epsilon . \end{aligned}$$

This inequality is true [TeX:] $$\forall \epsilon\gt 0,$$ then:

(109)
[TeX:] $$\begin{gathered} \lim _{N \rightarrow \infty} \left\lvert\, \frac{1}{T} \mathbb{E}\left[\sum_{t=0}^{T-1} \sum_{k=1}^K \sum_{i=0}^L a_k Z_i^{k, N}(t) d(i) N \mid \boldsymbol{x}\right]\right. \\ \left.-\frac{1}{T} \mathbb{E}\left[\sum_{t=0}^{T-1} \sum_{k=1}^K \sum_{i=0}^L a_k z_i^{k, *} d(i) N\right] \right\rvert\, \\ \leq \frac{T_0\left(L+C_d\right)(L+1)}{T} \sum_{k=1}^K a_k \gamma_k . \end{gathered}$$

Finally we have:

(110)
[TeX:] $$\lim _{T \rightarrow \infty} \lim _{N \rightarrow \infty} \frac{C_T^N(\boldsymbol{x})}{N}-\frac{C^{R P, N}}{N}=0 .$$

APPENDIX P

PROOF OF LEMMA 4

We consider any initial state [TeX:] $$\left(\boldsymbol{z}^1, \boldsymbol{z}^2, \cdots, \boldsymbol{z}^K\right),$$ and we consider only the following possible event (that arises with strictly positive probability): whatever the transmission decision taken, there is no arrivals for all classes up to time [TeX:] $$T=1 / \alpha$$ [TeX:] $$\left(A_k(t)=0 \text { from } t=0 \text { up till } T=1 / \alpha \text { for all } 1 \leq k \leq K\right) \text {. }$$

To that extent, we show that at time T, we reach the state [TeX:] $$\boldsymbol{z}_0.$$ For that purpose, we divide the queues into [TeX:] $$\frac{1}{\alpha}$$ groups denoted by [TeX:] $$G_1, \cdots, G_\alpha$$ such that [TeX:] $$G_k$$ contains a proportion α of queues with the highest Whittle’s indices among all queues of the system excluding those of the groups [TeX:] $$G_1, G_2, \cdots, G_{k-1}$$ at time t = 0. Based on this, at time slot t = 0, the queues in [TeX:] $$G_1$$ will be scheduled and will transit to state 0 as the number of arrival packets is equal to 0. According to the expressions given in Proposition 2, the Whittle’s index of state 0 is equal to 0 whatever the value of the class. While according to the same Proposition, the Whittle’s index of state n strictly higher than 0, is strictly greater than 0 for any class k. Therefore, regardless of the class, the Whittle’s index of state n strictly higher than 0 is greater than that of 0. Bearing that in mind, at time slot t = 1, the queues in [TeX:] $$G_2$$ at state different than 0 have the highest Whittle’s indices among all system’s queues. Therefore, these aforementioned queues will be scheduled, and subsequently, all queues in [TeX:] $$G_2$$ will be at state 0. In this way, at time slot [TeX:] $$1 / \alpha$$, we get all the queues of the system in state 0. Consequently, at time [TeX:] $$T=1 / \alpha$$, we attain the desired state which is [TeX:] $$\boldsymbol{z}_0.$$ That concludes the proof.

APPENDIX Q

PROOF OF THEOREM 3

(111)
[TeX:] $$\begin{aligned} & \lim _{T \rightarrow \infty} \frac{C_T^N(\boldsymbol{x})}{N}-\frac{C^{R P, N}}{N} \\ & =\sum_{k=1}^K \sum_{i=0}^L a_k \mathbb{E}\left[Z_i^{k, N}(\infty)\right] d(i)-\sum_{k=1}^K \sum_{i=0}^L a_k z_i^{k, *} d(i) \end{aligned} .$$

We have the function [TeX:] $$f: \quad \boldsymbol{z} \rightarrow \sum_{k=1}^K \sum_{i=0}^L a_k z_i^k d(i)$$ is lipchitz and continuous, then for an arbitrary small [TeX:] $$\epsilon,$$ there exists μ such that if [TeX:] $$\left\|\boldsymbol{z}-\boldsymbol{z}^*\right\|\lt \mu, \text { then }\left|f(\boldsymbol{z})-f\left(\boldsymbol{z}^*\right)\right|\lt \epsilon.$$

We denote [TeX:] $$U_N$$ the event [TeX:] $$\sup \left\|Z^N(\infty)-z^*\right\| \geq \mu,$$ then :

(112)
[TeX:] $$\begin{aligned} & \left|\sum_{k=1}^K \sum_{i=0}^L a_k \mathbb{E}\left[Z_i^{k, N}(\infty)\right] d(i)-\sum_{k=1}^K \sum_{i=0}^L a_k z_i^{k, *} d(i)\right| \\ & \leq P\left(U_N\right) \mathbb{E}\left[\mid \sum_{k=1}^K \sum_{i=0}^L\left(a_k Z_i^{k, N}(\infty) d(i)\right)-a_k z_i^{k, *} d(i) \| U_N\right] \\ & +\left(1-P\left(U_N\right)\right) \mathbb{E}\left[\mid \sum_{k=1}^K \sum_{i=0}^L\left(a_k Z_i^{k, N}(\infty) d(i)\right)\right. \\ & \left.-a_k z_i^{k, *} d(i) \| \overline{U_N}\right] \\ & \leq\left(L+C_d\right)(L+1) \sum_{k=1}^K a_k \gamma_k P\left(U_N\right)+\left(1-P\left(U_N\right)\right) \epsilon. \\ & \end{aligned}$$

According to Lemma 6, we have [TeX:] $$\lim _{N \rightarrow \infty} P\left(U_N\right)=0,$$ then:

(113)
[TeX:] $$\lim _{N \rightarrow \infty}\left|\sum_{k=1}^K \sum_{i=0}^L a_k \mathbb{E}\left[Z_i^{k, N}(\infty)\right] d(i)-\sum_{k=1}^K \sum_{i=0}^L a_k z_i^{k, *} d(i)\right| \leq \epsilon.$$

This is true for any [TeX:] $$\epsilon.$$ Finally we have:

(114)
[TeX:] $$\lim _{N \rightarrow \infty}\left|\lim _{T \rightarrow \infty} \frac{C_T^N(\boldsymbol{x})}{N}-\frac{C^{R P, N}}{N}\right|=0 .$$

That completes the proof.

Biography

Saad Kriouile

Saad Kriouile received his M.Sc and Ph.D. degrees both in Telecommunications from CentraleSupelec, Gif-sur-Yvette, France, in 2017 and 2021, respectively. His Ph.D. thesis was on the design of asymptotically optimal scheduling in large networks and has been supported by the TCL 5G chair at CentraleSupelec. He is currently a Postdoctoral Researcher at the Laboratory of Signals and Systems (L2S), CentraleSupelec and University of Paris-Saclay. His research interest includes age of information, stochastic resource optimization and design of optimal scheduling in wireless and queuing networks.

Biography

Mohamad Assaad

Mohamad Assaad received the M.Sc and Ph.D. degrees, both in telecommunications, from Telecom ParisTech, Paris, France, in 2002 and 2006, respectively. Since 2006, he has been with the Telecommunications Department at CentraleSupelec, where he is currently a Professor. He is also a Researcher at the Laboratoire des Signaux et Systemes (L2S, CNRS) and holds the 5G Chair. He has co-authored 1 book and more than 120 publications in journals and conference proceedings and has served as TPC member or TPC Co-chair for top-tier international conferences, including TPC Co-chair for IEEE WCNC’21, IEEE Globecom’20 Mobile and Wireless networks Symposium Co-chair, etc. He is an Editor for the IEEE Wireless Communications Letters and the Journal of Communications and Information Networks. He served also as a Guest Coeditor for a special issue of the IEEE transactions on Network Science and Engineering. He has given in the past successful tutorials on several topics related to 5G systems, and age of information at various conferences including IEEE ISWCS’15, IEEE WCNC’16, and IEEE ICC’21 conferences. His research interests include 5G and beyond systems, fundamental networking aspects of wireless systems, Age of Information, resource optimization and machine learning in wireless networks.

Biography

Maialen Larranaga

Maialen Larranaga received the master’s degree in Mathematics from the University of the Basque Country, Leioa, Spain, in September 2012, and the Ph.D. degree in Computer Science from the Institute National Polytechnique de Toulouse, University of the Basque Country, in September 2015. The thesis was carried out in CNRS, LAAS, and INPENSEEIHT under the supervision of U. Ayesta, CNRS, LAAS, Toulouse, France and I. M. Verloop, CNRS, IRIT, Toulouse, with a Foundation AIRBUS Group scholarship. She was a Postdoctoral Fellow with CentraleSupelec, Gif-sur-Yvette, France, from October 2015 to September 2017. She later joined ASML as a Machine Learning Engineer and is currently a Data Scientist in Capgemini Engineering. Dr. Larranaga was a recipient of the Ph.D. thesis with the Prix Paul Sabatier in 2016, the Takacs Runner-Up Award in 2016, and the Prix Leopold Escande in 2015.

References

  • 1 S. Kriouile, M. Larranaga, and M. Assaad, "Whittle index policy for multichannel scheduling in queueing systems," in Proc. IEEE ISIT. 2019.doi:[[[10.1109/ISIT.2019.8849774]]]
  • 2 M. Deghel, M. Assaad, M. Debbah, and A. Ephremides, "Queueing stability and CSI probing of a TDD wireless network with interference alignment," IEEE Trans. Inf. Theory, vol. 64, no. 1, pp. 547-576, Sep. 2018.doi:[[[10.48550/arXiv.1605.00873]]]
  • 3 A. Destounis, M. Assaad, M. Debbah, and B. Sayadi, "Traffic-aware training and scheduling for the 2-user MISO broadcast channel," in Proc. IEEE ISIT, 2014.doi:[[[10.1109/ISIT.2014.6875058]]]
  • 4 A. Destounis, M. Assaad, M. Debbah, and B. Sayadi, "Traffic-aware training and scheduling for miso wireless downlink systems," IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2574-2599, Mar. 2015.doi:[[[10.1109/TIT.2015.2411607]]]
  • 5 L. Tassiulas and A. Ephremides, "Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks," IEEE Trans. Automatic Control, vol 37, no. 12, pp. 2130-2132, Dec. 1992.doi:[[[10.1109/9.182479]]]
  • 6 L. Tassiulas and A. Ephremides, "Dynamic server allocation to parallel queues with randomly varying connectivity," IEEE Trans. Inf. Theory, vol. 39, no. 2, pp. 466-478, Mar. 1993.doi:[[[10.1109/18.212277]]]
  • 7 M. J. Neely, "Optimal energy and delay tradeoffs for multiuser wireless downlinks," IEEE Trans. Inf. Theory, vol. 53, no. 9, pp. 3095-3113, Aug. 2007.doi:[[[10.1109/TIT.2007.903141]]]
  • 8 L. Georgiadis, M. J. Neely, and L. Tassiulas, Resource allocation and cross-layer control in wireless networks, Now Publishers Inc., 2006.doi:[[[10.1561/1300000001]]]
  • 9 C. H. Papadimitriou and J. N. Tsitsiklis, "The complexity of optimal queuing network control," in Proc. IEEE Structure Complexity Theory, 1994.doi:[[[10.1109/SCT.1994.315792]]]
  • 10 K. Liu and Q. Zhao, "Indexability of restless bandit problems and optimality of whittle index for dynamic multichannel access," IEEE Trans. Inf. Theory, vol. 56, no. 11, pp. 5547-5567, Oct. 2010.doi:[[[10.48550/arXiv.0810.4658]]]
  • 11 P. Ansell, K. D. Glazebrook, J. Ni˜ no-Mora, and M. O’Keeffe, "Whittle’s index policy for a multi-class queueing system with convex holding costs," Math. Methods Operations Res., vol. 57, no. 1, pp. 21-39, Apr. 2003.doi:[[[10.1007/s001860200257]]]
  • 12 C. Buyukkoc, P. Variaya, and J. Walrand, "c rule revisited." Adv. Appl. Prob., vol. 17, no. 1, pp. 237-238, 1985.custom:[[[https://www.semanticscholar.org/paper/The-c%CE%BC-rule-revisited-Buyukkoc-Varaiya/e40993db8b3e72fdd59d791a76121a230ab9913f]]]
  • 13 M. Larra˜ naga, "Dynamic control of stochastic and fluid resource-sharing systems," Ph.D. dissertation, 2015.custom:[[[https://www.researchgate.net/publication/286643812_Dynamic_control_of_stochastic_and_fluid_resource-sharing_systems]]]
  • 14 Y . Cui, V . K. Lau, R. Wang, H. Huang, and S. Zhang, "A survey on delay-aware resource control for wireless systems—large deviation theory, stochastic lyapunov drift, and distributed stochastic learning," IEEE Trans. Inf. Theory, vol. 58, no. 3, pp. 1677-1701, Feb. 2012.doi:[[[10.1109/TIT.2011.2178150]]]
  • 15 I. Bettesh and S. Shamai, "Optimal power and rate control for minimal average delay: The single-user case," IEEE Trans. Inf. Theory, vol. 52, no. 9, pp. 4115-4141, Aug. 2006.doi:[[[10.1109/TIT.2006.880011]]]
  • 16 R. Wang and V . K. Lau, "Delay-aware two-hop cooperative relay communications via approximate mdp and stochastic learning," IEEE Trans. Inf. Theory, vol. 59, no. 11, pp. 7645-7670, Aug. 2013.doi:[[[10.1109/TIT.2013.2279895]]]
  • 17 Y . Cui and V . K. Lau, "Distributive stochastic learning for delay-optimal ofdma power and subband allocation," IEEE Trans. Signal Process., vol. 58, no. 9, pp. 4848-4858, May 2010.doi:[[[10.1109/TSP.2010.2050062]]]
  • 18 P. Jacko, "Value of information in optimal flow-level scheduling of users with Markovian time-varying channels," Performance Evaluation, vol. 68, no. 11, pp. 1022-1036, Nov. 2011.doi:[[[10.1016/j.peva.2011.07.012]]]
  • 19 F. Cecchi and P. Jacko, "Nearly-optimal scheduling of users with Markovian time-varying transmission rates," Performance Evaluation, vol. 99, pp. 16-36, May 2016.doi:[[[10.1016/j.peva.2016.02.002]]]
  • 20 U. Ayesta, M. Erausquin, and P. Jacko, "A modeling framework for optimizing the flow-level scheduling with time-varying channels," Performance Evaluation, vol. 67, no. 11, pp. 1014-1029, 2010.doi:[[[10.1016/j.peva.2010.08.015]]]
  • 21 W. Ouyang, A. Eryilmaz, and N. B. Shroff, "Downlink scheduling over Markovian fading channels," IEEE/ACM Trans. Netw., vol. 24, no. 3, pp. 1801-1812, Jun. 2015.doi:[[[10.1109/TNET.2015.2438009]]]
  • 22 W. Ouyang, S. Murugesan, A. Eryilmaz, and N. B. Shroff, "Exploiting channel memory for joint estimation and scheduling in downlink networks," in Proc. IEEE INFOCOM, 2011.doi:[[[10.1109/INFCOM.2011.5935149]]]
  • 23 M. Larra˜ naga, M. Assaad, A. Destounis, and G. S. Paschos, "Asymptotically optimal pilot allocation over Markovian fading channels," IEEE Trans. Inf. Theory, vol. 64, no. 7, pp. 5395-5418, Nov. 2017.doi:[[[10.1109/TIT.2017.2772814]]]
  • 24 R. R. Weber and G. Weiss, "On an index policy for restless bandits," J. Appl. Prob., vol. 27, no. 3, pp. 637-648, 1990.doi:[[[10.2307/3214547]]]
  • 25 S. M. Ross, Introduction to stochastic dynamic programming, Academic press, 2014.custom:[[[-]]]
  • 26 M. L. Puterman, Markov decision processes: Discrete stochastic dynamic programming, John Wiley Sons, 2014.doi:[[[https://onlinelibrary.wiley.com/doi/book/10.1002/9780470316887]]]
  • 27 A. Maatouk, S. Kriouile, M. Assad, and A. Ephremides, "On the optimality of the Whittle’s index policy for minimizing the age of information," IEEE Trans. Wireless Commun., vol. 20, no. 2, pp. 1263-1277, Oct. 2020.doi:[[[10.48550/arXiv.2001.03096]]]
  • 28 Y .-P. Hsu, "Age of information: Whittle index for scheduling stochastic arrivals," in Proc. IEEE ISIT, 2018.doi:[[[10.48550/arXiv.1801.03422]]]
  • 29 T. G. Kurtz, "Strong approximation theorems for density dependent Markov chains," Stochastic Processes Appl., vol. 6, no. 3, pp. 223-240, Feb. 1978.doi:[[[10.1016/0304-4149(78)90020-0]]]
  • 30 J. Gittins, K. Glazebrook, and R. Weber, Multi-armed bandit allocation indices. John Wiley Sons, 2011.doi:[[[https://onlinelibrary.wiley.com/doi/book/10.1002/9780470980033]]]
  • 31 K. P. Papadaki and W. B. Powell, "Exploiting structure in adaptive dynamic programming algorithms for a stochastic batch service problem," European J. Oper. Res., vol. 142, no. 1, pp. 108-127, Oct. 2002.doi:[[[10.1016/S0377-2217(01)00297-1]]]
  • 32 Y . Ruan, W. Wang, Z. Zhang, and V . K. Lau, "Delay-aware massive random access for machine-type communications via hierarchical stochastic learning," in Proc. IEEE ICC, 2017.doi:[[[10.1109/ICC.2017.7996795]]]
  • 33 P. Whittle, "Restless bandits: Activity allocation in a changing world," J. Appl. Prob., vol. 25, no. A, pp. 287-298, 1988.custom:[[[https://www.jstor.org/stable/3214163]]]

TABLE I

THE EXPRESSIONS OF THE MATRICES [TeX:] $$Q_k$$ FOR k ≠ m AND [TeX:] $$Q_m.$$
table1.png

TABLE II

THE EXPRESSIONS OF THE MATRIX [TeX:] $$G_k$$ FOR k ≠ m.
table2.png
System model.
Whittle index computation
[TeX:] $$y^n(W)$$ in function of W for different values of n (L = 4, R-1 = 7, and [TeX:] $$C_d=3$$): W(i) indicates the Whittle’s index at state i.
Hitting Time of [TeX:] $$\Omega_\epsilon\left(z^*\right)$$ in function of N: (a) [TeX:] $$Z^N(0)=x$$ and (b) [TeX:] $$Z^N(0)=y$$.
Performance evaluation of Whittle’s index policy for 2 classes.
Performance evaluation of Whittle’s index policy for 3 classes.
Evaluation of [TeX:] $$C_1^{\pi, N} \text { and } C_2^{\pi, N}$$ in function of N under policy Θ and Whittle’s index policy WIP.