** Optimizing Joint Probabilistic Caching and Channel Access for Clustered D2D Networks **

Ramy Amer , Mohamed Baza , Tara Salman , M. Majid Butt , Ahmad Alhindi and Nicola Marchetti

## Article Information

## Abstract

**Abstract:** Caching at mobile devices and leveraging device-todevice (D2D) communication are two promising approaches to support massive content delivery over wireless networks. Analysis of such D2D caching networks based on a physical interference model is usually carried out by assuming uniformly distributed devices. However, this approach does not capture the notion of device clustering. In this regard, this paper proposes a joint communication and caching optimization framework for clustered D2D networks. Devices are spatially distributed into disjoint clusters and are assumed to have a surplus memory that is utilized to proactively cache files, following a random probabilistic caching scheme. The cache offloading gain is maximized by jointly optimizing channel access and caching scheme. A closed-form caching solution is obtained and bisection search method is adopted to heuristically obtain the optimal channel access probability. Results show significant improvement in the offloading gain reaching up to 10% compared to the Zipf caching baseline.

**Keywords:** caching , channel access , d2d communication , offloading gain

## I. INTRODUCTION

CACHING at mobile devices significantly improves system performance by facilitating device-to-device (D2D) communications, which enhances the spectrum efficiency and alleviate the heavy burden on backhaul links [1]–[3]. There are two main approaches for content placement in the literature, deterministic and probabilistic. For deterministic placement, files are cached and optimized for specific networks in a deterministic manner [3]–[5]. However, in practice, the wireless channels and the geographic distribution of devices are time-variant. This triggers the optimal content placement strategy to be fre quently updated, which makes the content placement quite complex. To cope with this problem, probabilistic content placement is proposed whereby each device randomly caches a subset of content with a certain caching probability in stochastic networks [6], [7]. In this paper, we focus on the probabilistic caching model. In essence, this model is shown to be powerful and tractable for the analysis of random networks and possibly yields a convex content caching problem, which can be effectively solved [6], [7].

Modeling of wireless caching networks also follows two main directions in the current state-of-art. The first line of work focuses on the fundamental scaling results by assuming a simple protocol channel model [3]–[5], known as the protocol model. This model assumes that two devices can always communicate if they are within a certain distance. The second line of work, which is similar to the one adopted in this paper, considers a more realistic model for the underlying physical and medium access control (MAC) layers [8]–[16]. This is commonly defined as the physical interference model. Driven by this, the physical interference model allows us to study joint caching and communications for D2D networks and design efficient channel access-aware caching scheme.

The analysis of wireless caching networks that underlies a physical interference model, is commonly conducted by means of stochastic point processes. For instance, modeling device locations as a Poisson point process (PPP) is a widely adopted approach in the wireless caching area [8]–[10], [17]. However, while the PPP model is tractable, a realistic model for D2D caching networks needs to capture the notion of clustering. In particular, in clustered D2D networks, each device has multiple proximate devices, where any of them can act as a serving device. Such deployments can be effectively characterized by cluster processes [18].

Performance of clustered D2D caching networks is studied in [12]–[16], [19], [20]. For instance, the authors in [12] discussed different strategies of content placement in a Poisson cluster process (PCP) deployment. Meanwhile, the authors extended their work in [13], to optimize the collective performance of all the devices in each cluster. Moreover, the authors in [14] proposed cooperation among the D2D transmitters and hybrid caching strategies to save the energy cost of content providers, where the location of these providers is modeled by a Gauss-Poisson process. In [15], we jointly optimized content caching and frequency partitioning between D2D cellular communications for clustered cache-enabled networks. Moreover, we studied the role of cooperative communication for clustered D2D networks in [16]. We particularly showed that cooperative communication becomes more appealing in denser D2D caching networks and adverse interference conditions. Meanwhile, in [20], we formu lated and solved the energy minimization problem for clustered D2D caching networks.

While the prior works in [12]–[16], [19], [20] studied cacheenabled D2D networks from various perspectives, the joint optimization of caching and channel access for clustered D2D networks has not been addressed yet in the literature. Such a study is vital for proper quantifying and optimizing the achievable performance of D2D cache-enabled networks under practical MAC scheduling protocols.

Compared with this prior art, the main contributions of this paper are as follows:

We study the content placement and delivery for a network wherein cache-enabled devices are spatially distributed into disjoint clusters. We conduct a performance analysis and joint optimization of channel access and probabilistic content placement with the aim to maximize the cache offloading gain.

We characterize the optimal content placement as a function of the system parameters, namely, density of clusters, displacement distance between devices, and required signal-to-interference ratio (SIR) threshold. We then propose a heuristic approach to obtain the optimal channel access probability.

Our results reveal that the optimal caching scheme heavily depends on the channel access probability and the geometry of the network. Overall, joint optimization of content placement and communication, e.g., channel access, is shown to be vital to enhance the performance of wireless caching networks.

The rest of this paper is organized as follows. Section II and Section III introduce the system model and the rate coverage analysis, respectively. The offloading gain maximization problem is discussed in Section IV. Numerical results are presented in Section V and conclusions are drawn in Section VI.

## II. SYSTEM MODEL

##### A. System Setup

We model the location of mobile devices with a Thomas cluster process (TCP). The use of TCP allows us to factor in the notion of clustering for D2D caching networks, which is commonly ignored in the literature. The TCP is composed of the parent points, which are drawn from a PPP [TeX:] $$\Phi_{p}$$ with density [TeX:] $$\lambda_{p},$$ and the daughter points that are drawn from a Gaussian PPP around each parent point [18]. In particular, the daughter points are normally scattered with variance [TeX:] $$\sigma^{2} \in \mathbb{R}$$ around each parent point. The parent points and offspring are referred to as cluster centers and cluster members, respectively. By the TCP definition, the number of devices per cluster is a Poisson random variable (RV) with mean [TeX:] $$1 \bar{n}.$$ Therefore, the density function of a cluster member location relative to its cluster center is

##### (1)

[TeX:] $$f_{Y}(y)=\frac{1}{2 \pi \sigma^{2}} \exp \left(-\frac{\|y\|^{2}}{2 \sigma^{2}}\right), \quad y \in \mathbb{R}^{2},$$where [TeX:] $$\|.\|$$ is the Euclidean norm. The intensity function of a cluster is given by [TeX:] $$\lambda_{c}(y)=\bar{n} /\left(2 \pi \sigma^{2}\right) \exp \left(-\frac{\|y\|^{2}}{2 \sigma^{2}}\right),$$ and therefore, the intensity of the entire process is given by [TeX:] $$\lambda=\bar{n} \lambda_{p}.$$

We assume that the D2D communication is operating as outof- band D2D under flat Rayleigh fading channels. D2D communication is enabled within each cluster to deliver popular content. It is assumed that the devices adopt a slotted-ALOHA medium access protocol, where each transmitter during each time slot, independently and randomly accesses the channel with the same probability q. One can alternatively assume that each device makes a coin flip at each time about whether or not it accesses a shared-channel. This allows us to define a Bernoulli process [TeX:] $$N_{y}$$ with the probability that a device located at y accesses a channel being [TeX:] $$\mathbb{P}\left(N_{y}\right)=q.$$ The key advantage of adopting slotted-ALOHA is that it is a simple yet fundamental MAC protocol, where there is no need for a central controller to schedule the users’ transmissions. Moreover, despite the vast amount of existing studies on MAC protocols, only variations of ALOHA and CSMA are still used in the majority of technologies adopted in the Internet of Things [21]. According to this access model, multiple active D2D links might coexist within a cluster. Therefore, q is a design parameter that directly controls intra- as well as inter-cluster interference, as described later.

If a requesting device caches the desired content, the device directly retrieves the content. However, if the content is not locally cached, it can be downloaded from a randomly selected neighboring device that caches the file within the same cluster, henceforth called catering device. This catering device is, in turn, admitted to access the channel according to the proposed slotted-ALOHA protocol. Finally, the device attaches to the nearest base station (BS) as a last resort to download the content, in the case it is not cached within the device cluster. Since there are memory and battery consumption costs borne by a catering device, the geographically closest device may not want to participate in the content caching and/or delivery. Hence, randomizing the catering device reflects the possibility of being served by a distant device that is willing to participate in the content delivery, which is not necessarily the nearest one. Note that this assumption is commonly adopted in the literature [12], [14].

##### B. Content Popularity and Caching

We assume that each device has a surplus memory of size M designated for caching files. The total number of files is [TeX:] $$N_{f}>M,$$ and the set (library) of content indices is denoted as [TeX:] $$\mathcal{F}=\left\{1,2, \cdots, N_{f}\right\}.$$ These files represent the content catalog that all devices in a cluster may request, which are indexed in a descending order of popularity. The probability that the ith file is requested follows a Zipf’s distribution given by [22],

where [TeX:] $$\beta$$ is a parameter that reflects how skewed the popularity distribution is. For example, if [TeX:] $$\beta=0,$$ the popularity of the files has a uniform distribution. Increasing [TeX:] $$\beta$$ increases the disparity among the files popularity such that lower indexed files have higher popularity. By definition, [TeX:] $$\sum_{i=1}^{N_{f}} p_{i}=1.$$ We use the Zipf’s distribution to model the popularity of files per cluster.

We adopt a random content placement where each device independently selects a file to cache according to a specific prob ability function [TeX:] $$b=\left\{b_{1}, b_{2}, \cdots, b_{N_{e}}\right\}, \text { where } b_{i}$$ is the probability that a device caches the ith file, [TeX:] $$0 \leq b_{i} \leq 1$$ for all [TeX:] $$i=\left\{1, \cdots, N_{f}\right\}.$$ To avoid duplicate caching of the same content within the memory of the same device, we follow a proba Pbilistic caching approach proposed in [7], which implies that [TeX:] $$\sum_{i=1}^{N_{f}} b_{i}=M .$$ ^{1}

^{1} It is clear that the benefits of content caching at devices is prominent only if these devices have sufficient memory to store files of interest. The availability of unused memory on these devices can not be always maintained. In other words, it might happen that the devices run out of sufficient memory to cache popular files. The analysis of device caching networks with such variations of unused memory to store popular files is an important extension for our future work.

Next, we proceed with the rate coverage analysis to obtain the offloading gain, which is a key performance metric for D2D caching networks [9]. Particularly, the offloading gain is defined as the probability of obtaining a requested file from the local cluster, either via self-cache or from a neighboring device in the same cluster, with a received SIR higher than a required threshold [TeX:] $$\vartheta$$

## III. RATE COVERAGE ANALYSIS

We conduct the next analysis for a cluster whose center is assumed at [TeX:] $$x_{0} \in \Phi_{p},$$ referred to as representative cluster. The device requesting a content in this cluster, henceforth called typical device, is located at the origin. We denote the location of the catering device by [TeX:] $$y_{0}$$ relative to [TeX:] $$x_{0},$$ where [TeX:] $$\left\{x_{0}, y_{0}\right\} \in \mathbb{R}^{\not}.$$ The distance between the typical and catering devices is denoted as [TeX:] $$r=\left\|x_{0}+y_{0}\right\|,$$ which is a realization of a RV R whose distribution is described later. Having explained the channel access and the random selection of catering devices, the offloading gain can be expressed as

##### (3)

[TeX:] $$\begin{aligned} \mathbb{P}_{o}(q, \boldsymbol{b})=& \sum_{i=1}^{N_{f}} p_{i} b_{i}+p_{i}\left(1-b_{i}\right)\left(1-e^{-b_{i} \bar{n}}\right) \\ & \times \underbrace{\int_{r=0}^{\infty} f_{R}(r) \mathbb{P}\left(\mathrm{SIR}_{\mid r}>\vartheta\right) \mathrm{dr}}_{\Upsilon}, \end{aligned}$$where [TeX:] $$\mathrm{SIR}_{\mid \mathrm{r}}$$ is the received SIR at the typical device when downloading a content from a catering device r apart from the origin, and [TeX:] $$\Upsilon$$ represents the rate coverage probability. The first term in (3) is the probability of requesting a locally cached file (self-cache). The second term is the probability that a requested file i is cached among at least one cluster member and being downloadable with an SIR greater than [TeX:] $$\vartheta,$$ given that it was not self-cached. More precisely, since the number of devices per cluster has a Poisson distribution, the probability that there are k devices per cluster is equal to [TeX:] $$\bar{n}^{k} e^{-\bar{n}} / k !.$$ Accordingly, the probability that there are k devices caching content i is [TeX:] $$\left(b_{i} \bar{n}\right)^{k} e^{-b_{i} \bar{n}} / k !.$$ Hence, the probability that at least one device caches content i is [TeX:] $$1-e^{-b_{i} \bar{n}} .$$

For the serving distance distribution [TeX:] $$f_{R}(r),$$ since both the typical device and catering device have their locations drawn from a normal distribution with variance [TeX:] $$\sigma^{2},$$ then by definition, the serving distance has a Rayleigh distribution of scale parameter [TeX:] $$\sqrt{2} \sigma, \text { i.e., } f_{R}(r)=r /\left(2 \sigma^{2}\right) \mathrm{e}^{\frac{-r^{2}}{4 \sigma^{2}}}.$$ It is worth noting that the serving distance is independent of the caching probability. To clarify, from the thinning theorem [18], the set of devices caching content i in a given cluster forms a Gaussian PPP [TeX:] $$\Phi_{c i}$$ whose intensity is [TeX:] $$\lambda_{c i}=b_{i} \lambda_{c}(y).$$ The probability distribution function (PDF) of the distance between a randomly selected caching device from [TeX:] $$\Phi_{c i}$$ and the typical device is [TeX:] $$f_{R}(r),$$ which is again independent of [TeX:] $$b_{i}.$$

The received power at the typical device from a catering device located at [TeX:] $$y_{0}$$ relative to the cluster center is given by

where [TeX:] $$P_{d}$$ denotes the D2D transmission power, [TeX:] $$g_{0} \sim \exp (1)$$ is the complex Gaussian fading channel coefficient, and [TeX:] $$\alpha>2$$ is the path loss exponent. Under this setup, the typical device sees two types of interference, namely, the intra- and inter-cluster interference. We first describe the inter-cluster interference, then the intra-cluster interference is characterized. The set of active devices in any remote cluster is denoted as [TeX:] $$\mathcal{B}^{\mathrm{II}},$$ where q refers to the access probability. Similarly, the set of active devices in the local cluster is denoted as [TeX:] $$\mathcal{A}^{\mathrm{II}}.$$ The received interference at the typical device from simultaneously active D2D transmitters within the remote clusters is

where [TeX:] $$\Phi_{p}^{!}=\Phi_{p} \backslash x_{0}$$ for ease of notation, y is the marginal distance between a potential interfering device and its cluster center at [TeX:] $$x \in \Phi_{p}, u=\|x+y\|$$ s a realization of a RV U that models the inter-cluster interfering distance, [TeX:] $$g_{y_{x}} \sim \exp (1),$$ and [TeX:] $$g_{u}=g_{y_{x}}.$$ The intra-cluster interference is then given by

where y is the marginal distance between the intra-cluster interfering devices and the cluster center at [TeX:] $$x_{0} \in \Phi_{p}, h=\left\|x_{0}+y\right\|$$ is a realization of a RV H, which models the intra-cluster interfering distance, [TeX:] $$g_{y_{x_{0}}} \sim \exp (1), \text { and } g_{h}=g_{y_{z_{0}}}.$$ From the thinning theorem [18], the set of active transmitters based on the slotted-ALOHA medium access forms a Gaussian [TeX:] $$\mathrm{PPP} \Phi_{c q}$$ whose intensity is given by

Assuming that the thermal noise is neglected as compared to the aggregate interference, the received SIR at the typical device can be written as

##### (5)

[TeX:] $$\begin{aligned} \mathrm{SIR}_{\mid \mathrm{r}} &=\mathbf{1}\left\{N_{r}=1\right\} \frac{P}{I_{\Phi_{p} !}+I_{\Phi_{c}}} \\ &=\mathbf{1}\left\{N_{r}=1\right\} \frac{P_{d} g_{0} r^{-\alpha}}{I_{\Phi_{p}} !}+I_{\Phi_{c}}, \end{aligned}$$where [TeX:] $$1\{.\}$$ is the indicator function, and, for ease of exposition, [TeX:] $$N_{r}=N_{y 0}$$ is a Bernoulli RV that takes the value one with probability q. Thus, the event [TeX:] $$\left\{N_{r}=1\right\}$$ captures the incident when the serving device is admitted to access the channel. Then, the probability that the received SIR is higher than the required threshold [TeX:] $$\vartheta$$ is derived as follows:

##### (6)

[TeX:] $$\begin{aligned} \Upsilon_{\mid r} &=\mathbb{P}\left(\mathrm{SIR}_{\mid \mathrm{r}}>\vartheta\right) \\ &=\mathbb{P}\left(\mathbf{1}\left\{\mathrm{N}_{\mathrm{r}}=1\right\} \frac{\mathrm{P}_{\mathrm{d}} \mathrm{g}_{0} \mathrm{r}^{-\alpha}}{\mathrm{I}_{\Phi_{\mathrm{P}} !}+\mathrm{I}_{\Phi_{\mathrm{c}}}}>\vartheta\right) \\ \stackrel{\stackrel{(a)}{=}} q \mathbb{P}\left(\frac{\mathrm{P}_{\mathrm{d}} \mathrm{g}_{0} \mathrm{r}^{-\alpha}}{I_{\Phi_{\mathrm{p}}}+\mathrm{I}_{\Phi_{\mathrm{c}}}}>\vartheta\right), \end{aligned}$$where (a) follows from the assumption of a Bernoulli’s RV with mean q. Rearranging the right-hand side, we get

##### (7)

[TeX:] $$\begin{aligned} &\Upsilon_{\mid r} \stackrel{(b)}{=} q \mathbb{E}_{\mathrm{I}_{\Phi_{\mathrm{p}}}, \mathrm{I}_{\mathrm{\Phi}_{\mathrm{c}}}}\left[\exp \left(\frac{-\vartheta r^{\alpha}}{P_{d}}\left[I_{\Phi_{p}}+I_{\Phi_{c}}\right]\right)\right]\\ &\stackrel{(c)}{=} q \mathscr{L}_{I_{\Phi_{p}}^{1}}(s) \mathscr{L}_{I_{\Phi_{c}}}(s) \end{aligned}$$where (b) follows from the assumption [TeX:] $$g_{0} \sim \mathcal{C} \mathcal{N}(0,1),$$ and (c) follows from the independence of the intra- and inter-cluster interference and calculating the Laplace transform of them, with [TeX:] $$s=\vartheta r^{\alpha} / P_{d}.$$ The classical tradeoff between frequency reuse and higher interference power is depicted in (7). In other words, increasing the access probability q allows more opportunities to access the channel, however, this channel access would be accompanied with higher interference power.

Next, we first derive the Laplace transform of interference to obtain the rate coverage probability [TeX:] $$\Upsilon.$$ Then, we formulate the offloading gain maximization problem.

**Lemma 1:** Laplace transform of the inter-cluster aggregate interference [TeX:] $$I_{\Phi_{P}^{!}}$$ is given by

##### (8)

[TeX:] $$\mathscr{L}_{I_{\Phi_{p} !}}(s) =\exp \left(-2 \pi \lambda_{p} \int_{v=0}^{\infty}\left(1-\mathrm{e}^{-q \bar{n} \varphi(s, v)}\right) v \mathrm{~d} \mathrm{v}\right),$$where [TeX:] $$s=\frac{\partial r^{\alpha}}{P_{d}}, \varphi(s, v)=\int_{u=0}^{\infty} \frac{s}{s+u^{\alpha}} f_{U}(u \mid v) \mathrm{du},$$ and [TeX:] $$f_{U}(u \mid v)=\operatorname{Rice}(\mathrm{u} \mid \mathrm{v}, \sigma)$$ represents Rice’s PDF of parameter [TeX:] $$\sigma, \text { and } v=\|x\|.$$

Proof: Please see Appendix A.

**Lemma 2:** Laplace transform of the intra-cluster aggregate interference [TeX:] $$I_{\Phi_{c}}$$ is approximated as

##### (9)

[TeX:] $$\mathscr{L}_{\mathscr{H}}() \approx \exp \left(-\int_{=}^{\infty} \frac{\alpha}{+\alpha} \mathscr{H}() \mathrm{dh}\right),$$where [TeX:] $$f_{H}(h)=\operatorname{Rayleigh}(\mathrm{h}, \sqrt{2} \sigma)$$ epresents Rayleigh’s PDF with a scale parameter [TeX:] $$\sqrt{2} \sigma \text {. }.$$

The proof of Lemma 2 proceeds in a similar way to the proof of Lemma 1, and the approximation follows from neglecting the correlation among intra-cluster serving distances, i.e., the common part [TeX:] $$x_{0} \text { in }\left\|x_{0}+y\right\|$$ with the detailed proof omitted.

To validate the approximation in Lemma 2, in Fig. 1, we plot the rate coverage probability [TeX:] $$\Upsilon,$$ computed from (3), against the displacement standard deviation [TeX:] $$\sigma.$$ Fig. 1 verifies that the adopted approximation is accurate. It is intuitive to see that the [TeX:] $$\Upsilon$$ decreases as both [TeX:] $$\sigma \text { and } \lambda_{p}$$ increase. This is attributed to the fact that the desired signal level increases as [TeX:] $$\sigma$$ decreases, meanwhile, the interference power increases with [TeX:] $$\lambda_{p} \text { and } \sigma.$$ This is

attributed to the higher density of clusters (larger [TeX:] $$\lambda_{p} \text { ) }$$ and statistically shorter distance between interfering devices and the typical device (larger [TeX:] $$\sigma \text { ). }$$

##### (10)

[TeX:] $$\begin{aligned} \mathbb{P}_{o}(q, \boldsymbol{b})=& \sum_{i=1}^{N_{f}} p_{i} b_{i}+p_{i}\left(1-b_{i}\right)\left(1-e^{-b_{1} \bar{n}}\right) \\ & \times \underbrace{\int_{r=0}^{\infty} \frac{r}{2 \sigma^{2}} e^{\frac{-r^{2}}{4 \sigma^{2}}} p \mathscr{L}_{I_{\Phi_{p}}^{!}}(s) \mathscr{L}_{I_{\Phi_{c}}}(s) \mathrm{dr}}_{\Upsilon}. \end{aligned}$$Having characterized the offloading gain, we next formulate the joint channel access and caching optimization problem.

## IV. MAXIMIZING OFFLOADING GAIN

The offloading gain maximization problem is formulated as

where (12) is the device cache size constraint. Since the offloading gain depends on the caching probability b and access probability q, and since q exists as a complicated exponential term in [TeX:] $$\Upsilon$$ (see (7) and (9)), it is difficult to analytically characterize the objective function, e.g., show concavity or find a tractable expression for the optimal access probability. In order to tackle this, we propose to find the optimal access probability q* that maximizes [TeX:] $$\Upsilon$$ via the bisection search method in its feasible range[TeX:] $$q \in[0,1].$$ Then, the obtained q* is used to solve for

the caching probability b in the optimization problem below.

##### (15)

[TeX:] $$\text { P2: } \begin{aligned} \max _{b} & \mathbb{P}_{o}\left(q^{*}, b\right) \\ & \text { s.t. } \quad(12),(13) \end{aligned}$$ **Lemma 3:** For fixed [TeX:] $$q^{*}, \mathbb{P}_{\rtimes}\left(\|^{*}, b\right)$$ is a concave function w.r.t. b and the optimal caching probability b* that maximizes the offloading gain is given by

where [TeX:] $$\psi\left(v^{*}\right)$$ is the solution of [TeX:] $$v^{*}=p_{i}+p_{i}\left(\bar{n}\left(1-b_{i}^{*}\right) e^{-\bar{n} b_{i}^{*}}-\right. \left.\left(1-e^{-\bar{n} b_{i}^{*}}\right)\right) \Upsilon,$$ that satisfies [TeX:] $$\sum_{i=1}^{N_{f}} b_{i}^{*}=M.$$

Proof: Please see Appendix B.

Clearly, the optimal caching solution b* depends on the scheduling of devices through channel access probability q* from [TeX:] $$\Upsilon,$$ while q* is independent of b*. [9] shows that a PPP network exhibits the same property, i.e., the caching scheme is scheduling-dependent. To gain some insights, it is useful to consider a simple case when only one D2D link per cluster is allowed. In this case, the rate coverage probability of the proposed clustered model with one active D2D link within a cluster will be [20, Lemma 2]:

##### (16)

[TeX:] $$\Upsilon=\frac{1}{\left(4 \sigma^{2} \pi \lambda_{p} \vartheta^{2 / \alpha} \Gamma(1+2 / \alpha) \Gamma(1-2 / \alpha)+1\right)}.$$Substituting in (10) for [TeX:] $$\Upsilon,$$ we get the offloading gain as

##### (17)

[TeX:] $$\mathbb{P}_{o}(\boldsymbol{b})=\sum_{i=1}^{N_{f}} p_{i} b_{i}+\frac{p_{i}\left(1-b_{i}\right)\left(1-e^{-b_{i} \pi}\right)}{4 \sigma^{2} \pi \lambda_{p} \vartheta^{2 / \alpha} \Gamma(1+2 / \alpha) \Gamma(1-2 / \alpha)+1}.$$ **Remark 1:** From (17), it is clear that the offloading gain increases as [TeX:] $$\sigma \text { and } \lambda_{p}$$ decrease. Particularly, the offloading gain is inversely proportional to the density of clusters [TeX:] $$\lambda_{p}$$ and the variance of the displacement [TeX:] $$\sigma^{2}.$$ This is because smaller [TeX:] $$\sigma$$ results in higher levels of the desired signal, while lower [TeX:] $$\lambda_{p}$$ leads to smaller encountered interference at the typical device.

## V. NUMERICAL RESULTS

We first validate the developed mathematical model via Monte Carlo simulations. Then we benchmark the proposed caching scheme against conventional caching schemes. Unless otherwise stated, the network parameters are selected as shown in Table 1.

In Fig. 2, we plot the rate coverage probability [TeX:] $$\Upsilon$$ against the channel access probability q. The theoretical and simulated results are plotted together, and they are consistent. Clearly, there is an optimal q*; before it, [TeX:] $$\Upsilon$$ tends to increase as the probability of accessing the channel increases, and beyond it, [TeX:] $$\Upsilon$$ tends to decrease due to the effect of aggressive interference. It is intuitive to observe that the optimal access probability q*, which maximizes [TeX:] $$\Upsilon$$, decreases as [TeX:] $$\vartheta$$ increases. This reflects the fact the system becomes more sensitive to the effect of interference when a higher SIR threshold is required.

Fig. 3 manifests the effect of the access probability q on the offloading gain. The offloading gain is plotted against q for different caching schemes, namely, the proposed probabilistic caching (PC), Zipf caching (Zipf), and uniform random caching (RC). Fig. 3 is plotted for an SIR threshold [TeX:] $$\vartheta=0 \mathrm{~dB},$$ hence, the optimal access probability q* is near one from Fig. 2. Clearly, the offloading gain for the different caching schemes improves as q approaches its optimal value, which reveals the crucial impact of the device scheduling on the content placement and ac

cordingly, on the offloading gain. Moreover, the proposed PC is shown to attain the best performance as compared to other benchmark schemes.

To show the effect of q on the caching probability, in Fig. 4, we plot the histogram of the optimal caching probability at different q values. Specifically, [TeX:] $$p=q^{*}$$ in Fig. 4(a) and [TeX:] $$q<q^{*}$$ in Fig. 4(b). It is clear from the histograms that the optimal caching probability b* tends to be more skewed when [TeX:] $$q<q^{*} \text {, }$$ i.e., when [TeX:] $$\Upsilon$$ decreases. This shows that file sharing is more difficult when q is not optimized. Broadly speaking, for [TeX:] $$q<q^{*},$$ the system is too conservative, while for [TeX:] $$q>q^{*} \text {, }$$ the outage probability is high due to the aggressive interference. In such regimes, each device tends to cache the most popular files leading to fewer opportunities of content transfer.

Fig. 5 illustrates the prominent effect of the content popularity on the offloading gain, and compares the achievable gain of three different caching schemes. Clearly, the offloading gain of the proposed PC attains the best performance as compared to other schemes. Particularly, 10% improvement in the offloading gain is observed compared to the Zipf caching when [TeX:] $$\beta=1.$$ Moreover, we note that all caching schemes encompass the same offloading gain when [TeX:] $$\beta=0$$ owing to the uniformity of content popularity.

To show the effect of network geometry, in Fig. 6, we plot the closed-form offloading gain in (17) against [TeX:] $$\sigma$$ at different [TeX:] $$\lambda_{p}.$$ Fig. 6 shows that the offloading gain monotonically decreases with both [TeX:] $$\sigma \text { and } \lambda_{p}.$$ This is because content sharing between

devices turns out to be less successful when the distance between devices is large, i.e., larger [TeX:] $$\sigma.$$ Analogously, file sharing among the cluster devices is accompanied with higher interference when [TeX:] $$\lambda_{p} \text { and } \sigma$$ are higher. Accordingly, this expected degradation prohibits successful content delivery via D2D communication.

Last, in Fig. 7, we plot the offloading gain versus the popularity index [TeX:] $$\beta$$ at different densities of cluster devices [TeX:] $$\bar{n}.$$ Fig. 7 first shows that the proposed optimized probabilistic caching scheme achieves the best performance as compared to caching popular files (CPF) and Zipf caching. In addition, Fig. 7 shows that the attained offloading gain increases as the number of devices per cluster increases. This is attributed to the fact that the probability of having requested contents cached at a neighbor device within the same cluster increases when the number of cluster members is higher.

## VI. CONCLUSION

In this paper, we have proposed a joint communication and caching optimization framework for clustered D2D networks. In particular, we have conducted joint optimization of channel access probability and content placement in order to maximize the offloading gain. We have characterized the optimal content caching scheme as a function of the system parameters, namely, density of clusters, average number of devices per cluster, content caching, placement and access probabilities. A bisection search method is also proposed to calculate the optimal channel access probability. We have demonstrated that deviating from the optimal access probability makes file sharing more difficult, i.e., the system is too conservative for small access probabilities, while the interference is too aggressive for larger access probabilities. Results showed up to 10% enhancement in offloading gain compared to the Zipf caching technique.

## APPENDIX A PROOF OF LEMMA 1

Laplace transform of the inter-cluster aggregate interference [TeX:] $$I_{\Phi_{P}^{1}}$$ can be evaluated as

where (a) follows from the Rayleigh fading assumption, (b) follows from the probability generating functional (PGFL) of Gaussian PPP [TeX:] $$\Phi_{c q},$$ and (c) follows from the PGFL of the parent PPP [TeX:] $$\Phi_{p^{*}}.$$ By using change of variables [TeX:] $$z=x+y \text { with } \mathrm{dz}=\mathrm{dy},$$ we proceed as

##### (18)

[TeX:] $$\left.\mathscr{L}_{I_{\Phi_{p}}^{!}}(s)=e^{-\lambda_{p} \int_{\mathbb{R}^{2}}\left(1-e^{-q \bar{n}} \int_{\mathbb{R}^{2}}\left(1-\frac{1}{1+s\|z\|^{-\alpha}}\right)_{f_{Y}(z-x) \mathrm{dy}}\right.}\right) \mathrm{dx}$$

##### (19)

[TeX:] $$\stackrel{(d)}{=} e^{-2 \pi \lambda_{p} \int_{v=0}^{\infty}}\left(1-e^{-q \bar{n} \int_{u=0}^{\infty}\left(1-\frac{1}{1+s u^{-\alpha}}\right) f_{U}(u \mid v) \mathrm{d} \mathrm{u}}\right) v \mathrm{dv} \\ =e^{-2 \pi \lambda_{p} \int_{v=0}^{\infty}\left(1-e^{-q \bar{n} \int_{u=0}^{\infty} 0 \frac{s}{s+u^{\alpha}} f_{U}(u \mid v) \mathrm{du}}\right) v \mathrm{dv}},$$where (d) follows from converting the cartesian coordinates to the polar coordinates with [TeX:] $$u=\|z\|.$$ To clarify how in (d) the normal distribution [TeX:] $$f_{Y}(z-x)$$ is converted to the Rice distribution [TeX:] $$f_{U}(u \mid v),$$ consider a remote cluster centered at [TeX:] $$x \in \Phi_{p}^{!},$$ with a distance [TeX:] $$v=\|x\|$$ from the origin. Every interfering device belonging to the cluster centered at x has its coordinates in [TeX:] $$\mathbb{R}^{2}$$ chosen independently from a Gaussian distribution with standard deviation . Then, by definition, the distance from such an interfering device to the origin, denoted as u, has a Rice distribution, denoted as [TeX:] $$f_{U}(u \mid v)=u / \sigma^{2} \exp \left(-\left(\mathrm{u}^{2}+\mathrm{v}^{2}\right) / 2 \sigma^{2}\right) \mathrm{I}_{0}\left(\mathrm{uv} / \sigma^{2}\right),,$$ where [TeX:] $$I_{0}$$ is the modified Bessel function of the first kind with oRrder zero and[TeX:] $$\sigma$$ is the scale parameter. Letting [TeX:] $$\varphi(s, v)= \ \int_{u=0}^{\infty} s /\left(s+u^{\alpha}\right) f_{U}(u \mid v)$$ du, the proof is completed.

## APPENDIX B PROOF OF LEMMA 3

First, to prove concavity, we proceed as follows.

##### (20)

[TeX:] $$\begin{gathered} \frac{\partial \mathbb{P}_{\rtimes}}{\partial b_{i}}=q_{i}+q_{i}\left(\bar{n}\left(1-b_{i}\right) e^{-\bar{n} b_{i}}-\left(1-e^{-\bar{n} b_{i}}\right)\right) \Upsilon \\ \frac{\partial^{2} \mathbb{P}_{\rtimes}}{\partial b_{i} \partial b_{j}}=-q_{i}\left(\bar{n} e^{-\bar{n} b_{i}}+\bar{n}^{2}\left(1-b_{i}\right) e^{-\bar{n} b_{i}}+\bar{n} e^{-\bar{n} b_{i}}\right) \Upsilon \end{gathered}$$It is clear that the second derivative [TeX:] $$\frac{\partial^{2} \mathbb{P}_{o}}{\partial b_{i} \partial b_{j}}$$ is negative. Hence, the Hessian matrix [TeX:] $$\mathbf{H}_{i, j} \text { of } \mathbb{P}_{o}\left(p^{*}, b_{i}\right)$$ w.r.t. [TeX:] $$b_{i}$$ is negative semidefinite, and the function [TeX:] $$\mathbb{P}_{o}\left(p^{*}, b_{i}\right)$$ is concave with respect to bi. Also, the constraints are linear, which implies that the necessity and sufficiency conditions for optimality exist. The dual Lagrangian function and the KKT conditions are then employed to solve P2 [23]. The KKT Lagrangian function of the energy minimization problem is given by

##### (21)

[TeX:] $$\begin{aligned} \mathcal{L}\left(b_{i}, w_{i}, \mu_{i}, v\right) \\ =\sum_{i=1}^{N_{f}} q_{i} b_{i}+q_{i}\left(1-b_{i}\right)\left(1-e^{-b_{i} \bar{n}}\right) \Upsilon \end{aligned}$$

##### (22)

[TeX:] $$+v\left(M-\sum_{i=1}^{N_{f}} b_{i}\right)+\sum_{i=1}^{N_{f}} w_{i}\left(b_{i}-1\right)-\sum_{i=1}^{N_{f}} \mu_{i} b_{i},$$where [TeX:] $$v, w_{i}, \text { and } \mu_{i}$$ are the dual equality and two inequality constraints, respectively. Now, the optimality conditions are written

##### (23)

[TeX:] $$\begin{aligned} \text { as } \boldsymbol{\nabla}_{b_{i}} \mathcal{L}\left(b_{i}^{*}, w_{i}^{*}, \mu_{i}^{*}, v^{*}\right)= \\ q_{i}+q_{i}\left(\bar{n}\left(1-b_{i}\right) e^{-\bar{n} b_{i}}-\left(1-e^{-\bar{n} b_{i}}\right)\right) \Upsilon-v^{*}+w_{i}^{*}-\mu_{i}^{*}=0 \end{aligned} \\ w_{i}^{*} \geq 0$$

The optimality conditions imply that:

##### (28)

[TeX:] $$1. $w_{i}^{*}>0$ : We have $b_{i}^{*}=1, \mu_{i}^{*}=0$, and \begin{aligned} q_{i}-q_{i}\left(1-e^{-\bar{n}}\right) \Upsilon=v^{*}-w_{i}^{*} \\ v^{*}q_{i}-q_{i}\left(1-e^{-\bar{n}}\right) \Upsilon \end{aligned}$$

##### (29)

[TeX:] $$2. $\mu_{i}^{*}0$ : We have $b_{i}^{*}=0$, and $w_{i}^{*}=0$, and \begin{aligned} q_{i}+\bar{n} q_{i} \Upsilon=v^{*}+\mu_{i}^{*} \\ v^{*}>q_{i}+\bar{n} q_{i} \Upsilon \end{aligned}$$

##### (30)

[TeX:] $$3. $0b_{i}^{*}1:$ We have $w_{i}^{*}=\mu_{i}^{*}=0$, and v^{*}=q_{i}+q_{i}\left(\bar{n}\left(1-b_{i}^{*}\right) e^{-\bar{n} b_{i}}-\left(1-e^{-\bar{n} b_{i}}\right)\right) \Upsilon$$BPy combining (28), (29), and (30), with the fact that [TeX:] $$\sum_{i=1}^{N_{f}} b_{i}^{*}=M,$$ Lemma 3 is proven.

## Biography

##### Ramy Amer

Ramy Amer received his Ph.D. in Electrical and Communications Engineering from Trinity College Dublin in November 2020. He was a Visiting Scholar at Virginia Tech, USA from September 2018 to March 2019. He holds one best paper award, IEEE student travel grant, and IEEE exemplary reviewer award. His research interests include edge caching and Intelligence, edge computing and IoT, machine learning, and UA Vs. Ramy has been first author in one book chapter and six journals and eight IEEE conference papers.

## Biography

##### Mohamed Baza

Mohamed Baza is currently an Assistant Professor at the Department of Computer Science at College of Charleston, SC, USA. He received his Ph.D. degree in Electrical and Computer Engineering from Tennessee Tech University, Cookeville, Tennessee, United States in Dec. 2020 and B.S. and M.S. degrees in Electrical Computer Engineering from Benha University, Egypt in 2012 and 2017 respectively. From August 2020 to May 2021, he worked as a Visiting Assistant Professor at the Department of Computer Science at Sam Houston State University, TX, USA. He also has more than two years of industry experience in information security in Apachekhalda petroleum company, Egypt. Dr. Baza is the Author of numerous papers published in major IEEE conferences and journals, such as IEEE Wireless Communications and Networking Conference (IEEE WCNC), IEEE International Conference on Communications (IEEE ICC), IEEE Vehicular Technology Conference (IEEE VTC), IEEE Transactions on Dependable and Secure Computing, IEEE Transactions of Vehicular Technology (TVT), IEEE Transactions on Network Science and Engineering (TNSE), and IEEE Systems Journal. He served as a Reviewer for several journals and conferences such as IEEE Transactions on Vehicular Technology, IEEE IoT Journal, and the journal of Peer-to-Peer Networking. His research interests include blockchains, cyber-security, machine learning, smart-grid, and vehicular ad-hoc networks. He also a recipient of best IEEE paper award in the International Conference on Smart Applications, Communications and Networking (SmartNets 2020).

## Biography

##### Tara Salman

Tara Salman is finishing a graduate Research Assistant at Washington University in St. Louis. She received her B.S. in Computer Engineering and M.S. degrees in Computer Networking from Qatar University Doha, Qatar in 2012 and 2015, respectively. She is currently pursuing a Ph.D. in Computer Science Engineering at Washington University in St Louis, Missouri, USA. From 2012 -2015, she worked as a Research Assistant with Qatar University on an NPRP (National Priorities Research Program) funded project targeting physical layer security. Since 2015, she has been working as a Graduate Research Assistant at Washington University in St. Louis. Her research interests span network security, distributed systems, the Internet of things, and financial technology. She is an Author of 1 book chapter and numerous papers published at major IEEE conferences and journals. Tara Salman is an EECS Rising Star in UC Berkeley 2020, is a Recipient of the Cisco Certified Network Associate (CCNA) certification in 2012, and has completed all four levels of CCNA at Cisco academy-Qatar university branch.

## Biography

##### M. Majid Butt

M. Majid Butt received the M.Sc. degree in Digital Communications from Christian Albrechts University, Kiel, Germany, in 2005, and the Ph.D. degree in Telecommunications from the Norwegian University of Science and Technology, Trondheim, Norway, in 2011. He is currently a Senior Research Specialist 5G+ at Nokia Bell Labs, Paris-Saclay, France, and also an adjunct Research Professor at Trinity College Dublin, Dublin, Ireland. Prior to that, he has held various positions at the University of Glasgow, U.K., Trinity College Dublin, Ireland, Fraunhofer HHI, Germany, and the University of Luxembourg. His current research interests include communication techniques for wireless networks with a focus on radio resource allocation, scheduling algorithms, energy efficiency, and machine learning for RAN. He has Authored more than 70 peer-reviewed conference and journal publications, and filed some 10 patents in these areas. He frequently gives invited talks, as well as technical tutorial talks on various topics in IEEE conferences, including ICC, Globecom, PIMRC, VTC, ISWCS, etc. He was a Recipient of the Marie Curie Alain Bensoussan Post-Doctoral Fellowship from the European Research Consortium for Informatics and Mathematics. He has served as the Organizer/chair for technical workshops on various aspects of communication systems in conjunction with major IEEE conferences. He has been an Associate Editor of the IEEE ACCESS and the IEEE Communication Magazine since 2016.

## Biography

##### Ahmad Alhindi

Ahmad Alhindi received the B.Sc. degree in Computer Science from Umm Al-Qura University (UQU), Makkah, Saudi Arabia, in 2006, and the M.Sc. degree in Computer Science and the Ph.D. degree in Computing and Electronic Systems from the University of Essex, Colchester, U.K., in 2010 and 2015, respectively. He is currently an Assistant Professor in Artificial Intelligence (AI) with Computer Science Department, UQU. His current research interests include evolutionary multi-objective optimization and machine learning techniques. He is currently involved in AI algorithms, focusing particularly on machine learning and optimization with a willingness to implement them in a context of decision making and solving combinatorial problems in real-world projects.

## References

- 1 E. Bastug, M. Bennis, M. Debbah, "Living on the edge: The role of proactive caching in 5G wireless networks,"
*IEEE Commun.Mag.*, vol. 52, no. 8, pp. 82-89, Aug, 2014.doi:[[[10.1109/MCOM.2014.6871674]]] - 2 M. Ji, G. Caire, A. F. Molisch, "Fundamental limits of caching in wireless D2D networks,"
*IEEE Trans.Inf.Theory*, vol. 62, no. 2, pp. 849869-849869, Feb, 2016.doi:[[[10.1109/TIT.2015.2504556]]] - 3 K. Shanmugam et al., "Femtocaching: Wireless content delivery through distributed caching helpers,"
*IEEE Trans. Inf. Theory*, vol. 59, no. 12, pp. 8402-8413, Dec, 2013.doi:[[[10.1109/TIT.2013.2281606]]] - 4 N. Golrezaei et al., "Base-station assisted device-to-device communications for high-throughput wireless video networks,"
*IEEE Trans. Wireless Commun.*, vol. 13, no. 7, pp. 3665-3676, July, 2014.doi:[[[10.1109/TWC.2014.2316817]]] - 5 R. Amer, M. M. Butt, M. Bennis, N. Marchetti, "Inter-cluster cooperation for wireless D2D caching networks,"
*IEEE Trans.WirelessCommun.*, vol. 17, no. 9, pp. 6108-6121, Sept, 2018.doi:[[[10.1109/TWC.2018.2854603]]] - 6 Z. Chen, N. Pappas, M. Kountouris, "Probabilistic caching in wireless D2D networks: Cache hit optimal versus throughput optimal,"
*IEEE Commun.Lett.*, vol. 21, no. 3, pp. 584-587, Mar, 2017.doi:[[[10.1109/LCOMM.2016.2628032]]] - 7 B. Blaszczyszyn, A. Giovanidis, "Optimal geographic caching in cellular networks," in
*Proc.IEEE ICC*, June, 2015;custom:[[[-]]] - 8 S. Andreev et al., "Analyzing assisted offloading of cellular user sessions onto D2D links in unlicensed bands,"
*IEEE J.Sel.AreasCommun.*, vol. 33, no. 1, pp. 67-80, 2015.doi:[[[10.1109/JSAC.2014.2369616]]] - 9 B. Chen, C. Yang, Z. Xiong, "Optimal caching and scheduling for cache-enabled D2D communications,"
*IEEE Commun.Lett.*, vol. 21, no. 5, pp. 1155-1158, May, 2017.doi:[[[10.1109/LCOMM.2017.2652440]]] - 10 S. H. Chae, T. Q. S. Quek, W. Choi, "Content placement for wireless cooperative caching helpers: A tradeoff between cooperative gain and content diversity gain,"
*IEEE Trans.WirelessCommun.*, vol. 16, no. 10, pp. 6795-6807, Oct, 2017.doi:[[[10.1109/TWC.2017.2731760]]] - 11 M. Lee, A. F. Molisch, N. Sastry, A. Raman, "Individual preference probability modeling and parameterization for video content in wireless caching networks,"
*IEEE /ACM Trans. Netw.*, vol. 27, no. 2, pp. 676-690, Apr, 2019.custom:[[[-]]] - 12 M. Afshang, H. S. Dhillon, P. H. J. Chong, "Modeling and performance analysis of clustered device-to-device networks,"
*IEEE Trans. WirelessCommun.*, vol. 15, no. 7, pp. 4957-4972, July, 2016.doi:[[[10.1109/TWC.2016.2550024]]] - 13 ——, "Fundamentals of cluster-centric content placement in cacheenabled device-to-device networks,"
*IEEE Trans.Commun.*, vol. 64, no. 6, pp. 2511-2526, June, 2016.custom:[[[-]]] - 14 N. Deng, M. Haenggi, "The benefits of hybrid caching in gauss-poisson D2D networks,"
*IEEE J. Sel. Areas Commun.*, vol. 36, no. 6, pp. 12171230-12171230, June, 2018.doi:[[[10.1109/JSAC.2018.2844953]]] - 15 R. Amer et al., "Optimized caching and spectrum partitioning for D2D enabled cellular systems with clustered devices,"
*IEEE Trans. Commun.*, vol. 68, no. 7, pp. 4358-4374, July, 2020.custom:[[[-]]] - 16 R. Amer, H. Elsawy, J. Kibiłda, M. M. Butt, N. Marchetti, "Performance analysis and optimization of cache-assisted CoMP for clustered D2D networks,"
*IEEE Trans.MobileComput.early Access*, 2020.custom:[[[-]]] - 17 R. Amer, W. Saad, H. ElSawy, M. Butt, N. Marchetti, "Caching to the sky: Performance analysis of cache-assisted CoMP for cellular-connected UA Vs,"
*in Proc.IEEE WCNC*, Apr, 2019.custom:[[[-]]] - 18 M. Haenggi, Stochasticgeometryforwirelessnetworks. Cambridge University Press,
*2012*, 2012.custom:[[[-]]] - 19 R. Amer, H. ElSawy, J. Kibiłda, M. M. Butt, N. Marchetti, "Cooperative transmission and probabilistic caching for clustered D2D networks," in
*Proc.IEEE WCNC*, Apr, 2019;custom:[[[-]]] - 20 R. Amer et al., "On minimizing energy consumption for D2D clustered caching networks," in
*Proc.IEEE GLOBECOM*, Dec, 2018;custom:[[[-]]] - 21 A. Rachedi, M. H. Rehmani, S. Cherkaoui, J. J. P. C. Rodrigues, "The plethora of research in Internet of things (IoT),"
*IEEE Access*, vol. 4, pp. 9575-9579, 2016.custom:[[[-]]] - 22 L. Breslau, Pei Cao, Li Fan, G. Phillips, S. Shenker, "Web caching and zipf-like distributions: evidence and implications," in
*Proc. IEEE INFOCOM*, Mar, 1999;custom:[[[-]]] - 23 S. Boyd, L. Vandenberghe,
*Convex optimization*, Cambridge university press, 2004.custom:[[[-]]]