**Multi-Server Federated Edge Learning for Low Power Consumption Wireless Resource Allocation Based on User QoE**

Tianyi Zhou, Xuehua Li, Chunyu Pan, Mingyu Zhou and Yuanyuan Yao


## Abstract

**Abstract:** Federated edge learning (FEL) deploys a machine learning algorithm on devices distributed at the edge of a network, trains massive local data, uploads the local model parameters after training, and alternates updates with the global model parameters to reduce the pressure on uplink data transmission, prevent systematic time delay, and ensure data security. This paper proposes that an optimal balance between time delay and energy consumption be achieved by optimizing the transmission power and bandwidth allocation based on user quality of experience (QoE) in a multi-server intelligent edge network. Given the limited computing capability of devices involved in FEL local training, the transmission power is modeled as a quasi-convex uplink power allocation (UPA) problem, and a low energy consumption bandwidth allocation algorithm is proposed for solution-seeking. The proposed algorithm allocates appropriate power to each device by adapting to the computing power and channel state of the device, thereby reducing energy consumption. The theoretical deduction suggests that additional bandwidth should be allocated to devices with weak computing capabilities and poor channel conditions to realize minimal energy consumption within the constraint time. The simulation result indicates that the maximum gain of the proposed algorithm can reach 31% compared with the baseline.

**Keywords:** bandwidth optimization, federated edge learning, QoE, uplink power allocation

## I. INTRODUCTION

THE traditional machine learning (ML) algorithm usually adopts the centralized model training method [1]–[3]. However, transmitting massive data to the central server not only causes privacy leakage but also results in uplink congestion and serious transmission delay. Distributed edge learning, based on distributed ML and mobile edge computing (MEC), can significantly reduce the traffic load and end-to-end time delay in communication networks by using the computing capability and datasets of massively distributed edge devices and by co-training the shared ML model [4]–[7]. Among the available learning approaches, federated edge learning (FEL) [8] shows great potential in solving the abovementioned problems.

Compared with traditional distributed machine learning, FEL offers better prospects for data privacy. Firstly, the compute node has absolute control over the data, and the central server cannot directly or indirectly manipulate the data on the compute node in FEL. Secondly, in the process of data transmission, FEL only needs to upload local model parameters without sharing local data. While protecting data privacy, it can relieve uplink pressure and greatly reduce the amount of data transmitted.

FEL supports ML in data and model training in the mobile communication system and allows multiple edge smart devices to complete model training and parameter sharing through local data iteration [9]. The iteration process has two parts, namely, local model training and updating, and global aggregation of the updated parameters. Distributed local training is performed to generate the local model. The updated local model parameters are uploaded, the global model is optimized in the central server by analyzing the local models of the smart devices, and the updated model is then broadcast. Unlike traditional ML, the FEL algorithm requires users to transmit local model parameters to the base station (BS) through a wireless link and imposes additional requirements on the energy consumed in training and on wireless link resource allocation. Federated learning (FL) and wireless transmission have received much research attention in recent years. For instance, Zhang et al. proposed a method that ensures minimal transmission time delay to improve FL algorithm efficiency [10]. In [11], the authors used the FL algorithm for traffic estimation to maximize the user data rate. To reduce time delay, [12] proposed a partial averaging solution that only uses the updated parameters from quick-response devices for global model updating. In addition, to reduce the communication load in FL, [13] compressed the gradients uploaded by edge nodes to reduce the time required for communication.

However, affected by bandwidth, edge device energy consumption, and inter-cell interference, FEL faces serious challenges in wireless link data transmission. In [10]–[13], the authors ignored the effect of edge device energy consumption and inter-cell interference on the FEL training process. Moreover, no study has attempted to jointly optimize FEL wireless resources and energy consumption based on user quality of experience (QoE). Meanwhile, [14] selected suitable users to execute the FEL algorithm by disclosing edge device CPU cycle frequencies and transmission power to minimize energy consumption. Ref. [15] devised the Model Update Compression by Soft Clustering (MUCSC) algorithm to compress the model updates transmitted between clients and the parameter server and thus reduce the volume of communication traffic in FL. Ref. [16] described how the computing capability and communication delay of mobile devices affect UE energy consumption, system learning time, and learning precision yet only considered the application scenario of mobile cellular networks consisting of a single server. Ref. [17] obtained a user selection and uplink resource block allocation scheme by solving an optimization problem and then deduced the optimal transmission power of each user according to the expected convergence speed of the algorithm. However, previous studies on FEL wireless transmission have ignored multi-unit and multi-server scenarios or the effect of interference.

This paper designs a multi-unit and multi-server mobile edge cellular network for FEL wireless transmission, which maximizes network performance by implementing a strategy that jointly optimizes energy consumption and bandwidth allocation. Specifically, we consider a multi-unit MEC cellular network in which each BS is equipped with an edge server for the global aggregation of the model. The distributed deployment of edge servers and the densification of BSs cater to the prospect of “smart interconnection” of all things in the future 6G system.

In this work, multi-unit and multi-server MEC networks are considered for the first time in FEL wireless transmission. Based on multi-user transmission and considering inter-cell interference, the proposed problem is both complex and non-convex [18]. Therefore, it is difficult to find a solution directly. In this case, we propose a low power consumption resource allocation strategy to reduce the complexity. The innovations of this paper are as follows:

1) Compared with single MEC server systems, each user in the designed strategy can select the closest server from the multi-server network for parameter uploading, thereby reducing the energy consumption of FEL parameter transmission. Moreover, the coordination and allocation of resources among multiple servers can alleviate inter-cell interference and the contention for wireless resources between a user and neighboring users, thereby improving network benefits when multiple users request FEL parameter uploading simultaneously.

2) We use task completion time and device energy consumption to quantify the QoE and model the transmission efficiency of each user as the weighted sum of the task completion time and device energy consumption optimizations. Based on user QoE, the optimization problem is modeled as an uplink power allocation (UPA) problem to optimize the uplink transmission power of users. This paper then considers the problem of minimizing edge device energy consumption, proposes a low power consumption bandwidth allocation (BA) strategy, and theoretically deduces the convergence form of the optimal strategy for minimizing energy consumption.

3) Given the computing capability and energy consumption of edge devices involved in FEL local model training, this paper proposes a low-complexity UPA algorithm whose solution reduces the number of iterations and the computing complexity of the FEL.

The rest of this paper is organized as follows. Section II describes the system model. Section III proposes the uplink power allocation optimization problem for FEL wireless transmission. Section IV discusses the power consumption BA strategy. Section V provides the simulation results. Section VI summarizes the paper.

## II. SYSTEM MODEL

We consider a multi-unit and multi-server FEL system with an edge server at each BS to achieve converged updating of the global model. The system model is shown in Figure 1. The FEL iteration process can be divided into several steps. First, after a user utilizes local data for model training, the local model parameters transmitted through wireless links are used to update the global model. Second, the server distributes the converged and updated global model to replace the original model, as shown in Figure 2. Each iteration is called a round of communication. The sets of edge devices (users) and edge servers in the FEL system are expressed as [TeX:] $$U=\{1,2, \cdots, u\}$$ and [TeX:] $$S=\{1,2, \cdots, s\}.$$ The edge servers are assumed to obtain the model size, multi-user channel gain, local computing capability, and other information through feedback. These servers use such information to determine the uplink power allocation and low power consumption bandwidth allocation strategy for each round of communication. The modeling of user computing tasks and parameter uploading is described below. The key symbols used in this paper are listed in Table I.

##### A. Computing Task Model

We use [TeX:] $$T_{u}$$ to represent the task of user [TeX:] $$u \in U$$ that utilizes local data for model training and [TeX:] $$\left\langle c_{u}, d_{u}\right\rangle$$ to represent the computing and data amounts of the task of user u, where [TeX:] $$c_{u}[\text { cycles }]$$ represents the amount of computing resources required to complete the model training and [TeX:] $$d_{u}[\text { bits }]$$ represents the amount of data uploaded from the edge device to the edge server. The values of [TeX:] $$c_{u} \text { and } d_{u}$$ can be obtained by analyzing the task execution status [19]. [TeX:] $$f_{u}^{l}>0$$ represents the local computing capability of user u in CPU cycles/s. The local model training time for user u is [TeX:] $$t_{u}^{l}=c_{u} / f_{u}^{l}.$$
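As a simple numerical illustration of the computing task model, the sketch below evaluates the local training time t_u^l = c_u / f_u^l; the cycle count and CPU frequency are hypothetical values, not taken from the paper.

```python
# Illustrative sketch of the computing task model; all numbers are hypothetical.

def local_training_time(c_u_cycles: float, f_u_cycles_per_s: float) -> float:
    """Local model training time t_u^l [s] = c_u [cycles] / f_u^l [cycles/s]."""
    return c_u_cycles / f_u_cycles_per_s

# Example: a task of 1e9 CPU cycles on a 2 GHz edge device takes 0.5 s.
t_ul = local_training_time(1e9, 2e9)
print(t_ul)  # 0.5
```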

##### B. Training Model

We design a general FEL model [17]. Each user u collects an input matrix [TeX:] $$I_{u}=\left[i_{u 1}, \cdots, i_{u n}\right],$$ where n represents the number of samples collected by each user u and [TeX:] $$i_{u n}$$ is the input vector of the FL algorithm. The size of [TeX:] $$i_{u n}$$ depends on the specific FEL task. For example, the network can execute an FEL algorithm to sense the wireless environment and generate a holistic radio environment map, in which case each user collects data related to the wireless environment for training an FL model [21]. The output of the local edge device is indexed by [TeX:] $$o_{u}=\left[o_{u 1}, \cdots, o_{u n}\right].$$ We capture the input parameters of local model training [TeX:] $$I_{u}$$ and output parameters [TeX:] $$o_{u} \text { by } w_{u},$$ which determines the local model of each user u. For example, in a classification task using the BP algorithm, [TeX:] $$i_{u n}^{T} w_{u}$$ indicates the classified output, and [TeX:] $$w_{u}$$ is the weight vector that determines the BP algorithm. After the local model training, user u uploads [TeX:] $$w_{u}$$ to the edge server to participate in the global model aggregation. We consider using the BP algorithm to perform local training of the neural network model on a GPU [21]. The experiment in [22] reveals that the energy consumption of a GPU only depends on the complexity and model parameter size (or equivalent dimensions) of the BP algorithm. Given that all edge devices use the BP algorithm to train the same models with data amount [TeX:] $$d_{u},$$ the local training energy consumption of all edge devices is assumed to be the same and is expressed as [TeX:] $$E_{u}^{l}.$$ We assume that the model training of the equipment is accurate and the loss is within the acceptable range.

##### C. Parameter Uploading Task Model

We consider OFDMA as the multiple access solution in the uplink of the mobile edge computing cellular network [23], and we divide the working frequency band B into N equal sub-frequency bands with a size of [TeX:] $$W= B / N[\mathrm{~Hz}] . \mathcal{N}=\{1, \cdots, N\}$$ represents the available sub-frequency bands of each BS. To ensure the orthogonality of uplink transmission between users related to the same BS, each user is allocated one sub-frequency band. The N sub-frequency bands divided from each BS can serve up to N users simultaneously. We define the regulation parameter [TeX:] $$x_{u s}^{j}, u \in U, s \in S, j \in \mathcal{N}$$ of the uplink sub-frequency band. When [TeX:] $$x_{u s}^{j}=1,$$ the computing task [TeX:] $$T_{u}$$ of user u is uploaded to the BS through channel j. Otherwise, [TeX:] $$x_{u s}^{j}=0.$$ Moreover, we define [TeX:] $$U_{s}=\left\{u \in U \mid \sum_{j \in \mathcal{N}} x_{u s}^{j}=1\right\}$$ as the set of users uploading parameters to server s.

The delay generated in each round of communication includes 1) the time for edge devices to use local data for model training, [TeX:] $$t_{u}^{l}[\mathrm{~s}],$$ and 2) the time for edge devices to upload parameters to edge servers through an uplink, [TeX:] $$t_{u}[\mathrm{~s}].$$ The time constraint for local model training and model parameter uploading is

##### (1)

[TeX:] $$t_{u}^{l}+t_{u} \leq T, \forall u \in U,$$where T is the maximal time constraint. The computing capability heterogeneity of edge devices is measured by the difference in [TeX:] $$t_{u}^{l}$$ values. [TeX:] $$\mathcal{T}=\left\{t_{u} \mid 0 < t_{u} \leq T_{u}, u \in U\right\}$$ represents the set of times for users u to upload parameter [TeX:] $$d_{u}$$ to edge servers. The value of [TeX:] $$t_{u}$$ is limited by the maximal transmission time [TeX:] $$T_{u}=T-t_{u}^{l}.$$

We assume that each user and BS have a single antenna for uplink transmission. [TeX:] $$h_{u s}^{j}$$ is defined as the uplink transmission gain between user u and edge server s through sub-channel j and captures the effects of path loss, shadowing, and antenna gain. The association duration between the user and server is usually much longer than the small-scale fading duration. We assume that the effect of fast fading during the association period is uniform [24]. [TeX:] $$\mathcal{P}= \left\{p_{u} \mid 0< p_{u} \leq P_{u}, u \in U\right\}$$ represents user transmission power, and [TeX:] $$p_{u}[\mathrm{~W}]$$ represents the transmission power consumed by user u in uploading parameter [TeX:] $$d_{u}$$ to the server. This value is limited by the maximal transmission power [TeX:] $$P_{u} \text {. }$$ If [TeX:] $$u \notin U_{s},$$ then [TeX:] $$p_{u}=0.$$ Although users send tasks to the same edge server through different sub-frequency bands, the transmission is still affected by inter-cell interference. Under this condition, the signal to interference plus noise ratio (SINR) of user u uploading parameters to server s through sub-frequency band j can be represented as

##### (2)

[TeX:] $$\gamma_{u s}^{j}=\frac{p_{u} h_{u s}^{j}}{\sum_{r \in S} \sum_{k \in U_{r}} x_{k r}^{j} p_{k} h_{k s}^{j}+N_{0}}, \forall u \in U, s \in S, j \in \mathcal{N},$$where [TeX:] $$N_{0}$$ is the noise variance, and the first term of the denominator represents the total interference on sub-frequency band j from all users associated with the other edge servers. Given that the transmission of one user is always on one sub-frequency band, the maximal transmission rate [bits/s] of user u to server s is given by

##### (3)

[TeX:] $$R_{u s}(\chi, \mathcal{P})=\sum_{j \in \mathcal{N}} x_{u s}^{j} W \log _{2}\left(1+\gamma_{u s}^{j}\right), \forall u \in U, s \in S,$$

where [TeX:] $$\chi$$ is the uploading decision. Given uploading decision [TeX:] $$\chi$$ and transmission power [TeX:] $$p_{u},$$ the time for uploading the parameter of user u is given by

##### (4)

[TeX:] $$t_{u}=\sum_{s \in S} x_{u s}\left(\frac{d_{u}}{R_{u s}(\chi, \mathcal{P})}\right), \forall u \in U.$$The uploading energy consumption, [TeX:] $$E_{u}[\mathrm{~J}],$$ of user u can be calculated as [TeX:] $$E_{u}=\frac{p_{u} t_{u}}{\xi_{u}}, \forall u \in U, \text { where } \xi_{u}$$ is the power amplifier efficiency of user u. We assume that [TeX:] $$\xi_{u}=1, \forall u \in U.$$ Therefore, the uploading energy consumption of user u can be simplified as

##### (5)

[TeX:] $$E_{u}=p_{u} t_{u}=p_{u} d_{u} \sum_{s \in S} \frac{x_{u s}}{R_{u s}(\chi, \mathcal{P})}, \forall u \in U.$$In the FEL system, user QoE is mainly reflected by task completion time and energy consumption. In the scenario considered in this paper, relative to the maximal task completion time and the maximal energy restraint, the relative optimizations of task completion time and energy consumption are represented by [TeX:] $$\frac{T-t_{u}}{T} \text { and } \frac{E-E_{u}}{E} \text {. }$$ Therefore, we can define the user uploading utility as

##### (6)

[TeX:] $$J_{u}=\left(\beta_{u}^{t} \frac{T-t_{u}}{T}+\beta_{u}^{e} \frac{E-E_{u}}{E}\right) \sum_{s \in S} x_{u s}, \forall u \in U,$$where [TeX:] $$\beta_{u}^{t}, \beta_{u}^{e} \in[0,1] \text { and } \beta_{u}^{t}+\beta_{u}^{e}=1, \forall u \in U,$$ which represent the preferences of user u regarding task completion time and energy consumption, respectively. For example, for user u, a short battery life can increase [TeX:] $$\beta_{u}^{e}$$ and reduce [TeX:] $$\beta_{u}^{t},$$ that is, energy consumption is reduced by extending the task completion time. In practical operations, cellphone users can set [TeX:] $$\beta_{u}^{e}$$ through different power-saving modes. For instance, under the super power-saving mode, [TeX:] $$\beta_{u}^{e}=1,$$ and under the maximum performance mode, [TeX:] $$\beta_{u}^{e}=0.$$ These users can also set the parameters based on the battery levels of their devices. E represents the maximal restraint for energy consumption and is determined by the actual condition of the edge device.
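To make the chain from SINR to utility concrete, the sketch below evaluates the per-user quantities defined above: the SINR of (2), an upload time of the form d_u / (W log2(1 + SINR)) over the assigned sub-band of width W (consistent with the rate and time definitions in this section), the upload energy of (5) with unit amplifier efficiency, and the utility of (6) for a user that uploads. All numeric inputs are hypothetical.

```python
import math

def uplink_sinr(p_u: float, h_us: float, interference: float, n0: float) -> float:
    """SINR of user u on its assigned sub-band, as in Eq. (2)."""
    return p_u * h_us / (interference + n0)

def upload_time(d_u_bits: float, w_hz: float, sinr: float) -> float:
    """Upload time t_u = d_u / (W * log2(1 + SINR)), as in Eq. (4)."""
    return d_u_bits / (w_hz * math.log2(1.0 + sinr))

def upload_energy(p_u: float, t_u: float) -> float:
    """Upload energy E_u = p_u * t_u, Eq. (5), with amplifier efficiency 1."""
    return p_u * t_u

def uploading_utility(t_u, e_u, t_max, e_max, beta_t, beta_e) -> float:
    """User uploading utility, Eq. (6), for a user that uploads (sum_s x_us = 1)."""
    return beta_t * (t_max - t_u) / t_max + beta_e * (e_max - e_u) / e_max

# Hypothetical setting: 10 kbit model, 1 MHz sub-band, mild interference.
sinr = uplink_sinr(p_u=0.5, h_us=1e-5, interference=1e-7, n0=1e-8)
t_u = upload_time(1e4, 1e6, sinr)
e_u = upload_energy(0.5, t_u)
j_u = uploading_utility(t_u, e_u, t_max=0.02, e_max=0.01, beta_t=0.5, beta_e=0.5)
```

Shorter upload times and lower energies both push the utility toward 1, matching the preference-weighted form of (6).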

##### D. Problem Formulation

For a given uploading strategy [TeX:] $$\chi$$ and uplink power allocation [TeX:] $$\mathcal{P},$$ we define the system utility as the weighted sum of the uploading utilities of all users:

##### (7)

[TeX:] $$J(\chi, \mathcal{P})=\sum_{u \in U} \lambda_{u} J_{u},$$

where [TeX:] $$J_{u}$$ is defined in (6), and [TeX:] $$\lambda_{u} \in(0,1]$$ defines the preference of the edge server for user [TeX:] $$u, \forall u \in U.$$ This parameter also determines the handling priority of different edge devices. For example, based on the obtained edge device information, devices with sufficient battery levels and more updated data should be prioritized with a high value of [TeX:] $$\lambda_{u}.$$ We now formulate the power allocation as a maximal system utility problem:

##### (8)

[TeX:] $$\begin{array}{ll}\max _{\mathcal{P}} & J(\chi, \mathcal{P}) \\ \text { s.t. } & \sum_{u \in U} x_{u s}^{j} \leq 1, \forall s \in S, j \in \mathcal{N}, \quad(8 \mathrm{a}) \\ & W=B / N, \quad(8 \mathrm{~b}) \\ & 0<p_{u} \leq P_{u}, \forall u \in U, \quad(8 \mathrm{c}) \\ & E_{u} \leq E, \forall u \in U, \quad(8 \mathrm{~d}) \\ & t_{u} \leq T_{u}, \forall u \in U. \quad(8 \mathrm{e})\end{array}$$

The constraints in the above formulation can be explained as follows. (8a) implies that each sub-frequency band of each edge server serves one user at most. (8b) stipulates the system bandwidth. (8c) to (8e) specify the maximal transmission power, uploading energy consumption, and transmission time for each user, respectively. Given the limited computing capability and energy consumption of the involved edge devices, we aim to formulate a feasible low-complexity algorithm for the abovementioned problem.

## III. LOW-COMPLEXITY POWER ALLOCATION OPTIMIZATION ALGORITHM

By exploiting the structure of the objective function and constraints in (8), we design a low-complexity algorithm to optimize transmission power allocation. Given a feasible task uploading decision [TeX:] $$\chi$$ that meets restraint (8a), we use the [TeX:] $$J_{u}$$ expression in (6) to rewrite the target function in (8) as

##### (9)

[TeX:] $$J(\mathcal{P})=\sum_{s \in S} \sum_{u \in U_{s}} \lambda_{u}\left(\beta_{u}^{t}+\beta_{u}^{e}\right)-V(\mathcal{P}),$$where

##### (10)

[TeX:] $$V(\mathcal{P})=\sum_{s \in S} \sum_{u \in U_{s}} \lambda_{u}\left(\frac{\beta_{u}^{t} t_{u}}{T}+\frac{\beta_{u}^{e} E_{u}}{E}\right).$$The first term on the right side of (9) is constant for a specific uploading decision, and [TeX:] $$V(\mathcal{P})$$ can be seen as the total uploading cost of all users who upload parameters. Therefore, we can redefine (8) as the following problem of minimizing the total uploading cost:

##### (11)

[TeX:] $$\min _{\mathcal{P}} V(\mathcal{P}), \quad \text { s.t. }(8 \mathrm{c})-(8 \mathrm{e}).$$

Moreover, from (10), (4), and (5), we obtain

##### (12)

[TeX:] $$V(\chi, \mathcal{P})=\sum_{s \in S} \sum_{u \in U_{s}} \frac{\phi_{u}+\psi_{u} p_{u}}{\log _{2}\left(1+\gamma_{u s}\right)},$$where [TeX:] $$\phi_{u}=\frac{\lambda_{u} \beta_{u}^{t} d_{u}}{T_{u} W}, \psi_{u}=\frac{\lambda_{u} \beta_{u}^{e} d_{u}}{E W} .$$ Therefore, we define (12) as the target function of the UPA problem. Specifically, the UPA problem can be represented as

##### (13)

[TeX:] $$\min _{\mathcal{P}} \sum_{s \in S} \sum_{u \in U} \frac{\phi_{u}+\psi_{u} p_{u}}{\log _{2}\left(1+\gamma_{u s}\right)}, \quad \text { s.t. } 0<p_{u} \leq P_{u}, \forall u \in U.$$

The inter-cell interference [TeX:] $$I_{u s}^{j}=\sum_{w \in S} \sum_{k \in U_{w}} x_{k w}^{j} p_{k} h_{k s}^{j}$$ of the uplink SINR [TeX:] $$\gamma_{u s}^{j}$$ of user [TeX:] $$u \in U_{s}$$ depends on the transmission power of other users associated with other BSs on the same sub-frequency band as the cell. Therefore, problem (13) remains a non-convex problem whose optimal solution cannot be easily found. To facilitate solution seeking, we need to find an approximate value of [TeX:] $$I_{u s}^{j}$$ to solve for [TeX:] $$\gamma_{u s}^{j}$$ and thereby divide problem (13) into several sub-problems. The optimal uplink power allocation [TeX:] $$\mathcal{P}^{*}$$ obtained through solution seeking remains the optimal value for problem (13).

Assume that the uplink power allocation for each BS [TeX:] $$s \in S$$ is relatively independent, that is, users have no mutual collaboration or do not inform one another about their uplink transmission power between edge servers. In this case, the upper bound of [TeX:] $$I_{u s}^{j}$$ is

##### (14)

[TeX:] $$\tilde{I}_{u s}^{j} \triangleq \sum_{w \in S} \sum_{k \in U_{w}} x_{k w}^{j} P_{k} h_{k s}^{j}, \forall u \in U, s \in S, j \in \mathcal{N}.$$We regard [TeX:] $$\tilde{I}_{u s}^{j}$$ as the approximate value of [TeX:] $$I_{u s}^{j}.$$ Given that the FEL system only selects some users for parameter uploading in each round of communication, the value of [TeX:] $$I_{u s}^{j}$$ is very small, that is, a small error in [TeX:] $$\tilde{I}_{u s}^{j}$$ will not lead to a large difference in [TeX:] $$\gamma_{u s}^{j}.$$ By using [TeX:] $$\tilde{I}_{u s}^{j}$$ to replace [TeX:] $$I_{u s}^{j},$$ we can obtain the approximate uplink SINR of user u uploading to edge server s through channel j as follows:

##### (15)

[TeX:] $$\tilde{\gamma}_{u s}^{j}=\frac{p_{u} h_{u s}^{j}}{\tilde{I}_{u s}^{j}+N_{0}}, \forall u \in U, s \in S, j \in \mathcal{N}.$$Let [TeX:] $$\vartheta_{u s}=\sum_{j \in \mathcal{N}} \frac{x_{u s}^{j} h_{u s}^{j}}{\tilde{I}_{u s}^{j}+N_{0}} \text { and } \Gamma_{s}\left(p_{u}\right)=\frac{\phi_{u}+\psi_{u} p_{u}}{\log _{2}\left(1+\vartheta_{u s} p_{u}\right)}.$$ The target function in (13) can then be approximated as [TeX:] $$\sum_{s \in S} \sum_{u \in U} \Gamma_{s}\left(p_{u}\right).$$ The target function and constraint corresponding to the transmission power of each user are independent of each other. Therefore, the UPA problem described in (13) can be approximated as an optimization of the uploading power of each user [TeX:] $$u, u \in U, s \in S,$$ which can be expressed as

##### (16)

[TeX:] $$\min _{p_{u}} \Gamma_{s}\left(p_{u}\right), \quad \text { s.t. } 0<p_{u} \leq P_{u}.$$

Problem (16) remains a non-convex problem because the second-order derivative [TeX:] $$\Gamma_{s}^{\prime \prime}\left(p_{u}\right)$$ of the target function with respect to [TeX:] $$p_{u}$$ is not always larger than 0. However, we can solve problem (16) with a quasi-convex optimization technique based on the following lemma:

a) Lemma 1: [TeX:] $$\Gamma_{s}\left(p_{u}\right)$$ in (16) is strictly quasi-convex over its definition domain.

Proof: See Appendix A.

Quasi-convex problems can usually be solved by bisection (dichotomy). Specifically, bisection solves a convex feasibility problem [25] in each iteration. However, the interior cutting-plane method commonly used to solve convex feasibility problems requires [TeX:] $$\mathcal{O}\left(n^{2} / \epsilon^{2}\right)$$ iterations, where n is the number of problem dimensions. We therefore propose a method for further reducing the complexity of the dichotomy.

Note that a quasi-convex function attains a local optimum at the point where its first-order derivative vanishes, and any local optimum of a strictly quasi-convex function is the global optimum [26]. Therefore, based on Lemma 1, we can determine that the optimal solution [TeX:] $$p_{u}^{*}$$ of problem (16) is either at the constraint bound, that is, [TeX:] $$p_{u}^{*}=P_{u},$$ or satisfies [TeX:] $$\Gamma_{s}^{\prime}\left(p_{u}^{*}\right)=0.$$ When equation (17) is satisfied, we can verify that [TeX:] $$\Gamma_{s}^{\prime}\left(p_{u}^{*}\right)=0.$$

##### (17)

[TeX:] $$\Omega_{s}\left(p_{u}\right)=\psi_{u} \log _{2}\left(1+\vartheta_{u s} p_{u}\right)-\frac{\vartheta_{u s}\left(\phi_{u}+\psi_{u} p_{u}\right)}{\left(1+\vartheta_{u s} p_{u}\right) \ln 2}=0.$$We can conclude that [TeX:] $$\Omega_{s}^{\prime}\left(p_{u}\right)=\frac{\vartheta_{u s}^{2}\left(\phi_{u}+\psi_{u} p_{u}\right)}{\left(1+\vartheta_{u s} p_{u}\right)^{2} \ln 2}>0$$ and [TeX:] $$\Omega_{s}(0)=-\frac{\vartheta_{u s} \phi_{u}}{\ln 2} < 0,$$ which suggests that [TeX:] $$\Omega_{s}\left(p_{u}\right)$$ is a monotonically increasing function that is negative at the starting point [TeX:] $$p_{u}=0.$$ Therefore, we can design a low-complexity dichotomy to evaluate [TeX:] $$\Omega_{s}\left(p_{u}\right)$$ in each iteration instead of finding a solution to a convex feasibility problem so as to obtain the optimal solution [TeX:] $$p_{u}^{*}$$ as shown in Algorithm 1.

In Algorithm 1, if [TeX:] $$\Omega_{s}\left(P_{u}\right)>0,$$ then the algorithm terminates after [TeX:] $$\left\lceil\log _{2}\left(P_{u} / \xi\right)\right\rceil$$ iterations, where [TeX:] $$\xi$$ is the convergence threshold in line 14. The time complexity of this algorithm is [TeX:] $$\mathcal{O}\left(\log _{2} n\right) . \mathcal{P}^{*}=\left\{p_{u}^{*}, u \in U\right\}$$ represents the power allocation optimization solution for a given task uploading strategy.
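A minimal sketch of the dichotomy above (not the paper's exact Algorithm 1 listing, which is omitted here): since Omega_s is monotonically increasing and negative at p = 0, the minimizer of Gamma_s is either the power bound P_u, when Omega_s(P_u) <= 0, or the unique root of Omega_s in (0, P_u]. The coefficients phi_u, psi_u, and theta_us below are hypothetical.

```python
import math

def make_omega(phi: float, psi: float, theta: float):
    """Omega_s(p) from Eq. (17): the numerator of Gamma_s'(p)."""
    def omega(p: float) -> float:
        return (psi * math.log2(1.0 + theta * p)
                - theta * (phi + psi * p) / ((1.0 + theta * p) * math.log(2.0)))
    return omega

def optimal_power(phi: float, psi: float, theta: float, p_max: float,
                  tol: float = 1e-9) -> float:
    """Bisection on the sign of Omega_s instead of solving a convex
    feasibility problem in each iteration."""
    omega = make_omega(phi, psi, theta)
    if omega(p_max) <= 0.0:       # Gamma_s is still decreasing at the bound
        return p_max
    lo, hi = 0.0, p_max           # Omega_s(lo) < 0 <= Omega_s(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if omega(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical per-user coefficients.
p_star = optimal_power(phi=1.0, psi=0.2, theta=5.0, p_max=10.0)
```

Because the search interval halves in each iteration, roughly log2(P_u / tol) evaluations of Omega_s suffice, matching the stated logarithmic complexity.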

## IV. LOW POWER CONSUMPTION BANDWIDTH ALLOCATION STRATEGY

In the previous section, based on user QoE and a given task uploading solution [TeX:] $$\chi,$$ we obtained the power allocation optimization solution [TeX:] $$\mathcal{P}^{*}=\left\{p_{u}^{*}, u \in U\right\}.$$ To further reduce the parameter uploading energy consumption of the FEL system, we develop a low power consumption BA strategy based on the abovementioned solution.

We consider the BA problem for edge devices that satisfy the time restraint. The target of solving the BA problem is to minimize the total energy consumption, that is, [TeX:] $$\sum_{u \in U_{s}}\left(E_{u}^{l}+E_{u}\right).$$ Given that the energy consumption [TeX:] $$E_{u}^{l}$$ for the local model training of all edge devices is equal, this problem can be transformed into minimizing the uploading energy, that is,

##### (18)

[TeX:] $$\begin{array}{ll}\min _{\left\{\delta_{u}, t_{u}\right\}} & \sum_{u \in U_{s}} E_{u}^{u p} \\ \text { s.t. } & \sum_{u \in U_{s}} \delta_{u} \leq 1, \quad(18 \mathrm{a}) \\ & t_{u} \leq T_{u}, \forall u \in U_{s}, \quad(18 \mathrm{~b})\end{array}$$

where [TeX:] $$\delta_{u}$$ represents the bandwidth allocation ratio and [TeX:] $$E_{u}^{u p}=p_{u} t_{u}=\frac{\delta_{u} B t_{u}\left(\tilde{I}_{u s}^{j}+N_{0}\right)}{h_{u s}^{j}}\left(2^{\frac{d_{u}}{\delta_{u} B t_{u}}}-1\right).$$ Constraint (18a) means that the sum of the bandwidth ratios allocated to the edge devices uploading through the same frequency band does not exceed the total bandwidth, whereas constraint (18b) means that all devices involved in the uploading satisfy the time restraint. The optimal bandwidth allocation ratio [TeX:] $$\delta_{u}$$ for edge devices can be obtained by solving problem (18).

a) Lemma 2: The target function of problem (18) is a non-increasing function of [TeX:] $$t_{u} \text { and } \delta_{u}, \forall u \in U.$$

The validity of Lemma 2 can be easily proven by taking the derivative of the target function. Based on this lemma, the optimal solution to problem (18) can be obtained by maximizing the transmission time of each device within the time restraint, that is, [TeX:] $$t_{u}^{*}=T_{u}, u \in U,$$ and the values of [TeX:] $$t_{u}^{*}$$ and the bandwidth allocation ratio [TeX:] $$\delta_{u}$$ are independent of each other. Therefore, the obtained optimal BA strategy is as follows.

b) Theorem 1: The optimal BA strategy can be expressed as

##### (19)

[TeX:] $$\delta_{u}^{*}=\frac{d_{u} \ln 2}{B T_{u}\left[1+\mathcal{W}\left(\frac{h_{u s}^{j} v^{*}-B T_{u}\left(\tilde{I}_{u s}^{j}+N_{0}\right)}{B T_{u}\left(\tilde{I}_{u s}^{j}+N_{0}\right) e}\right)\right]}, \forall u \in U, \quad t_{u}^{*}=T_{u},$$where [TeX:] $$\mathcal{W}(\cdot)$$ is the Lambert W function, [TeX:] $$v^{*}$$ is the Lagrange multiplier, and e is Euler's number.

Proof: See Appendix B.
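A numerical sketch of Theorem 1, assuming a hypothetical Lagrange multiplier v* (in practice v* would be found by a one-dimensional search enforcing the bandwidth constraint). The principal branch of the Lambert W function is implemented with a short Newton iteration to keep the example self-contained; a library routine such as SciPy's lambertw could be used instead.

```python
import math

def lambert_w(x: float, tol: float = 1e-12) -> float:
    """Principal branch W(x), valid for x > -1/e, via Newton iteration."""
    w = math.log(x) if x >= 1.0 else 0.0   # rough starting point
    for _ in range(200):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (1.0 + w))
        w -= step
        if abs(step) < tol:
            break
    return w

def optimal_bandwidth_ratio(d_u: float, b_hz: float, t_max: float, h: float,
                            i_tilde: float, n0: float, v_star: float) -> float:
    """Optimal bandwidth allocation ratio delta_u^* of Eq. (19), with t_u^* = T_u."""
    c = b_hz * t_max * (i_tilde + n0)        # recurring constant B*T_u*(I~+N0)
    arg = (h * v_star - c) / (c * math.e)    # argument of the Lambert W term
    return d_u * math.log(2.0) / (b_hz * t_max * (1.0 + lambert_w(arg)))

# Hypothetical values: 10 kbit model, 1 MHz band, 10 ms budget, v* chosen by hand.
delta = optimal_bandwidth_ratio(d_u=1e4, b_hz=1e6, t_max=0.01, h=1e-5,
                                i_tilde=1e-7, n0=1e-8, v_star=500.0)
```

With these hypothetical inputs the resulting ratio lies in (0, 1), as a bandwidth fraction should.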

To easily find a solution, we propose the following corollary:

c) Corollary 1: [TeX:] $$\delta_{u}^{*}$$ is a non-increasing function of [TeX:] $$T_{u} \text { and } h_{u s}^{j} \text {. }$$

Proof: See Appendix C.

Corollary 1 shows that edge devices with a relatively weak computing capability, that is, relatively small [TeX:] $$T_{u},$$ will limit the synchronous updating of model parameters. Therefore, additional bandwidth should be allocated to these edge devices to minimize energy consumption. Specifically, those devices with weak computing capability can complete the model parameter uploading and reduce transmission power within the transmission restraint time by receiving additional bandwidth allocation.

Additional bandwidth should also be allocated to those devices with weak channels. The problem of poor channels can be solved by either improving transmission power or increasing bandwidth. To achieve the target of minimizing energy consumption, increasing bandwidth is the optimal solution.

## V. SIMULATION RESULTS

The performance of the proposed system with the uplink power allocation optimization and BA strategies is evaluated based on the simulation results. The multi-server edge cellular network considered in this paper is closer to the actual scenario; however, S can be adjusted so that the presented optimization problem suits both single-server and multi-server scenarios. Unless otherwise specified, the simulation parameters are set as follows. We consider a dense heterogeneous network environment using MEC, which consists of S = 7 intelligent edge units. Each unit includes one BS and 50 small base stations (SBSs). The coverage radius of the BS is 500 m, and that of an SBS is 5 m. Each SBS is assumed to serve only one user. The MEC server is deployed near the BS; furthermore, the edge devices obey a uniform distribution and choose the nearest base station for communication. The duration of local model training, [TeX:] $$\left\{t_{u}^{l}\right\},$$ is evenly distributed within (0, 10] ms. The channel bandwidth is B = 1 MHz, the uplink gain, [TeX:] $$\left\{h_{u s}^{j}\right\},$$ of sub-channel j between edge device u and the BS follows Rayleigh fading, the average path loss is [TeX:] $$10^{-5},$$ the Gaussian noise variance is [TeX:] $$N_{0}=10^{-8},$$ the maximal transmission power of edge device u is [TeX:] $$P_{u}=10 \mathrm{~W},$$ and the model size is set to [TeX:] $$d_{u}=10^{4}$$ bits to facilitate learning. For local model training, we assign the MNIST dataset to each user for classification. We build a CNN model with six convolutional layers, [TeX:] $$2 \times 2$$ max pooling layers, a fully connected layer, and a softmax output layer.

a) The tradeoff performance of the proposed algorithm with time and energy: Given that only some edge devices can upload local model parameters simultaneously, each device is allocated the same bandwidth. Based on user QoE, 50 different values are randomly set within the range of (0, 1) for the task completion time preference parameter [TeX:] $$\beta_{u}^{t}$$ and energy preference parameter [TeX:] $$\beta_{u}^{e}$$ of user u. Figure 3 describes the tradeoff between transmission time and transmission energy. When the uploading time of model parameters is limited, a larger time preference parameter [TeX:] $$\beta_{u}^{t}$$ can be set, and the user transmission power increases along with [TeX:] $$\beta_{u}^{t}.$$ In this case, the energy preference parameter is relatively small, while the energy consumption is large. By contrast, when the energy consumed for model parameter uploading is limited, a larger energy preference parameter [TeX:] $$\beta_{u}^{e}$$ can be set, and the user transmission power decreases with increasing [TeX:] $$\beta_{u}^{e}.$$ In this case, the task completion time preference parameter is relatively small, and the energy consumption is also small. The simulation result verifies the effectiveness of optimizing the uplink power and uploading time based on user QoE.

b) The optimal performance of the proposed algorithm: We then compare the practical performance of the proposed optimal strategy with the average and random bandwidth allocation baselines. In the average bandwidth allocation strategy, the uplink bandwidth is evenly distributed across all edge devices involved in uploading, so each device obtains the same uplink bandwidth. In the random bandwidth allocation strategy, the uplink bandwidth is distributed across the same devices in random proportions. Based on Corollary 1, we verify the effectiveness of the proposed optimal strategy in terms of maximal transmission time and channel conditions.
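For reference, the two baselines can be expressed as bandwidth fractions that sum to one; the sketch below is ours (function names are illustrative, and the random baseline is seeded for reproducibility):

```python
import random

def average_allocation(n):
    """Average baseline: every uploading device gets the same fraction."""
    return [1.0 / n] * n

def random_allocation(n, rng):
    """Random baseline: fractions drawn uniformly, then normalized."""
    weights = [rng.random() for _ in range(n)]
    s = sum(weights)
    return [w / s for w in weights]

rng = random.Random(0)
avg = average_allocation(50)   # 50 devices, as in the simulation setup
rnd = random_allocation(50, rng)
```

Both allocations satisfy the same feasibility constraint as the proposed strategy ($\sum_u \delta_u = 1$); they differ only in how the fractions are chosen.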

i. The relationship curve between the uploading power [TeX:] $$P_{u}$$ and transmission time T of edge devices and the relationship curve between uploading energy consumption [TeX:] $$E_{u}$$ and transmission time T are shown in Figures 4 and 5, respectively. Under the three circumstances, both power and energy consumption decrease along with increasing restraint uploading time T. As proven by Lemma 2, a longer transmission time corresponds to a lower energy consumption. The proposed optimal strategy allocates additional bandwidth to those devices with weak computing capabilities, thereby allowing them to complete the model parameter uploading and reduce the transmission power within the transmission restraint time. Figures 4 and 5 show that the proposed strategy reduces the transmission power by up to 21% and 23% compared with the average and random bandwidth allocations, respectively. Meanwhile, the transmission energy consumption is improved by 1.4% and 1.9% compared with the baselines.

ii. Figures 6 and 7 compare the uploading power [TeX:] $$P_{u}$$ and uploading energy consumption [TeX:] $$E_{u}$$ in the three circumstances at each uplink gain [TeX:] $$h_{u s}^{j}.$$ Here, the channel uplink gain [TeX:] $$h_{u s}^{j}$$ takes a random value within the range of [TeX:] $$\left(10^{-5}, 2 * 10^{-5}\right).$$ Under the three circumstances, the transmission power and energy consumption decrease as the channels improve. One observation from Corollary 1 is that more bandwidth should be allocated to those edge devices with weak channels: a weak channel can be overcome either by boosting the transmission power or by increasing the bandwidth, and the latter is preferred for energy minimization. As proven in Lemma 2, better channels correspond to lower transmission energy consumption. The transmission power in the proposed optimal strategy is 19% and 31% lower than those in the average and random bandwidth allocations, respectively, and the transmission energy consumption is improved by 5% and 9.6% compared with the baselines.

In order to further verify the effectiveness of the algorithm, in addition to the random and average bandwidth allocation strategies, we use the BW scheme in [27] as a control experiment. Under tolerable channel states, the maximum transmission powers allocated by the BW strategy and by the bandwidth optimization proposed in this paper are 0.43 W and 0.41 W, respectively, i.e., the proposed scheme reduces the transmission power by 5.24%. In terms of transmission energy consumption, the proposed scheme improves by 1.54%.

iii. To demonstrate the generality of the algorithm, we also run tests with varying numbers of intelligent edge units (S = 10, S = 13). Adding intelligent edge units leads to more inter-cell interference during parameter transmission, and the terminals increase their transmission power to overcome this effect. As shown in Figures 8 and 9, the transmission power and energy consumption of the devices increase significantly with the number of cells. More importantly, the experimental results demonstrate the universality of the proposed optimized bandwidth allocation: even as the number of edge cells varies, the proposed scheme can still effectively reduce the transmission power and energy consumption of the terminals according to the transmission delay limit and channel state.

## VI. CONCLUSIONS

A low power consumption UPA strategy for FEL in a multi-unit intelligent edge network is proposed in this paper. To facilitate optimization, we divide UPA into a power allocation optimization strategy based on user QoE and a BA strategy. The problems are solved by using quasi-convex and convex optimization techniques, and a low-complexity algorithm is proposed to solve the quasi-convex optimization problem. To satisfy user QoE, the proposed UPA strategy reduces the energy consumption of devices while adapting to the channel states and computing capabilities of the edge devices. Simulation results show that the proposed strategy outperforms the baselines in terms of transmission power and transmission energy consumption by up to 31% and 9.6%, respectively.

## APPENDIX A

##### A. Proof of Lemma 1

First, we note that [TeX:] $$\Gamma_{s}\left(p_{u}\right)$$ is twice differentiable on [TeX:] $$\mathbb{R}.$$ We then check the second-order condition for strict quasi-convexity, which requires that every p satisfying [TeX:] $$\Gamma_{s}^{\prime}(p)=0$$ also satisfies [TeX:] $$\Gamma_{s}^{\prime \prime}(p)>0$$ [40]. The first and second derivatives of [TeX:] $$\Gamma_{s}\left(p_{u}\right)$$ can be calculated as

##### (20)

[TeX:] $$\Gamma_{s}^{\prime}(p)=\frac{\psi_{u} C_{u}\left(p_{u}\right)-\frac{\vartheta_{u s} D_{u}\left(p_{u}\right)}{A_{u}\left(p_{u}\right) \ln 2}}{C_{u}^{2}\left(p_{u}\right)},$$

##### (21)

[TeX:] $$\Gamma_{s}^{\prime \prime}(p)=\frac{\vartheta_{u s}\left[G_{u s}\left(p_{u}\right) C_{u s}\left(p_{u}\right)+2 \vartheta_{u s} D_{u s}\left(p_{u}\right) / \ln 2\right]}{A_{u s}^{2}\left(p_{u}\right) C_{u s}^{3}\left(p_{u}\right) \ln 2},$$

where

##### (21d)

[TeX:] $$G_{u s}\left(p_{u}\right)=\vartheta_{u s} D_{u s}\left(p_{u}\right)-2 \psi_{u} A_{u s}\left(p_{u}\right).$$

Let [TeX:] $$\bar{p}_{u} \in\left(0, P_{u}\right]$$ be a stationary point; [TeX:] $$\Gamma_{s}^{\prime}\left(\bar{p}_{u}\right)=0$$ requires

##### (22)

[TeX:] $$\Omega_{s}\left(\bar{p}_{u}\right)=\psi_{u} \log _{2}\left(1+\vartheta_{u s} \bar{p}_{u}\right)-\frac{\vartheta_{u s}\left(\phi_{u}+\psi_{u} \bar{p}_{u}\right)}{\left(1+\vartheta_{u s} \bar{p}_{u}\right) \ln 2}=0.$$

Substituting [TeX:] $$\bar{p}_{u}$$ into (21), we get

##### (23)

[TeX:] $$\Gamma_{s}^{\prime \prime}\left(\bar{p}_{u}\right)=\frac{\vartheta_{u s}^{3} D_{u s}^{2}\left(\bar{p}_{u}\right)}{A_{u s}^{2}\left(\bar{p}_{u}\right) C_{u s}^{3}\left(\bar{p}_{u}\right) \psi_{u} \ln ^{2} 2}.$$

It can be easily verified that [TeX:] $$\vartheta_{u s} \text { and } D_{u s}^{2}\left(\bar{p}_{u}\right)$$ are positive for all [TeX:] $$\bar{p}_{u} \in\left(0, P_{u}\right].$$ Therefore, [TeX:] $$\Gamma_{s}^{\prime \prime}\left(\bar{p}_{u}\right)>0,$$ and [TeX:] $$\Gamma_{s}\left(p_{u}\right)$$ is strictly quasi-convex on [TeX:] $$\left(0, P_{u}\right].$$
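Numerically, the stationary-point condition (22) can be solved by bisection, since [TeX:] $$\Omega_{s}$$ is negative near zero and increasing on the feasible interval. The sketch below uses hypothetical parameter values for [TeX:] $$\psi_{u}, \phi_{u}, \vartheta_{u s},$$ with [TeX:] $$\Gamma_{s}(p)=\left(\phi_{u}+\psi_{u} p\right) / \log _{2}\left(1+\vartheta_{u s} p\right)$$ as implied by the derivatives above:

```python
import math

# Hypothetical parameter values for one device (psi, phi, theta, P_max).
psi, phi, theta, P_max = 1.0, 0.1, 10.0, 10.0

def gamma(p):
    """Quasi-convex objective Gamma_s(p) implied by (20)-(21)."""
    return (phi + psi * p) / math.log2(1 + theta * p)

def omega(p):
    """Stationarity condition Omega_s(p) from (22); root <=> Gamma'(p) = 0."""
    return (psi * math.log2(1 + theta * p)
            - theta * (phi + psi * p) / ((1 + theta * p) * math.log(2)))

# Omega is negative near 0 and positive at P_max, so bisection brackets
# the unique stationary point p_bar of the strictly quasi-convex Gamma.
lo, hi = 1e-6, P_max
for _ in range(100):
    mid = (lo + hi) / 2
    if omega(mid) < 0:
        lo = mid
    else:
        hi = mid
p_bar = (lo + hi) / 2
```

For these particular values the condition reduces to [TeX:] $$\log _{2}(1+10 p)=1 / \ln 2,$$ so the bisection converges to [TeX:] $$\bar{p}=(e-1) / 10,$$ which is easy to verify against a closed form.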

##### B. Proof of Theorem 1

As mentioned above, [TeX:] $$t_{u}^{*}=T_{u}, u \in U.$$ Next, we prove the optimal bandwidth allocation policy. Substituting [TeX:] $$t_{u}=T_{u}$$ into (18), the problem can be rewritten as

##### (24)

[TeX:] $$\min _{\delta_{u}} \sum_{u \in U} \frac{\delta_{u} B T_{u}\left(\tilde{I}_{u s}^{j}+N_{0}\right)}{h_{u s}^{j}}\left(2^{\frac{d_{u}}{\delta_{u} B T_{u}}}-1\right).$$

Since the above problem is a convex problem, by introducing Lagrange multipliers [TeX:] $$\mu^{*}=\left[\mu_{1}^{*}, \mu_{2}^{*}, \cdots, \mu_{U}^{*}\right]^{T} \in \mathbb{R}^{U}$$ for the inequality constraints [TeX:] $$\delta \succeq 0, \text { with } \delta=\left[\delta_{1}, \delta_{2}, \cdots, \delta_{U}\right]^{T},$$ and a multiplier [TeX:] $$v^{*} \in \mathbb{R}$$ for the equality constraint [TeX:] $$1^{T} \delta=1,$$ the KKT conditions can be written as follows

##### (25)

[TeX:] $$\frac{B T_{u}\left(\tilde{I}_{u s}^{j}+N_{0}\right)}{h_{u s}^{j}}\left(2^{\frac{d_{u}}{\delta_{u}^{*} B T_{u}}}-\frac{d_{u} \ln 2}{\delta_{u}^{*} B T_{u}} 2^{\frac{d_{u}}{\delta_{u}^{*} B T_{u}}}-1\right)-\mu_{u}^{*}+v^{*}=0, \quad u \in U.$$

By solving the above equations, we can get

##### (26)

[TeX:] $$\delta_{u}^{*}=\frac{d_{u} \ln 2}{B T_{u}\left[1+\mathcal{W}\left(\frac{h_{u s}^{j} v^{*}-B T_{u}\left(\tilde{I}_{u s}^{j}+N_{0}\right)}{B T_{u}\left(\tilde{I}_{u s}^{j}+N_{0}\right) e}\right)\right]},$$

where [TeX:] $$\mathcal{W}(\cdot)$$ is the principal branch of the Lambert W function, and the Lagrange multiplier [TeX:] $$v^{*}$$ is obtained by solving the bandwidth constraint [TeX:] $$\sum_{u \in U} \delta_{u}^{*}=1.$$
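As a numerical illustration, the closed form for [TeX:] $$\delta_{u}^{*}$$ can be evaluated with a small Newton-iteration Lambert W and a bisection search for [TeX:] $$v^{*}$$ (all parameter values below are hypothetical, and the helper names are ours):

```python
import math

def lambert_w(x):
    """Principal-branch Lambert W via Newton's method (valid for x > 0 here):
    solves w * exp(w) = x starting from w0 = log(1 + x) >= W(x)."""
    w = math.log(1.0 + x)
    for _ in range(50):
        ew = math.exp(w)
        w -= (w * ew - x) / (ew * (w + 1.0))
    return w

# Hypothetical 3-device example: model sizes d, deadlines T, channel gains h.
B, N = 1e6, 1e-8                  # bandwidth [Hz], interference + noise
d = [1e4, 1e4, 1e4]               # model sizes [bits]
T = [0.005, 0.01, 0.02]           # per-device transmission deadlines [s]
h = [1e-5, 2e-5, 1.5e-5]          # uplink channel gains

def fractions(v):
    """Bandwidth fractions delta_u* for a given multiplier v."""
    out = []
    for du, Tu, hu in zip(d, T, h):
        x = (hu * v - B * Tu * N) / (B * Tu * N * math.e)
        out.append(du * math.log(2) / (B * Tu * (1.0 + lambert_w(x))))
    return out

# sum(fractions) decreases in v, so bisect for the v* with sum = 1.
lo, hi = 14.0, 1000.0
for _ in range(200):
    v = (lo + hi) / 2
    if sum(fractions(v)) > 1.0:
        lo = v
    else:
        hi = v
delta = fractions((lo + hi) / 2)
```

In this example, devices 1 and 2 share the same Lambert W argument, so the device with the tighter deadline (device 1) receives the larger bandwidth fraction, consistent with Corollary 1.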

##### C. Proof of Corollary 1

First, we prove that [TeX:] $$\delta_{u}^{*}$$ is a non-increasing function of [TeX:] $$T_{u}.$$ Denote [TeX:] $$x=\frac{h_{u s}^{j} v^{*}-B T_{u}\left(\tilde{I}_{u s}^{j}+N_{0}\right)}{B T_{u}\left(\tilde{I}_{u s}^{j}+N_{0}\right) e},$$ so that [TeX:] $$T_{u}=\frac{h_{u s}^{j} v^{*}}{B\left(\tilde{I}_{u s}^{j}+N_{0}\right) e\left(x+\frac{1}{e}\right)}.$$ Substituting this into the expression for [TeX:] $$\delta_{u}^{*},$$ we have

##### (39)

[TeX:] $$\begin{aligned} \delta_{u}^{*} &=\frac{d_{u} \ln 2}{B T_{u}\left[1+\mathcal{W}\left(\frac{h_{u s}^{j} v^{*}-B T_{u}\left(\tilde{I}_{u s}^{j}+N_{0}\right)}{B T_{u}\left(\tilde{I}_{u s}^{j}+N_{0}\right) e}\right)\right]} \\ &=\frac{\left(\tilde{I}_{u s}^{j}+N_{0}\right) e\, d_{u} \ln 2}{h_{u s}^{j} v^{*}} \cdot \frac{x+\frac{1}{e}}{1+\mathcal{W}(x)}. \end{aligned}$$

Further, we denote

##### (40)

[TeX:] $$y=\frac{x+\frac{1}{e}}{1+\mathcal{W}(x)}=\frac{\mathcal{W}(x) e^{\mathcal{W}(x)}+\frac{1}{e}}{1+\mathcal{W}(x)}.$$

It can be easily proved that y is a non-decreasing function with respect to [TeX:] $$\mathcal{W}(x).$$ Since [TeX:] $$\mathcal{W}(x)$$ is a non-decreasing function of [TeX:] $$x$$ and [TeX:] $$x\left(T_{u}\right)$$ is a non-increasing function of [TeX:] $$T_{u},$$ it follows that [TeX:] $$\delta_{u}^{*}$$ is non-increasing in [TeX:] $$T_{u}.$$
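The non-decreasing claim for y can also be checked numerically; the sketch below (ours) parameterizes [TeX:] $$x=w e^{w}$$ so that [TeX:] $$\mathcal{W}(x)=w,$$ avoiding any Lambert W inversion:

```python
import math

# Sample w = W(x) on a grid above the branch point w = -1 and evaluate
# y(w) = (w*e^w + 1/e) / (1 + w), i.e. y as a function of W(x).
ws = [-0.9 + 0.005 * i for i in range(600)]
ys = [(w * math.exp(w) + 1.0 / math.e) / (1.0 + w) for w in ws]
# y is non-decreasing in w; since W(x) is non-decreasing in x and
# x(T_u) is non-increasing in T_u, delta_u* is non-increasing in T_u.
monotone = all(a <= b + 1e-12 for a, b in zip(ys, ys[1:]))
```

This confirms the monotonicity step on a dense grid rather than proving it, but it is a quick sanity check of the derivation.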

Next, we prove that [TeX:] $$\delta_{u}^{*}$$ is a non-increasing function with respect to [TeX:] $$h_{u s}^{j}.$$ From [TeX:] $$x=\frac{h_{u s}^{j} v^{*}-B T_{u}\left(\tilde{I}_{u s}^{j}+N_{0}\right)}{B T_{u}\left(\tilde{I}_{u s}^{j}+N_{0}\right) e},$$ we obtain [TeX:] $$h_{u s}^{j}=\frac{B T_{u}\left(\tilde{I}_{u s}^{j}+N_{0}\right) e}{v^{*}}\left(x+\frac{1}{e}\right).$$ Substituting this into the expression for [TeX:] $$\delta_{u}^{*},$$ it follows that

##### (41)

[TeX:] $$\delta_{u}^{*}=\frac{d_{u} \ln 2}{B T_{u}\left[1+\mathcal{W}\left(\frac{h_{u s}^{j} v^{*}-B T_{u}\left(\tilde{I}_{u s}^{j}+N_{0}\right)}{B T_{u}\left(\tilde{I}_{u s}^{j}+N_{0}\right) e}\right)\right]}=\frac{d_{u} \ln 2}{B T_{u}} \cdot \frac{1}{1+\mathcal{W}(x)}.$$

Further, we let [TeX:] $$z=\frac{1}{1+\mathcal{W}(x)}.$$

Obviously, z is non-increasing with respect to [TeX:] $$\mathcal{W}(x).$$ Because [TeX:] $$\mathcal{W}(x)$$ is non-decreasing in [TeX:] $$x$$ and [TeX:] $$x\left(h_{u s}^{j}\right)$$ is non-decreasing in [TeX:] $$h_{u s}^{j},$$ we can conclude that [TeX:] $$\delta_{u}^{*}$$ is a non-increasing function with respect to [TeX:] $$h_{u s}^{j}.$$

## Biography

##### Tianyi Zhou

Tianyi Zhou received the B.E. degree in Information and Communication Engineering from Beijing Information Science and Technology University, China, in 2019. She is currently pursuing the M.Phil. degree with the School of Information and Communication Engineering, Beijing Information Science and Technology University. Her research interests include wireless communication (5G), federated learning, and intelligent resource management.

## Biography

##### Xuehua Li

Xuehua Li received the Ph.D. degree in telecommunications engineering from the Beijing University of Posts and Telecommunications, Beijing, China, in 2008. She is currently a Professor and the Deputy Dean of the School of Information and Communication Engineering with Beijing Information Science and Technology University, Beijing. She is a Senior Member of the Beijing Internet of Things Institute. Her research interests are in the broad areas of communications and information theory, particularly the Internet of Things, and coding for multimedia communications system.

## Biography

##### Chunyu Pan

Chunyu Pan received the Ph.D. degree from the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications. Since 2019, she has been with Beijing Information Science and Technology University, where she is an Associate Professor in the School of Information and Communication Engineering. Her main research interests include mobile communications and future networks, intelligent resource management, and UAV communications.

## Biography

##### Mingyu Zhou

Mingyu Zhou received the Ph.D. degree in Information and Communication Engineering from Beijing University of Posts and Telecommunications. He focuses on innovation related to future wireless communication technologies. He has more than 10 years of working experience, has published more than 20 papers, and has applied for more than 100 patents.

## Biography

##### Yuanyuan Yao

Yuanyuan Yao received the Ph.D. degree in Information and Communication Engineering from Beijing University of Posts and Telecommunications, Beijing, China, in 2017. Since 2017, she has been with the School of Information and Communication Engineering, Beijing Information Science and Technology University, Beijing, China, as an Associate Professor. Her research interests include UAV communications, stochastic geometry and its applications in large-scale wireless networks, and energy harvesting.

## References

- [1] M. Chen, U. Challita, W. Saad, C. Yin, and M. Debbah, "Artificial neural networks-based machine learning for wireless networks: A tutorial," *IEEE Commun. Surveys Tuts.*, vol. 21, no. 4, pp. 3039-3071, Jul. 2019.
- [2] Y. Sun, M. Peng, Y. Zhou, Y. Huang, and S. Mao, "Application of machine learning in wireless networks: Key techniques and open issues," *IEEE Commun. Surveys Tuts.*, vol. 21, no. 4, pp. 3072-3108, Jun. 2019.
- [3] Y. Liu, S. Bi, Z. Shi, and L. Hanzo, "When machine learning meets big data: A wireless communication perspective," *arXiv preprint arXiv:1901.08329*, Jan. 2019.
- [4] E. Li, L. Zeng, Z. Zhou, and X. Chen, "Edge AI: On-demand accelerating deep neural network inference via edge computing," *IEEE Trans. Wireless Commun.*, vol. 19, no. 1, pp. 447-457, Oct. 2019.
- [5] X. Wang, Y. Han, C. Wang, Q. Zhao, X. Chen, and M. Chen, "In-Edge AI: Intelligentizing mobile edge computing, caching and communication by federated learning," *IEEE Netw.*, vol. 33, no. 5, pp. 156-165, Jul. 2019.
- [6] G. Zhu, D. Liu, Y. Du, C. You, J. Zhang, and K. Huang, "Towards an intelligent edge: Wireless communication meets machine learning," *IEEE Commun. Mag.*, vol. 58, no. 1, pp. 19-25, Jan. 2019.
- [7] D. Wen, X. Li, Q. Zeng, J. Ren, and K. Huang, "An overview of data-importance aware radio resource management for edge machine learning," *J. Commun. Inf. Netw.*, vol. 4, no. 4, pp. 1-14, Dec. 2019.
- [8] K. Bonawitz et al., "Towards federated learning at scale: System design," *arXiv preprint arXiv:1902.01046*, Mar. 2019.
- [9] L. Wang, W. Wang, and B. Li, "CMFL: Mitigating communication overhead for federated learning," in *Proc. IEEE ICDCS*, 2019.
- [10] S. Ha, J. Zhang, O. Simeone, and J. Kang, "Coded federated computing in wireless networks with straggling devices and imperfect CSI," *arXiv preprint arXiv:1901.05239*, Jan. 2019.
- [11] O. Habachi, M. A. Adjif, and J. P. Cances, "Fast uplink grant for NOMA: A federated learning based approach," *arXiv preprint arXiv:1904.07975*, Mar. 2019.
- [12] *arXiv preprint arXiv:1604.00981*, 2017. [Online]. Available: https://arxiv.org/abs/1604.00981
- [13] L. Cui, X. Su, Z. Ming, et al., "Blockchain-assisted compression algorithm of federated learning for content caching in edge computing," *IEEE Internet Things J.*, early access, Aug. 2020.
- [14] T. H. T. Le, N. H. Tran, Y. K. Tun, Z. Han, and C. S. Hong, "Auction based incentive design for efficient federated learning in cellular wireless networks," in *Proc. IEEE WCNC*, 2020.
- [15] L. Cui, X. Su, Y. Zhou, and Y. Pan, "Slashing communication traffic in federated learning by transmitting clustered model updates," *IEEE J. Sel. Areas Commun.*, vol. 39, no. 8, pp. 2572-2589, Aug. 2021.
- [16] N. H. Tran, W. Bao, A. Zomaya, M. N. H. Nguyen, and C. S. Hong, "Federated learning over wireless networks: Optimization model design and analysis," in *Proc. IEEE INFOCOM*, 2019.
- [17] M. Chen, Z. Yang, W. Saad, C. Yin, H. V. Poor, and S. Cui, "A joint learning and communications framework for federated learning over wireless networks," *IEEE Trans. Wireless Commun.*, vol. 20, no. 1, pp. 269-283, Jan. 2021.
- [18] X. Lyu, H. Tian, P. Zhang, and C. Sengul, "Multi-user joint task offloading and resources optimization in proximate clouds," *IEEE Trans. Veh. Technol.*, vol. 66, no. 4, pp. 3435-3447, Apr. 2017.
- [19] L. Yang, J. Cao, H. Cheng, and Y. Ji, "Multi-user computation partitioning for latency sensitive mobile cloud applications," *IEEE Trans. Comput.*, vol. 64, no. 8, pp. 2253-2266, Aug. 2015.
- [20] S. Bi, J. Lyu, Z. Ding, and R. Zhang, "Engineering radio maps for wireless resource management," *IEEE Wireless Commun.*, vol. 26, no. 2, pp. 133-141, Apr. 2019.
- [21] W. Liu, J. Wei, and Q. Meng, "Comparisons on KNN, SVM, BP and the CNN for handwritten digit recognition," in *Proc. IEEE AEECA*, 2020, pp. 587-590.
- [22] X. Mei, Q. Wang, and X. Chu, "A survey and measurement study of GPU DVFS on energy conservation," *Digital Commun. Netw.*, vol. 3, no. 2, pp. 89-100, 2017.
- [23] E. Dahlman, S. Parkvall, and J. Skold, *4G: LTE/LTE-Advanced for Mobile Broadband*. New York, NY, USA: Academic, 2013.
- [24] Q. Ye, B. Rong, Y. Chen, M. Al-Shalash, C. Caramanis, and J. G. Andrews, "User association for load balancing in heterogeneous cellular networks," *IEEE Trans. Wireless Commun.*, vol. 12, no. 6, pp. 2706-2716, Jun. 2013.
- [25] S. Boyd and L. Vandenberghe, *Convex Optimization*. Cambridge, U.K.: Cambridge Univ. Press, 2004.
- [26] B. Bereanu, "Quasi-convexity, strictly quasi-convexity and pseudo-convexity of composite objective functions," *Revue Française d'Automatique, Informatique, Recherche Opérationnelle. Mathématique*, vol. 6, no. 1, pp. 15-26, 1972.
- [27] J. Chen, H. Xing, X. Lin, et al., "Joint cache placement and bandwidth allocation for FDMA-based mobile edge computing systems," in *Proc. IEEE ICC*, 2020.
*Proc. IEEE ICC*, 2020;custom:[[[-]]]