I. INTRODUCTION
As the evolution of wireless communication progresses toward 6G, there is a pressing need to explore innovative technologies that can overcome the challenges of hyper-high data rates, hyper-low latency, and extreme connectivity [1], [2]. Integrated sensing and communication (ISAC) has gained attention as a potential solution to fulfill these demands, especially in the millimeter-wave and terahertz bands [3]–[6]. A key advantage of ISAC is its ability to leverage shared spectral resources for both radar systems and communication networks. Traditionally, these two functions have operated independently, resulting in inefficient use of the spectrum and limited coordination between systems. In contrast, ISAC allows radar and communication functionalities to be seamlessly integrated, facilitating more efficient spectrum utilization and improved overall system performance. Moreover, ISAC holds great potential for enabling advanced applications, including industrial automation, autonomous vehicles, and augmented reality. By combining sensing and communication capabilities, ISAC can also provide real-time environmental awareness, allowing devices to make intelligent decisions and adapt rapidly to changing conditions.
ISAC systems leverage the reuse of information-bearing signals for radar sensing, enhancing spectrum efficiency. However, this raises concerns about information leakage, especially to untrusted sensing targets acting as potential eavesdroppers. To mitigate these risks, physical layer security techniques for ISAC systems have been investigated [7]–[9]. The authors in [7] focus on optimizing a multiple-input multiple-output (MIMO) communication-radar system involving a legitimate user and an eavesdropping target by maximizing the secrecy rate and the target return signal-to-interference-plus-noise ratio (SINR) and minimizing the transmit power. To achieve these goals, they optimize the transmit beampattern and employ a Taylor series approximation to address the non-convex nature of the optimization problems, ensuring effective and secure communication under power constraints. The authors in [8] explore a dual-functional MIMO radar-communication system that optimizes transmit beamforming to enhance the secrecy rate and the target detection SINR while minimizing power usage. They address challenges like target location uncertainty and imperfect channel state information (CSI) by using robust optimization techniques, including artificial noise, to impair eavesdroppers while maintaining secure communication for legitimate users. In [9], the authors propose using constructive and destructive interference to maximize the SINR for radar while ensuring communication security by designing transmit waveforms and receive beamformers that mitigate potential eavesdropping by targets. They introduce a fractional programming algorithm to maximize radar SINR, with considerations for practical scenarios like uncertain target locations.
Reconfigurable intelligent surfaces (RISs) are increasingly being recognized as a critical enabling technology for future 6G networks because they can manage the propagation environment of wireless signals by controlling the phase shifts of reconfigurable and passive reflecting elements [10]–[12]. Moreover, researchers have initiated studies on deploying RIS in ISAC systems, leveraging their capabilities in standalone communication or sensing systems. In [13], waveform design and passive beamforming are jointly optimized to mitigate multi-user interference, enhancing the quality of service (QoS) in communication systems. This optimization is subject to power constraints and to the QoS for radar sensing, expressed in terms of the similarity of the beampattern. This approach provides a better balance between radar and communication functions. In [14], the goal of the joint design of the base station (BS) beamforming and the RIS reflection coefficients is to maximize the communication sum-rate while maintaining radar sensing performance. An efficient alternating algorithm is proposed to iteratively solve the non-convex optimization problem, and simulations demonstrate that deploying RIS in ISAC systems significantly enhances performance.
Previous work on RIS-assisted ISAC has primarily focused on terrestrial networks, which face significant limitations in supporting aerial service scenarios, such as drones or unmanned aerial vehicles (UAVs). Additionally, the fixed positioning of RISs in terrestrial settings limits their adaptability and coverage, making them less effective in dynamic or obstructed environments. In contrast, the QoS in aerial scenarios can be significantly enhanced by deploying RIS-assisted ISAC in a UAV network. In this study, we propose a deep reinforcement learning (DRL) solution for RIS-assisted secure ISAC systems to maximize the secrecy rate. The key contributions of this study can be summarized as follows:
We integrate a UAV into a RIS-assisted secure ISAC system, where the eavesdropper (the sensing target) could intercept communication signals transmitted to legitimate aerial users, such as drones and flying cars. By mounting the RIS on a UAV, the system can establish and maintain line-of-sight more effectively than terrestrial systems, resulting in more reliable aerial services, such as urban air mobility. Additionally, the coverage area of the ISAC system can be significantly expanded, which is particularly beneficial in hard-to-reach areas where terrestrial infrastructure is either unavailable or impractical. Accordingly, the UAV-RIS-ISAC system can be scaled more efficiently to cover large or dispersed regions.
We develop a DRL framework, specifically using the deep deterministic policy gradient (DDPG) algorithm [19], to solve the non-convex optimization problem in RIS-assisted secure ISAC systems. The proposed DRL-based beamforming solution jointly optimizes the beamforming matrix of the ISAC BS and the beamforming matrix of the RIS (i.e., phase shifts of the RIS). This approach effectively maximizes the secrecy rate while ensuring that radar detection requirements are met, addressing the complexities of the non-convex problem by enabling dynamic, real-time adjustments to the beamforming strategy in response to environmental changes.
We validate the proposed DRL-based joint beamforming solution through realistic 3D ray-tracing simulations. The results demonstrate that the use of RIS considerably enhances the secrecy rate of the ISAC system by effectively reducing interference from eavesdroppers and increasing signal gain for legitimate users. The simulations also highlight the algorithm’s adaptability, showing that it performs well under various conditions, such as different transmit power levels and radar SNR requirements.
II. SYSTEM MODEL
Herein, we consider the RIS-assisted secure ISAC system illustrated in Fig. 1, where an ISAC BS simultaneously serves K legitimate users and senses an eavesdropper, which is treated as a sensing target, with the assistance of an RIS mounted on a UAV. The ISAC BS has M antennas used for both transmission and reception, and the RIS mounted on the UAV has N reflecting elements.
The signal transmitted by the ISAC BS is represented as
where [TeX:] $$\mathbf{s}_c \in \mathbb{C}^{K \times 1} \text { and } \mathbf{s}_r \in \mathbb{C}^{M \times 1}$$ denote the communication symbol and radar signal, respectively. Terms [TeX:] $$\mathbf{W}_c \in \mathbb{C}^{M \times K} \text { and } \mathbf{W}_r \in \mathbb{C}^{M \times M}$$ denote the communication and radar beamforming matrices, respectively. The radar and communication signals are assumed to be independent to prevent mutual interference between radar and communications (i.e., [TeX:] $$\mathbb{E}\left\{\mathbf{s}_c \mathbf{s}_r^H\right\}=\mathbf{0}$$). For brevity, [TeX:] $$\mathbf{W} \triangleq\left[\mathbf{W}_c \ \mathbf{W}_r\right] \in \mathbb{C}^{M \times(K+M)}$$ denotes the overall beamforming matrix, and [TeX:] $$\mathbf{s} \triangleq\left[\mathbf{s}_c^T \ \mathbf{s}_r^T\right]^T \in \mathbb{C}^{(K+M) \times 1}$$ represents the transmit symbol vector.
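The stacking of the beamforming matrices and symbol vectors can be checked numerically. The following is a minimal sketch, assuming the transmit vector is formed as the product of the overall beamforming matrix and the stacked symbol vector; the name `x`, the helper `crandn`, the sizes, and the random data are illustrative assumptions rather than values from this work.

```python
import numpy as np

def transmit_signal(W_c, W_r, s_c, s_r):
    """Stack beamformers and symbols, then return x = W s (x is an assumed name)."""
    W = np.hstack([W_c, W_r])        # M x (K+M) overall beamforming matrix
    s = np.concatenate([s_c, s_r])   # (K+M,) transmit symbol vector
    return W @ s, W, s

rng = np.random.default_rng(0)
M, K = 4, 2  # illustrative sizes only

def crandn(*shape):
    """Unit-variance complex Gaussian samples (illustrative data)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

W_c, W_r = crandn(M, K), crandn(M, M)
s_c, s_r = crandn(K), crandn(M)
x, W, s = transmit_signal(W_c, W_r, s_c, s_r)
```

The stacked product equals the sum of the separately beamformed communication and radar components, which is why the compact notation with W and s can be used in the remainder of the model.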
A. Radar Model
The echo signal, which is reflected by the target, is subsequently reflected by the RIS to the ISAC BS. Therefore, the collected echo signal at the ISAC BS from the target can be represented as
where [TeX:] $$\mathbf{h}_{b, t} \in \mathbb{C}^{M \times 1}, \mathbf{H}_{b, r} \in \mathbb{C}^{N \times M} \text {, and } \mathbf{h}_{r, t} \in \mathbb{C}^{N \times 1}$$ denote the channels between the ISAC BS and the target, between the ISAC BS and the RIS, and between the RIS and the target, respectively. The reflection coefficient matrix of the RIS is denoted by [TeX:] $$\Phi \triangleq \operatorname{diag}(\phi) \in \mathbb{C}^{N \times N},$$ where [TeX:] $$\phi \triangleq\left[\phi_1, \cdots, \phi_N\right]^T$$ with [TeX:] $$\left|\phi_n\right|=1, \forall n \text { and } \phi_n=e^{j \varphi_n},$$ where [TeX:] $$\varphi_n \in[0,2 \pi)$$ is the RIS phase shift of the nth reflecting element. The terms [TeX:] $$\mathbf{n}_r \sim \mathcal{CN}\left(0, \sigma_t^2 \mathbf{I}_M\right)$$ and [TeX:] $$\mathbf{e} \sim \mathcal{CN}\left(0, \sigma_e^2 \mathbf{I}_N\right)$$ denote the additive white Gaussian noise (AWGN) at the ISAC BS receiver and RIS, respectively.
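The unit-modulus structure of the reflection coefficient matrix can be illustrated with a short sketch. The function name `ris_reflection_matrix` and the example phase values are assumptions made for illustration only.

```python
import numpy as np

def ris_reflection_matrix(phases):
    """Return Phi = diag(phi) with phi_n = exp(j * varphi_n)."""
    # Wrap the phase shifts into [0, 2*pi) before building the coefficients.
    phases = np.mod(np.asarray(phases, dtype=float), 2 * np.pi)
    phi = np.exp(1j * phases)
    return np.diag(phi)

# Four illustrative phase shifts; 7.0 rad is wrapped into [0, 2*pi).
Phi = ris_reflection_matrix([0.0, np.pi / 2, np.pi, 7.0])
```

Because every diagonal entry has unit modulus, the matrix is unitary: a passive RIS only rotates the phase of each reflected component and does not amplify it.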
We disregard the static noise under the assumption that the received radar echo signal significantly exceeds the thermal noise reflection at the radar receiver. Consequently, the received echo signal in (2) is approximated as
To maintain conciseness, we define the equivalent channel matrix as
The received SNR from the target is given by
B. Communication Model
The signal received by the kth user can be represented as
where [TeX:] $$\mathbf{h}_{b, k} \in \mathbb{C}^{M \times 1} \text { and } \mathbf{h}_{r, k} \in \mathbb{C}^{N \times 1}$$ denote the channels between the ISAC BS and the kth user and between the RIS and the kth user, respectively. In addition, [TeX:] $$n_k \sim \mathcal{C N}\left(0, \sigma_k^2\right)$$ denotes the AWGN at the kth user. Hence, the received SINR at the kth user is given as
where [TeX:] $$\mathbf{w}_j$$ represents the jth column of the beamforming matrix [TeX:] $$\mathbf{W}$$, [TeX:] $$j=1, \cdots, K+M .$$ Based on (7), the achievable rate for the legitimate users is given by
For the signal intended for the kth user, the received SINR at the eavesdropper is given as
Based on (9), the achievable rate for the eavesdropper is given by
Thus, the secrecy rate is represented as [20]
where [TeX:] $$[x]^{+}=\max (0, x).$$
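The role of the operator [x]^+ can be seen in a small numerical sketch. It assumes the standard Shannon rate log2(1 + SINR) for both the legitimate user and the eavesdropper and evaluates the secrecy rate for a single user; the function names are hypothetical, and how the rates are aggregated over the K users is not assumed here.

```python
import math

def achievable_rate(sinr):
    """Shannon rate in bit/s/Hz for a linear-scale SINR (assumed rate model)."""
    return math.log2(1.0 + sinr)

def secrecy_rate(sinr_user, sinr_eve):
    """[R_user - R_eve]^+ with [x]^+ = max(0, x)."""
    return max(0.0, achievable_rate(sinr_user) - achievable_rate(sinr_eve))

# The user is stronger than the eavesdropper: log2(16) - log2(4) = 2 bit/s/Hz.
strong_user = secrecy_rate(15.0, 3.0)
# The eavesdropper is stronger than the user: the rate is clipped to zero.
weak_user = secrecy_rate(1.0, 3.0)
```

The clipping expresses that no positive secrecy rate is available when the eavesdropper observes the signal with a higher SINR than the legitimate user.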
C. Optimization Problem
In this study, our objective is to jointly optimize the beamforming matrix of the ISAC BS [TeX:] $$\mathbf{W}$$ and the reflection coefficient matrix of the RIS [TeX:] $$\Phi$$ to maximize the secrecy rate while satisfying the SNR requirement for radar detection. The problem is formulated as
where [TeX:] $$P_t$$ is the total transmit power at ISAC BS and [TeX:] $$\gamma_t$$ is the required SNR for detecting the target.
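The constraints named above can be checked for a candidate solution as in the following sketch. It assumes that the transmit power budget is expressed through the squared Frobenius norm of W and that the radar SNR and its threshold are given in linear scale; both function names, the tolerance `tol`, and these conventions are illustrative assumptions.

```python
import numpy as np

def scale_to_power_budget(W, p_t):
    """Scale W down (never up) so that ||W||_F^2 <= p_t (p_t in watts)."""
    power = np.linalg.norm(W, "fro") ** 2
    if power <= p_t:
        return W
    return W * np.sqrt(p_t / power)

def is_feasible(W, phi, radar_snr, p_t, gamma_t, tol=1e-9):
    """Check the power budget, unit-modulus RIS entries, and the radar SNR floor."""
    power_ok = np.linalg.norm(W, "fro") ** 2 <= p_t * (1 + tol)
    modulus_ok = np.allclose(np.abs(phi), 1.0)
    return bool(power_ok and modulus_ok and radar_snr >= gamma_t)

W_raw = 2 * np.ones((2, 3), dtype=complex)   # illustrative beamformer, power 24
W_ok = scale_to_power_budget(W_raw, 6.0)     # rescaled to meet a budget of 6
phi = np.exp(1j * np.array([0.1, 0.2]))      # unit-modulus RIS coefficients
```

Rescaling rather than rejecting an over-budget beamformer is one simple way to keep a learned action inside the feasible set; other projections are possible.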
III. PROPOSED SOLVER VIA DRL
The optimization problem in (12) is challenging due to its non-convex nature, which makes traditional algorithms ineffective in solving it. In addition, the number of variables involved in this system is too large to be handled by a traditional exhaustive search-based approach. Therefore, we adopt a DRL algorithm, which is capable of effectively managing dynamic environments by adapting in real time and continuously optimizing decisions. This capability is essential for systems with changing conditions, such as UAV networks. In particular, we employ the DDPG algorithm to address the optimization problem. This approach leverages the widely recognized actor-critic network, as depicted in Fig. 2.
Fig. 2. The proposed DDPG-based joint beamforming.
A. Actor-critic Network
Within the actor-critic network, the DDPG algorithm simultaneously learns the Q-function and the policy. This dual learning approach enables the system to optimize decision-making in a continuous action space. The DDPG algorithm employs four neural networks: two for the actor and two for the critic, with each consisting of an online network and a target network. Next, we define these four networks of the DDPG algorithm.
The online actor network is used to iteratively update parameters [TeX:] $$\theta^\mu.$$ The selection of action [TeX:] $$a_t$$ according to state [TeX:] $$s_t$$ is expressed as
where [TeX:] $$\mathcal{N}_t$$ is the Ornstein-Uhlenbeck noise for action exploration. In addition, the online actor network interacts with the environment to observe the reward [TeX:] $$r_t$$ and next state [TeX:] $$s_{t+1},$$ which are then stored in the replay buffer. A mini-batch of [TeX:] $$N_b$$ transitions [TeX:] $$\left(s_t, a_t, r_t, s_{t+1}\right)$$ is then randomly sampled from the replay buffer. The policy of an actor is updated by the sampled policy gradient, which is given as
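Exploration with Ornstein-Uhlenbeck noise can be sketched as follows. The sketch assumes the usual DDPG form in which the noise sample is added to the deterministic actor output; the class name, the discretization, and the parameter values `theta`, `sigma`, and `dt` are common illustrative defaults, not the settings used in this work.

```python
import numpy as np

class OrnsteinUhlenbeckNoise:
    """Discretized OU process: x <- x + theta*(mu - x)*dt + sigma*sqrt(dt)*N(0, I)."""

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=0):
        self.mu = mu * np.ones(dim)
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        """Restart the process at its long-run mean."""
        self.x = self.mu.copy()

    def sample(self):
        """Advance the process by one step and return the new noise vector."""
        dx = self.theta * (self.mu - self.x) * self.dt
        dx += self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.x.shape)
        self.x = self.x + dx
        return self.x

def select_action(actor, state, noise):
    """Noisy action: actor output plus an OU sample (actor is any callable)."""
    return actor(state) + noise.sample()

noise = OrnsteinUhlenbeckNoise(dim=3)
# A placeholder actor that always outputs zeros, used only to show the call.
action = select_action(lambda s: np.zeros(3), np.zeros(5), noise)
```

Unlike independent Gaussian noise, consecutive OU samples are correlated in time, which tends to produce smoother exploration in continuous action spaces.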
where J is the objective function in the form of a discounted cumulative reward. For the next learning step, [TeX:] $$\theta^\mu$$ is updated with the learning rate [TeX:] $$l_{r_\mu}.$$
The online critic network is employed to iteratively update parameters [TeX:] $$\theta^Q .$$ The target value [TeX:] $$y_i$$ and the loss function L are expressed as follows:
For the next learning step, [TeX:] $$\theta^Q$$ is updated with learning rate [TeX:] $$l_{r_Q}.$$
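The critic update can be illustrated with the standard DDPG target and mean-squared-error loss. This is a minimal sketch under that assumption; the function names, the `discount` argument, and its default value are hypothetical, and the Q values are passed in as plain arrays instead of being produced by neural networks.

```python
import numpy as np

def critic_targets(rewards, next_q_values, discount=0.99):
    """y_i = r_i + discount * Q'(s_{i+1}, mu'(s_{i+1})) for each sampled transition."""
    return np.asarray(rewards) + discount * np.asarray(next_q_values)

def critic_loss(targets, q_values):
    """Mean squared error between targets and online critic outputs over the mini-batch."""
    return float(np.mean((np.asarray(targets) - np.asarray(q_values)) ** 2))

# Two illustrative transitions: rewards and target-network Q values of the next states.
y = critic_targets([1.0, 2.0], [10.0, 0.0], discount=0.5)   # gives [6.0, 2.0]
loss = critic_loss(y, [5.0, 2.0])                           # mean of [1.0, 0.0]
```

The next-state Q values come from the target networks rather than the online networks, which keeps the regression target from shifting with every gradient step.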
Finally, a soft update parameter regulates how quickly the target networks track the online networks, and it is used to update the target critic and actor networks. The updates of the target network parameters are expressed as follows:
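A soft update in its standard DDPG form can be sketched as follows. The symbol `tau` for the soft update parameter, its default value, and the function name are assumptions for illustration.

```python
import numpy as np

def soft_update(target_params, online_params, tau=0.005):
    """Return tau * online + (1 - tau) * target for each parameter array."""
    return [tau * w + (1.0 - tau) * w_t
            for w_t, w in zip(target_params, online_params)]

target = [np.zeros(2), np.zeros((2, 2))]   # illustrative target-network weights
online = [np.ones(2), np.ones((2, 2))]     # illustrative online-network weights
target = soft_update(target, online, tau=0.1)
```

A small `tau` makes the target networks change slowly, which stabilizes the targets used in the critic loss; setting `tau` to 1 would copy the online parameters directly.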
B. DRL-based Beamforming
In this section, we propose DRL-based beamforming that addresses the problem using DDPG. The agent, which in this context is the ISAC BS, interacts with the environment to maximize the cumulative reward and subsequently determine the optimal action policy. The environment encompasses all system-related information, including the ISAC BS, communication users, radar target, and RIS.
State: The ISAC BS acquires the state of the environment by observing the time-varying channel conditions. The state is designed to include the channel information [TeX:] $$\mathbf{H}=\left\{\mathbf{h}_{b, t}, \mathbf{H}_{b, r}, \mathbf{h}_{r, t}, \mathbf{h}_{b, k}, \mathbf{h}_{r, k}\right\} .$$ In addition, the state includes the beamforming matrix of the ISAC BS [TeX:] $$\mathbf{W}$$ and the reflection coefficient matrix of the RIS [TeX:] $$\Phi$$. Thus, the state for time step t is
Because the neural network does not support complex number inputs, the complex matrices for the beamforming matrix and channel are represented by their real and imaginary parts. Therefore, the dimension of the state space is given by [TeX:] $$D_s=2\left(M^2+M+2 N+N M+N K+2 M K\right).$$
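The state dimension can be verified by counting the complex entries of each component and doubling the total. In the sketch below, the helper `to_real_vector`, the ordering of the components, and the sizes M, N, and K are illustrative assumptions; only the diagonal of the RIS matrix is counted, which matches the N term in the expression.

```python
import numpy as np

def state_dim(M, N, K):
    """D_s = 2 (M^2 + M + 2N + NM + NK + 2MK)."""
    return 2 * (M * M + M + 2 * N + N * M + N * K + 2 * M * K)

def to_real_vector(*arrays):
    """Concatenate real and imaginary parts of complex arrays into one real vector."""
    flat = np.concatenate([np.ravel(a) for a in arrays])
    return np.concatenate([flat.real, flat.imag])

M, N, K = 4, 8, 2  # illustrative sizes only
shapes = [
    (M,), (N, M), (N,),   # h_{b,t}, H_{b,r}, h_{r,t}
    (M, K), (N, K),       # h_{b,k} and h_{r,k} stacked over the K users
    (M, K + M), (N,),     # W and the diagonal of Phi
]
state = to_real_vector(*[np.zeros(shape, dtype=complex) for shape in shapes])
```

For these sizes the components hold 100 complex entries in total, so the real-valued state vector has 200 entries, in agreement with the closed-form expression.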
Action: In our problem, the beamforming matrices of the ISAC BS ([TeX:] $$\mathbf{W}$$) and the RIS ([TeX:] $$\Phi$$) are jointly optimized to maximize the secrecy rate of the RIS-assisted ISAC for UAV networks. Accordingly, the system’s action space encompasses these matrices. Thus, the action for time step t is
Similar to the state, the beamforming matrices of the ISAC BS and RIS are represented by their real and imaginary parts. The dimension of the action space is given by [TeX:] $$D_a=2(M \times(K+M)+N) .$$
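A real-valued action vector of this dimension can be mapped back to the complex quantities as sketched below. The layout (all real parts first, then all imaginary parts) and the phase-only projection that enforces unit modulus on the RIS coefficients are assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np

def action_dim(M, N, K):
    """D_a = 2 (M (K + M) + N)."""
    return 2 * (M * (K + M) + N)

def unpack_action(a, M, N, K):
    """Split a real action vector into complex W (M x (K+M)) and unit-modulus phi (N,)."""
    a = np.asarray(a, dtype=float)
    half = a.size // 2
    z = a[:half] + 1j * a[half:]          # real parts first, imaginary parts second
    W = z[: M * (K + M)].reshape(M, K + M)
    raw = z[M * (K + M):]
    phi = np.exp(1j * np.angle(raw))      # keep only the phase so that |phi_n| = 1
    return W, phi

M, N, K = 2, 3, 1  # illustrative sizes only
a = np.arange(1, action_dim(M, N, K) + 1, dtype=float)
W, phi = unpack_action(a, M, N, K)
```

Keeping only the phase of the RIS part guarantees that the unit-modulus constraint holds for any actor output, so that constraint never has to be penalized in the reward.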
Reward: The optimization problem aims to maximize the secrecy rate while simultaneously satisfying the radar detection SNR requirement. Therefore, we define the reward function in terms of the secrecy rate and SNR requirement for radar detection. For time step t, the instantaneous reward [TeX:] $$r_t$$ is given by
For the DDPG-based beamforming, a deep neural network is adopted with L layers: an input layer, L−2 hidden layers, and an output layer. The network structure is displayed in Table I, where both the actor and critic networks have four layers. In the actor network, the numbers of neurons in the input and output layers equal the state and action dimensions, respectively. Similarly, in the critic network, the number of neurons in the input layer equals the state dimension, while the output layer has a single neuron representing the Q value.
IV. SIMULATION RESULTS
A. Simulation Setup
When configuring the simulation, we utilized the publicly accessible DeepMIMO dataset [21], which is derived from the ray-tracing outdoor scenario designated as “O1 Drone Scenario.” The parameters relevant to this scenario are detailed in Table II. The positions of the ISAC BS and the UAV-mounted RIS were fixed at coordinates [237.504, 580, 6] m and [237.504, 650, 80] m, respectively. In contrast, the legitimate user and eavesdropper were positioned randomly within a specified x-y user grid (UG), where the x- and y-values ranged from 0 to 439.83 m and 400 to 499.63 m, respectively. The grid points were spaced 0.81 m apart along both the x- and y-axes. The heights of the communication user and the sensing target were set to 40 m (UG1) and 42.4 m (UG2), respectively, as illustrated in Fig. 3. For the training of the actor and critic networks, we employed the neural network structure described in Section III-B. The other hyper-parameters are summarized in Table III.
Fig. 3. Simulation setup based on outdoor drone scenario.
TABLE II. DEEPMIMO PARAMETERS FOR SIMULATION DATASET.
TABLE III. HYPER-PARAMETERS FOR MODEL TRAINING.
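The user grid described in the simulation setup can be reproduced with a few lines of code. The sketch below assumes that positions are drawn uniformly over the grid points and that both ends of each range are grid points; the function and constant names are hypothetical, and loading the corresponding channels from the DeepMIMO dataset is not shown.

```python
import random

SPACING = 0.81                    # grid spacing in metres
X_MIN, X_MAX = 0.0, 439.83        # x-range of the user grid in metres
Y_MIN, Y_MAX = 400.0, 499.63      # y-range of the user grid in metres

def grid_size(lo, hi, spacing=SPACING):
    """Number of grid points on [lo, hi], both ends included."""
    return round((hi - lo) / spacing) + 1

def random_position(height, rng=random):
    """Draw one (x, y, z) grid point uniformly at random at the given height."""
    ix = rng.randrange(grid_size(X_MIN, X_MAX))
    iy = rng.randrange(grid_size(Y_MIN, Y_MAX))
    return (X_MIN + ix * SPACING, Y_MIN + iy * SPACING, height)

user = random_position(40.0)      # communication user in UG1
target = random_position(42.4)    # sensing target in UG2
```

With a spacing of 0.81 m, the stated ranges correspond to 544 grid points along x and 124 along y in each user grid.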
B. Performance Evaluation
First, we evaluated the convergence performance of the proposed DRL-based beamforming solution. Fig. 4 shows the convergence curve, depicting both the instant and average rewards over the training steps. The average reward is calculated as follows:
where T is the maximum training step. The instant and average rewards are depicted with dotted and solid lines, respectively, across two different scenarios. In the scenario with RIS, both the ISAC BS and RIS were beamformed using the proposed DRL-based solution to maximize the secrecy rate while ensuring the radar requirements were met. In the scenario without RIS, no RIS was included in the simulation, and only the ISAC BS was beamformed using the proposed DRL-based solution. The results demonstrate that the scenario with RIS achieved improved performance compared to the scenario without RIS, as the RIS controlled signal propagation to introduce additional links and minimized the risk of eavesdropping. This indicates that the RIS-assisted ISAC system enhances secure communication performance. In addition, the solution converged faster in the scenario without RIS compared to the scenario with RIS. This difference in convergence speed is attributed to the larger dynamic range of instant rewards when beamforming both the ISAC BS and the RIS phase shifts, leading to greater fluctuations.
Fig. 4. Convergence curve for different RIS scenarios.
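The smoothing effect of an average reward on a fluctuating instant reward can be illustrated with a cumulative mean. This sketch assumes that the average at each step is taken over all instant rewards collected up to that step, which is one common choice and may differ from the exact averaging window used for the curves; the reward values are made up for illustration.

```python
def running_average(rewards):
    """Cumulative mean of the instant rewards at every training step."""
    averages, total = [], 0.0
    for step, reward in enumerate(rewards, start=1):
        total += reward
        averages.append(total / step)
    return averages

instant = [1.0, 3.0, 5.0, 3.0]          # illustrative instant rewards
average = running_average(instant)      # cumulative means per step
```

Because each new sample carries less weight as the step count grows, the averaged curve fluctuates far less than the instant reward, which makes convergence easier to read from the plot.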
Fig. 5 shows the convergence curve according to the radar detection SNR requirement, highlighting the trade-off between communication and radar detection performance. As the SNR requirement for radar detection increased, the secrecy rate achieved by the secure ISAC system decreased. This is because more resources were allocated to ensure the QoS for sensing. In addition, convergence was faster at higher radar SNR thresholds [TeX:] $$\left(\gamma_t=5,15,20 \mathrm{~dB}\right)$$ compared to a lower radar SNR threshold [TeX:] $$\left(\gamma_t=0 \mathrm{~dB}\right).$$ This is because a lower radar SNR threshold (i.e., higher communication QoS) resulted in a larger dynamic range of instant rewards, leading to greater fluctuations.
Fig. 5. Convergence curve for different radar SNR thresholds.
Figs. 6 and 7 display the convergence curve and secrecy rate according to transmit power levels, respectively. Both the reward and secrecy rate performance improved with increasing transmit power. Moreover, convergence was faster at lower transmit power levels [TeX:] $$\left(P_t=0,10 \mathrm{~dBm}\right)$$ compared to higher levels [TeX:] $$\left(P_t=20,30 \mathrm{~dBm}\right),$$ as shown in Fig. 6. This is because higher transmit power resulted in a larger dynamic range of instant rewards, leading to greater fluctuations. Fig. 7 presents a comparison of the secrecy rate between the DRL-based beamforming solution with RIS (DRL-based w/ RIS) and without RIS (DRL-based w/o RIS). The proposed DRL-based w/ RIS solution achieved a higher secrecy rate than the DRL-based w/o RIS solution. This is because the DRL-based w/ RIS solution adaptively increased the communication user’s signal gain while reducing the interference for the eavesdropper, thereby mitigating the effects of eavesdropping and improving the secrecy rate. Specifically, the DRL-based w/ RIS solution achieved a secrecy rate approximately 55.49% higher than the DRL-based w/o RIS solution at [TeX:] $$P_t=10 \mathrm{~dBm}.$$
Fig. 6. Convergence curve for different transmit power levels.
Fig. 7. Secrecy rate of proposed solution.
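The transmit power levels and radar SNR thresholds are quoted in dBm and dB, whereas power constraints and SNR expressions are usually evaluated in linear units. The conversions are sketched below; the function names are hypothetical.

```python
import math

def dbm_to_watt(p_dbm):
    """Convert a power level from dBm to watts (30 dBm corresponds to 1 W)."""
    return 10.0 ** ((p_dbm - 30.0) / 10.0)

def db_to_linear(x_db):
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (x_db / 10.0)

# Transmit power levels considered in the evaluation, in dBm and in watts.
power_levels_w = {p: dbm_to_watt(p) for p in (0, 10, 20, 30)}
# Radar SNR thresholds considered in the evaluation, in dB and in linear scale.
snr_thresholds = {g: db_to_linear(g) for g in (0, 5, 15, 20)}
```

On this scale the four power levels span three orders of magnitude, from 1 mW at 0 dBm to 1 W at 30 dBm.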
V. CONCLUSION
In this study, we explored the integration of UAV-mounted RIS in secure ISAC systems, where the ISAC BS simultaneously served legitimate users and an eavesdropper as a sensing target. We proposed a DRL-based beamforming solution to jointly optimize the beamforming at the ISAC BS and the RIS, aiming to maximize the secrecy rate while meeting radar SNR requirements. Through realistic simulations using 3D ray-tracing, we demonstrated that the proposed DRL solution could achieve significant improvements by effectively increasing the signal gain for the legitimate user while simultaneously reducing interference for the eavesdropper. This led to a reduction in the effects of eavesdropping and an enhancement of the secrecy rate. Additionally, a key advantage of mounting a RIS on UAVs is their mobility. The dynamic positioning capabilities of UAVs offer further opportunities for optimizing RIS performance. Future work could explore leveraging the mobility of UAV-mounted RISs for more adaptive and responsive deployment, thereby enhancing the effectiveness of ISAC systems across various environments.