## Nishat I Mowla, Nguyen H. Tran, Inshil Doh and Kijoon Chae

| Algorithm | Case 1 | Case 2 |
|---|---|---|
| **Success rate (%)** | | |
| Random policy | 50.909 | 81.818 |
| Dijkstra’s algorithm (w/o knowledge) | 50.909 | 72.727 |
| Weighted Dijkstra’s algorithm (with knowledge) | 100 | 100 |
| Adaptive federated RL | 100 | 100 |
| **Average hop count** | | |
| Random policy | 5 | 9 |
| Dijkstra’s algorithm (w/o knowledge) | 5 | 9 |
| Weighted Dijkstra’s algorithm (with knowledge) | 6.273 | 9 |
| Adaptive federated RL | 6.273 | 9 |
| **Average number of iterations to reach convergence** | | |
| Random policy | 2000 | 1800 |
| Adaptive federated RL | 1800 | 1700 |
| **Average cumulative reward** | | |
| Random policy | 5820 | 4710 |
| Adaptive federated RL | 5626 | 4657 |

weighted Dijkstra’s algorithm take alternate, longer paths, for which the average hop count was 6.273 for each. However, in terms of the average number of iterations required for convergence, the adaptive federated RL needs fewer (1800) than the random policy (2000). Moreover, the average cumulative reward of the adaptive federated RL (5626) is close to that of the random policy (5820), as it balances exploitation and exploration. Therefore, in Case 1, the proposed adaptive federated RL model incurs a 3.33% decrease in average cumulative reward but reaches convergence 200 iterations earlier than the random policy.
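The relative figures quoted for each case follow directly from the table. The short check below recomputes them from the reported averages; the values are copied from the table, while the dictionary layout and function names are ours, for illustration only. Note that for Case 2 the reward drop is (4710 - 4657)/4710, or about 1.125%.

```python
# Averages copied from the results table ("afrl" = adaptive federated RL).
RESULTS = {
    "Case 1": {"reward": {"random": 5820, "afrl": 5626},
               "iterations": {"random": 2000, "afrl": 1800}},
    "Case 2": {"reward": {"random": 4710, "afrl": 4657},
               "iterations": {"random": 1800, "afrl": 1700}},
}


def relative_decrease(baseline: float, value: float) -> float:
    """Percentage drop of `value` relative to `baseline`."""
    return (baseline - value) / baseline * 100.0


def summarize(case: str) -> tuple:
    """Return (reward drop in %, rounded to 2 decimals; iterations saved)."""
    r = RESULTS[case]
    drop = relative_decrease(r["reward"]["random"], r["reward"]["afrl"])
    saved = r["iterations"]["random"] - r["iterations"]["afrl"]
    return round(drop, 2), saved


if __name__ == "__main__":
    for case in RESULTS:
        drop, saved = summarize(case)
        print(f"{case}: reward -{drop}%, converges {saved} iterations earlier")
```

The same two quantities (relative reward loss and iterations saved) capture the trade-off discussed in both cases: a small loss in cumulative reward in exchange for faster convergence.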

In Case 2, as the number of jammer locations was increased from 0 to 3, the number of jammer hops for Dijkstra’s algorithm (without knowledge) increased from 0 to 3, after which it remained constant at 3 because the additional jammer locations no longer lay on its selected route. As the total number of jammer locations was further increased from 3 to 6, the number of jammer hops covered by the random policy also increased from 0 to 3, as the jammer locations fell on its selected path. Beyond six jammer locations, the number of jammer hops for both the random policy and Dijkstra’s algorithm (without knowledge) remained steady at 3, as the remaining jammer locations were placed in other cells. In contrast, the number of jammer hops covered by the adaptive federated RL model and the weighted Dijkstra’s algorithm remained 0 for all 10 detected jammer locations. Thus, the success rates of the random policy and Dijkstra’s algorithm dropped to 81.82% and 72.73%, respectively, whereas the adaptive federated RL and the weighted Dijkstra’s algorithm both achieved a 100% success rate. The average hop count, however, was 9 for the random policy, Dijkstra’s algorithm (without knowledge), and the adaptive federated RL, because the alternate routes taken by the adaptive federated RL also consisted of 9 hops; the routes of the weighted Dijkstra’s algorithm likewise consisted of 9 hops. As the distance from the source to the destination increases in Case 2, the hop count of the proposed mechanism and of the random policy becomes the same (i.e., 9 hops) as the number of jammer locations grows. This is because more alternate routes are available, and all of them have the same hop count owing to the larger source-to-destination distance. In terms of the average number of iterations required for convergence, the adaptive federated RL needs fewer (1700) than the random policy (1800). Moreover, the average cumulative reward of the adaptive federated RL (4657) is very close to that of the random policy (4710), as it exploits the adaptive epsilon-greedy policy. As a result, in Case 2, the proposed adaptive federated RL model incurs an approximately 1.13% decrease in average cumulative reward while reaching convergence 100 iterations earlier than the random policy.
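The difference between the two Dijkstra baselines in the table can be illustrated with a minimal sketch. It assumes a 4-connected grid of cells, a unit cost per hop, and a hypothetical extra `penalty` for entering a cell flagged as a jammer location; the exact weighting used in the experiments is not given in this section. Setting `penalty=0` gives the knowledge-free baseline, and the helper `jammer_hops` is likewise ours.

```python
import heapq


def dijkstra(rows, cols, src, dst, jammers=frozenset(), penalty=0.0):
    """Shortest route on a 4-connected grid of cells (row, col).

    Every hop costs 1; entering a cell in `jammers` costs an extra `penalty`.
    penalty=0 ignores jammer knowledge (plain baseline); a large positive
    penalty steers the route around known jammer locations (weighted variant).
    Assumes `dst` is reachable from `src`.
    """
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break  # dst popped from the heap: its distance is final
        if d > dist[u]:
            continue  # stale heap entry
        r, c = u
        for v in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= v[0] < rows and 0 <= v[1] < cols:
                nd = d + 1.0 + (penalty if v in jammers else 0.0)
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(heap, (nd, v))
    # Walk predecessors back from dst to recover the route.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def jammer_hops(path, jammers):
    """Number of cells on the route that lie in a jammer location."""
    return sum(cell in jammers for cell in path)


if __name__ == "__main__":
    jam = frozenset({(1, 2)})
    for pen in (0.0, 100.0):
        route = dijkstra(3, 5, (1, 0), (1, 4), jam, penalty=pen)
        print(pen, len(route) - 1, jammer_hops(route, jam))
    # prints:
    # 0.0 4 1
    # 100.0 6 0
```

On this toy 3x5 grid with one jammer on the direct route, the plain variant takes the 4-hop route through the jammer, while the weighted variant detours over 6 hops with no jammer hop. This is the same qualitative trade-off seen in the table: knowledge of jammer locations buys a higher success rate at the cost of a longer path.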

In this paper, we proposed an adaptive federated reinforcement learning-based jamming defense strategy for FANETs consisting of UAV nodes. Specifically, an epsilon-greedy policy-based Q-learning spatial retreat jamming defense strategy was built on top of a federated learning-based jamming detection mechanism. We showed that the proposed adaptive federated reinforcement learning-based approach enables more effective spatial retreat defense. To do so, the proposed mechanism leverages an efficient federated jamming detection mechanism to locate and retreat from jammers in a newly explored environment. The supporting federated detection mechanism provides environment-specific knowledge about the jammer locations to the Q-learning module, so that its Q-learning scores converge faster and the exploitation behavior of the model adapts accordingly. In the future, we will consider a global model for the Q-learning architecture to further federate the defense strategy.
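The epsilon-greedy Q-learning spatial retreat summarized above can be sketched in tabular form. This is a minimal illustration under stated assumptions, not the authors' implementation: the jammer cells are simply passed in as a set (standing in for the output of the federated detection mechanism), the reward values (-1 per hop, -10 for entering a jammer cell, +10 at the destination) are hypothetical, and epsilon is decayed geometrically per episode as a simple stand-in for the paper's adaptive schedule.

```python
import random

# Four moves on the grid: down, up, right, left.
ACTIONS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def step(s, a, rows, cols):
    """Apply action index `a` to cell `s`; moves off the grid leave the UAV in place."""
    dr, dc = ACTIONS[a]
    return (min(max(s[0] + dr, 0), rows - 1), min(max(s[1] + dc, 0), cols - 1))


def best_action(q, s):
    """Greedy action: highest Q-value in state `s` (unseen pairs default to 0)."""
    return max(range(len(ACTIONS)), key=lambda i: q.get((s, i), 0.0))


def train_retreat_policy(rows, cols, src, dst, jammers, episodes=1000,
                         alpha=0.5, gamma=0.9, eps_start=1.0,
                         eps_min=0.05, eps_decay=0.99, seed=0):
    """Tabular Q-learning with a decaying epsilon-greedy behavior policy.

    `jammers` is a hypothetical stand-in for the jammer locations reported
    by the detection module; all reward values are illustrative.
    """
    rng = random.Random(seed)
    q = {}  # (state, action index) -> Q-value
    eps = eps_start
    for _ in range(episodes):
        s = src
        for _ in range(4 * rows * cols):  # cap on steps per episode
            # Explore with probability eps, otherwise exploit current Q-values.
            a = rng.randrange(len(ACTIONS)) if rng.random() < eps else best_action(q, s)
            nxt = step(s, a, rows, cols)
            done = nxt == dst
            if nxt in jammers:
                reward = -10.0   # entering a jammed cell
            elif done:
                reward = 10.0    # reached the destination
            else:
                reward = -1.0    # ordinary hop cost
            # Q-learning target: reward plus discounted best next-state value.
            target = reward if done else reward + gamma * q.get((nxt, best_action(q, nxt)), 0.0)
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (target - old)
            s = nxt
            if done:
                break
        # Shift gradually from exploration to exploitation.
        eps = max(eps_min, eps * eps_decay)
    return q


def greedy_path(q, rows, cols, src, dst, max_steps=50):
    """Roll out the learned policy without exploration."""
    path, s = [src], src
    while s != dst and len(path) <= max_steps:
        s = step(s, best_action(q, s), rows, cols)
        path.append(s)
    return path


if __name__ == "__main__":
    jam = frozenset({(1, 2)})
    q = train_retreat_policy(3, 5, (1, 0), (1, 4), jam)
    print(greedy_path(q, 3, 5, (1, 0), (1, 4)))
```

With a single jammer blocking the direct route on a 3x5 grid, the learned greedy policy should retreat around the jammer and reach the destination in 6 hops. Lowering epsilon over the episodes is what moves the agent from exploring unknown cells toward exploiting the learned Q-values; supplying jammer knowledge earlier, as the detection mechanism does in the paper, is what the authors credit for faster convergence.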

Nishat I Mowla (S’18) received her B.S. in Computer Science from Asian University for Women, Chittagong, Bangladesh, in 2013, and her M.S. degree in Computer Science and Engineering from Ewha Womans University, Seoul, Korea, in 2016. She worked at Asian University for Women, Chittagong, Bangladesh, as a Senior Teaching Fellow. She is currently a Ph.D. student at Ewha Womans University, Seoul, Korea. Her research interests include next-generation network security, IoT network security, machine intelligence, and network traffic analysis. Ms. Mowla received the best paper award in the Qualcomm Paper Awards 2017 paper competition at Ewha Womans University, Seoul, Korea. She is a student member of IEEE.

Nguyen H. Tran (S’10-M’11-SM’18) received his B.S. from Hochiminh City University of Technology and his Ph.D. from Kyung Hee University, both in Electrical and Computer Engineering, in 2005 and 2011, respectively. Since 2018, he has been with the School of Computer Science, The University of Sydney, where he is currently a Senior Lecturer. He was an Assistant Professor with the Department of Computer Science and Engineering, Kyung Hee University, Korea, from 2012 to 2017. His research interest is in applying the analytical techniques of optimization, game theory, and stochastic modeling to cutting-edge applications such as cloud and mobile edge computing, data centers, heterogeneous wireless networks, and big data for networks. He received the best KHU thesis award in engineering in 2011 and the best paper award at IEEE ICC 2016. He has been an Editor of IEEE Transactions on Green Communications and Networking since 2016, and served as an Editor of the 2017 Newsletter of the Technical Committee on Cognitive Networks on Internet of Things.

Inshil Doh received her B.S. and M.S. in Computer Science and Engineering from Ewha Womans University, Korea, in 1993 and 1995, respectively. She received her Ph.D. degree in Computer Science and Engineering from Ewha Womans University in 2007. From 1995 to 1998, Prof. Doh worked at Samsung SDS of Korea. She was a Research Professor at Ewha Womans University and at Sungkyunkwan University. She is currently an Associate Professor in the Department of Cyber Security at Ewha Womans University, Seoul, Korea. Her research interests include wired/wireless network security, sensor network security, and IoT network security.

Kijoon Chae received his B.S. in Mathematics from Yonsei University in 1982 and his M.S. in Computer Science from Syracuse University in 1984. He received his Ph.D. in Electrical and Computer Engineering from North Carolina State University in 1990. He is currently a Professor in the Department of Computer Science and Engineering at Ewha Womans University, Seoul, Korea. His research interests include blockchain, security of FANETs, sensor networks, smart grid, CDN, SDN, and IoT, network protocol design, and performance evaluation.
