Collision avoidance in air traffic using recurrent reinforcement learning accounting for pilot reaction delay
Authors
Northwestern Polytechnical University, 710072, 127, West Youyi Road, Beilin District, Xi'an Shaanxi, P.R.China
e-mail: liuzuocheng@mail.nwpu.edu.cn
Abstract
Air collision avoidance systems are critical to flight safety, especially as air traffic continues to grow. Traditional systems such as the Traffic Collision Avoidance System (TCAS) are built on Markov decision processes (MDPs), but these models neglect important real-world factors such as pilot reaction delay. In this work, we formulate air collision avoidance as a partially observable Markov decision process (POMDP) to address the problems caused by delayed pilot responses. To solve the resulting POMDP, we use the Long Short-Term Memory Soft Actor-Critic discrete (LSTM SAC-d) algorithm, which extends the Soft Actor-Critic discrete (SAC-d) framework with recurrence to capture temporal dependencies. Our model assumes a pilot reaction delay of 3 seconds, reflecting realistic operational constraints. We compare LSTM SAC-d with the Markovian SAC-d baseline and show that LSTM SAC-d significantly outperforms it in collision avoidance effectiveness and overall solution stability. Experimental results show that LSTM SAC-d substantially improves system performance by better accounting for pilot response delay and by optimizing resolution advisories in real time.
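As a minimal illustration of why the delay motivates a POMDP formulation (this is an explanatory sketch, not code from the paper): a fixed pilot reaction delay can be modeled by buffering resolution advisories so the aircraft only begins executing an advisory several decision steps after it is issued. With a 1 Hz decision rate and the 3-second delay assumed in the abstract, the executed command lags the issued one by 3 steps, so the current observation alone no longer determines the dynamics, and a recurrent (memory-based) policy such as LSTM SAC-d becomes appropriate.

```python
from collections import deque

class DelayedPilotModel:
    """Sketch of a pilot who executes each advisory `delay_steps` after it is issued."""

    def __init__(self, delay_steps=3, initial_command=0):
        # Pre-fill the buffer so the first `delay_steps` outputs are the
        # initial command (e.g. 0 = "maintain current trajectory").
        self.buffer = deque([initial_command] * delay_steps)

    def step(self, advisory):
        """Queue the newly issued advisory; return the one acted on now."""
        self.buffer.append(advisory)
        return self.buffer.popleft()

pilot = DelayedPilotModel(delay_steps=3)
issued = [1, 2, 3, 4, 5]                    # advisories issued by the agent
executed = [pilot.step(a) for a in issued]  # commands the aircraft actually flies
print(executed)                             # [0, 0, 0, 1, 2]
```

Because the aircraft's motion at each step depends on an advisory issued three steps earlier, an agent that sees only the current aircraft state cannot infer which of its pending advisories is in effect; the history kept by an LSTM recovers that hidden context.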
Keywords:
reinforcement learning, pilot response influence, air collision avoidance, aircraft collision model, dynamic aircraft model

References
- Holland J E, Kochenderfer M J, Olson W A. Optimizing the next generation collision avoidance system for safe, suitable, and acceptable operational performance[J]. Air Traffic Control Quarterly, 2013, 21(3): 275-297.
- Londner E H, Moss R J. Bayesian network model of pilot response to collision avoidance system resolution advisories[J]. Journal of Air Transportation, 2018, 26(4): 171-182.
- Panoutsakopoulos C, Yuksek B, Inalhan G, et al. Towards safe deep reinforcement learning for autonomous airborne collision avoidance systems[C]//AIAA SCITECH 2022 Forum. 2022: 2102.
- Li S, Egorov M, Kochenderfer M. Optimizing collision avoidance in dense airspace using deep reinforcement learning[J]. arXiv preprint arXiv:1912.10146, 2019.
- Rizk H, Chaibet A, Kribèche A. Model-based control and model-free control techniques for autonomous vehicles: A technical survey[J]. Applied Sciences, 2023, 13(11): 6700.
- Lindqvist B, Mansouri S S, Agha-mohammadi A, et al. Nonlinear MPC for collision avoidance and control of UAVs with dynamic obstacles[J]. IEEE Robotics and Automation Letters, 2020, 5(4): 6001-6008.
- Kochenderfer M J, Chryssanthacopoulos J P. Robust airborne collision avoidance through dynamic programming[J]. Massachusetts Institute of Technology, Lincoln Laboratory, Project Report ATC-371, 2011, 130.
- Brechtel S, Gindele T, Dillmann R. Probabilistic decision-making under uncertainty for autonomous driving using continuous POMDPs[C]//17th international IEEE conference on intelligent transportation systems (ITSC). IEEE, 2014: 392-399.
- Ni T, Eysenbach B, Salakhutdinov R. Recurrent model-free RL can be a strong baseline for many POMDPs[J]. arXiv preprint arXiv:2110.05038, 2021.

