Privacy-Preserving Reinforcement Learning for Robust Medical Intelligent Agents Under Adversarial Attacks
Keywords:
reinforcement learning, privacy preservation, adversarial robustness, medical intelligent agents, differential privacy, secure multi-party computation, healthcare artificial intelligence governance
Abstract
The integration of reinforcement learning into medical intelligent agents promises substantial advances in clinical decision support, personalized treatment planning, and dynamic resource allocation. However, the deployment of such agents in real healthcare environments introduces critical vulnerabilities, particularly concerning patient data privacy and susceptibility to adversarial manipulation of learned policies. This paper develops a comprehensive system-level framework for privacy-preserving reinforcement learning that simultaneously achieves robustness against adversarial attacks while maintaining operational efficacy in medical contexts. We examine the structural trade-offs inherent in combining differential privacy mechanisms with adversarial training objectives, analyzing how these approaches interact with the sequential decision-making nature of reinforcement learning and the stringent regulatory requirements of healthcare. Our discussion extends beyond algorithmic design to encompass governance architectures, infrastructure considerations for federated clinical deployment, computational sustainability of privacy-preserving training pipelines, and fairness implications when privacy protections may disproportionately affect underrepresented patient populations. Through comparative analysis with existing privacy-preserving machine learning paradigms and adversarial defense strategies, we identify key failure modes, including policy distortion under strong privacy budgets and covert adversarial perturbations that exploit privacy noise. We propose an integrated architecture that layers differential privacy, certified adversarial robustness, and policy verification within a secure multi-party computation framework tailored for medical settings. Policy implications are drawn regarding the need for adaptive regulatory standards that accommodate both privacy guarantees and functional robustness. This work positions privacy-preserving robust reinforcement learning as a critical infrastructure component for trustworthy medical artificial intelligence systems.
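The failure mode described above as policy distortion under strong privacy budgets can be illustrated with a small sketch. It is not the architecture proposed in this paper. It assumes a DP-SGD-style update in which each patient's policy gradient is clipped and the summed gradient is perturbed with the classic Gaussian mechanism (valid for 0 < epsilon < 1). The function names, the add-or-remove-one-patient adjacency, and the treatment of the batch size as public are illustrative assumptions, and privacy composition across training steps is not accounted for.

```python
import math
import random


def gaussian_sigma(epsilon, delta, sensitivity):
    """Noise scale of the classic Gaussian mechanism (requires 0 < epsilon < 1)."""
    if not 0 < epsilon < 1:
        raise ValueError("the classic bound assumes 0 < epsilon < 1")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon


def clip(vec, max_norm):
    """Scale vec down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]


def private_mean_gradient(per_patient_grads, max_norm, epsilon, delta, rng):
    """Clip each patient's gradient, sum, add Gaussian noise, then average.

    Hypothetical helper: one noisy release only, no composition accounting.
    """
    clipped = [clip(g, max_norm) for g in per_patient_grads]
    dim = len(clipped[0])
    total = [sum(g[i] for g in clipped) for i in range(dim)]
    # Adding or removing one patient's clipped gradient changes the sum
    # by at most max_norm in L2 norm, so max_norm is the sensitivity.
    sigma = gaussian_sigma(epsilon, delta, sensitivity=max_norm)
    n = len(clipped)  # assumption: batch size is treated as public
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy per-patient policy gradients (made-up values for illustration).
    grads = [[0.5, -0.2], [3.0, 4.0], [-0.1, 0.3]]
    for eps in (0.9, 0.1):
        sigma = gaussian_sigma(eps, delta=1e-5, sensitivity=1.0)
        update = private_mean_gradient(grads, 1.0, eps, 1e-5, rng)
        print(f"epsilon={eps}: sigma={sigma:.2f}, noisy update={update}")
```

Because the noise scale grows as 1/epsilon, a stricter budget produces a noisier update from the same patient data, which is one simple way to see why strong privacy guarantees can distort a learned policy.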
Copyright (c) 2026 Computer Science and Engineering Transactions

This work is licensed under a Creative Commons Attribution 4.0 International License.
This article is published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



